Status: Open. Opened by kgilpin 4 weeks ago.
Title: Integrate Git Attributes for Binary File Classification in Context Search
Problem: The system currently relies on file extensions to identify binary files during context search and lookup. This is imprecise, because an extension does not always reflect whether a file is actually binary. Git attributes let a repository declare which files are binary, which can improve the accuracy of file classification. The task is to consult git attributes alongside file extensions in the context search process, so that it benefits from Git's binary file classification.
Analysis:
To identify binary files accurately, the system should combine the existing file extension check with Git's metadata. Git attributes can declare specific paths as binary (for example with the built-in `binary` macro attribute, which expands to `-diff -merge -text`), and such a declaration should take precedence over the extension-based guess. The system can use the `git check-attr` command to query the attributes that apply to a file and determine whether it is declared binary. Incorporating this into the current mechanism will filter out binary files more reliably during context search and lookup.
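As a minimal sketch of the lookup step, the TypeScript below assumes attributes are fetched in one batch with `git check-attr -z --stdin binary diff`, which reads NUL-separated paths on stdin and prints `<path> NUL <attribute> NUL <info> NUL` records. The names `parseCheckAttrZ` and `binaryFromAttributes` are hypothetical. Treating an unset `diff` attribute as binary is an assumption based on `-diff` making Git report "Binary files differ"; it also covers the `binary` macro, since that expands to `-diff`.

```typescript
// Maps each path to its attributes; each info value is "set", "unset",
// "unspecified", or the attribute's value string (e.g. a diff driver name).
type AttrTable = Map<string, Map<string, string>>;

// Parse the output of `git check-attr -z ...`, a flat sequence of
// "<path> NUL <attribute> NUL <info> NUL" records.
function parseCheckAttrZ(output: string): AttrTable {
  const fields = output.split("\0");
  // The trailing NUL leaves one empty field at the end; drop it.
  if (fields[fields.length - 1] === "") fields.pop();
  if (fields.length % 3 !== 0) {
    throw new Error(`unexpected check-attr output (${fields.length} fields)`);
  }
  const table: AttrTable = new Map();
  for (let i = 0; i < fields.length; i += 3) {
    const path = fields[i];
    const attr = fields[i + 1];
    const info = fields[i + 2];
    let attrs = table.get(path);
    if (attrs === undefined) {
      attrs = new Map<string, string>();
      table.set(path, attrs);
    }
    attrs.set(attr, info);
  }
  return table;
}

// Binary status according to git attributes alone (hypothetical rule):
//   true      -> declared binary (`binary` set, or `diff` unset via `-diff`)
//   false     -> explicitly diffable text (`diff` set)
//   undefined -> no relevant attribute; the caller keeps its extension check
function binaryFromAttributes(attrs: Map<string, string> | undefined): boolean | undefined {
  if (attrs === undefined) return undefined;
  if (attrs.get("binary") === "set") return true;
  const diff = attrs.get("diff");
  if (diff === "unset") return true;
  if (diff === "set") return false;
  return undefined;
}
```

Returning `undefined` rather than `false` when no attribute applies lets the caller keep the existing extension check as the fallback, so attributes only override extensions when the repository actually says something about a file.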
Proposed Changes:
File: packages/cli/src/fulltext/FileIndex.ts
- Modify the `filterFiles` function to include logic that also checks git attributes to determine whether a file should be considered binary.
- Call the `git check-attr` command within the try-catch block of the function where file filtering is performed.
- If git attributes mark a file as binary, exclude it from the `result`, regardless of its extension.
File: packages/cli/src/fulltext/listGitProjectFiles.ts
- Query git attributes with `git check-attr`, ensuring the returned file list is annotated with each file's git-determined binary status where applicable.
File: packages/scanner/src/lastGitOrFSModifiedDate.ts
- Check whether a file is binary by calling `git check-attr` and interpreting the result.
Integration Points:
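One way the pieces could fit together in `filterFiles` is sketched below, under several assumptions: `GitRunner` is a hypothetical injected function (in the CLI it might wrap Node's `child_process.execFileSync`), `BINARY_EXTENSIONS` stands in for the existing extension list, paths are relative to the runner's working directory, and the rule that an explicit `diff` attribute marks a file as text is illustrative. Git's answer wins when it has one; otherwise the extension decides, and any failure to run git falls back to extension-only filtering, matching the proposed try-catch in `filterFiles`.

```typescript
// Hypothetical: runs `git <args>` with `input` on stdin and returns stdout.
// In the CLI this might wrap Node's
//   child_process.execFileSync("git", args, { cwd, input, encoding: "utf8" })
type GitRunner = (args: string[], input: string) => string;

// Illustrative stand-in for the existing extension list.
const BINARY_EXTENSIONS = new Set([".png", ".jpg", ".gif", ".svg", ".zip", ".pdf"]);

function hasBinaryExtension(path: string): boolean {
  const base = path.slice(path.lastIndexOf("/") + 1);
  const dot = base.lastIndexOf(".");
  return dot > 0 && BINARY_EXTENSIONS.has(base.slice(dot).toLowerCase());
}

// One batched `git check-attr` call for all paths. Returns path -> true
// (declared binary) or false (explicitly diffable); undecided paths are absent.
function gitBinaryVerdicts(paths: string[], runGit: GitRunner): Map<string, boolean> {
  const verdicts = new Map<string, boolean>();
  if (paths.length === 0) return verdicts;
  const out = runGit(
    ["check-attr", "-z", "--stdin", "binary", "diff"],
    paths.join("\0") + "\0",
  );
  const fields = out.split("\0");
  for (let i = 0; i + 2 < fields.length; i += 3) {
    const path = fields[i];
    const attr = fields[i + 1];
    const info = fields[i + 2];
    if ((attr === "binary" && info === "set") || (attr === "diff" && info === "unset")) {
      verdicts.set(path, true); // a binary declaration always wins
    } else if (attr === "diff" && info === "set" && !verdicts.has(path)) {
      verdicts.set(path, false);
    }
  }
  return verdicts;
}

// Keep only non-binary files: git's verdict when it has one, else the extension.
// If git fails (not installed, not a work tree), fall back to extensions only.
function filterBinaryFiles(paths: string[], runGit: GitRunner): string[] {
  let verdicts = new Map<string, boolean>();
  try {
    verdicts = gitBinaryVerdicts(paths, runGit);
  } catch {
    // leave verdicts empty so every path uses the extension check
  }
  return paths.filter((p) => !(verdicts.get(p) ?? hasBinaryExtension(p)));
}
```

Passing every path through a single `--stdin` invocation keeps the cost to one git process per batch instead of one per file, which matters when indexing large repositories.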
Testing:
- Update the unit tests, for example packages/cli/tests/unit/fulltext/listGitProjectFiles.spec.ts, to verify that binary file identification now correctly incorporates git attributes.
By implementing these changes, we improve the precision of context search and file indexing by taking Git's binary file classification into account.
Git metadata can indicate which files are binary. Use git attribute information as well as file extensions in the context search / lookup / collector to determine which files to treat as binary.