Doriandarko / RepoToTextForLLMs

Automate the analysis of GitHub repositories for LLMs with RepoToTextForLLMs. Fetch READMEs, structure, and non-binary files efficiently. Outputs include analysis prompts to aid in comprehensive repo evaluation.
627 stars 79 forks

Missing binary file detection #1

Closed marcomarinodev closed 6 months ago

marcomarinodev commented 6 months ago

Hi there, I was using it on a C++ project, and the repo contained a binary output file that was not listed in the .gitignore. It would be useful to have a check that ignores binary files.

The solution would be simple, something like this:

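import mimetypes

def is_binary_file_with_mimetype(file_path):
    # Guess the MIME type from the file name; the result is None if it is unknown.
    mime_type, _ = mimetypes.guess_type(file_path)
    if mime_type is not None:
        return not mime_type.startswith('text/')
    # Unknown type (for example, no extension): treated as non-binary.
    return False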

I could open a PR if you're interested.

Thank you.

Doriandarko commented 6 months ago

By default it should already skip any type of binary file. What was the file type?

marcomarinodev commented 6 months ago

The file has no extension; it's just an output file.

https://github.com/marcomarinodev/parallel-huffman/blob/main/output
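
Since mimetypes.guess_type works from the file name, it returns None for a file called "output" with no extension, so a name-based check cannot flag it. One possible alternative is to inspect the file's contents instead. The following is only a minimal sketch, not the project's actual implementation: the function name looks_binary, the sample_size parameter, and the heuristic itself (a NUL byte or invalid UTF-8 in the first bytes) are assumptions made for illustration.

```python
import codecs

def looks_binary(file_path, sample_size=8192):
    """Heuristic (an assumption, not the repo's method): a file is treated as
    binary if its first sample_size bytes contain a NUL byte or are not valid UTF-8."""
    with open(file_path, 'rb') as f:
        chunk = f.read(sample_size)
    # NUL bytes are very rare in text files and common in compiled output.
    if b'\x00' in chunk:
        return True
    # Use an incremental decoder so a multi-byte character cut off at the
    # end of the sample does not count as a decoding error.
    decoder = codecs.getincrementaldecoder('utf-8')()
    try:
        decoder.decode(chunk, final=False)
    except UnicodeDecodeError:
        return True
    return False
```

A check like this could be called on each path before its contents are read as text, for example `if looks_binary(path): continue`. It is a heuristic: text stored in encodings such as UTF-16 or Latin-1 would be reported as binary.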