bitranox / igittigitt

A spec-compliant gitignore parser for Python
MIT License

Handle symlinks correctly #18

Closed frthjf closed 2 years ago

frthjf commented 3 years ago

This is a follow-up to issue #16. The glob issue is now resolved; however, I have encountered some remaining problems.

  1. Using pathlib.Path().resolve() follows symlinks

The match() method calls resolve() on the target before matching it against the ignore rules. As a result, a rule may fail to match a file under a symlinked directory. Consider the following example:

prey/
   if-you-can.txt
catch_me/ -> prey/
.gitignore

where catch_me is a symlink to prey. The gitignore contains the following:

catch_me

Now, match('catch_me') fails because the path is resolved to prey before it is matched against the rule. One solution would be to avoid resolving, for example:

    import os

    def match(self, file_path) -> bool:
        # os.path.abspath() makes the path absolute without following
        # symlinks, unlike pathlib.Path.resolve()
        str_file_path = os.path.abspath(file_path)
        is_file = os.path.isfile(str_file_path)
        match = self._match_rules(str_file_path, is_file)
        if match:
            match = self._match_negation_rules(str_file_path)
        return match

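A quick way to see the difference between the two approaches, using only the standard library: the sketch below rebuilds the prey/catch_me layout in a temporary directory and checks the `catch_me` rule against both normalized paths. `rule_matches` is a hypothetical, simplified stand-in for the matching step (a slash-free pattern compared against each path component below the base directory), not igittigitt's actual code.

```python
import fnmatch
import os
import tempfile
from pathlib import Path

RULE = "catch_me"

with tempfile.TemporaryDirectory() as tmp:
    # Resolve the temp dir itself (e.g. /tmp -> /private/tmp on macOS) so that
    # catch_me is the only symlink involved.
    base = Path(tmp).resolve()
    prey = base / "prey"
    prey.mkdir()
    (prey / "if-you-can.txt").write_text("")
    link = base / "catch_me"
    link.symlink_to(prey, target_is_directory=True)

    via_resolve = link.resolve()               # follows the link: <base>/prey
    via_abspath = Path(os.path.abspath(link))  # keeps the link name: <base>/catch_me


def rule_matches(path: Path) -> bool:
    """Hypothetical simplification: a slash-free gitignore pattern matches
    if any path component below base matches it."""
    return any(fnmatch.fnmatchcase(part, RULE) for part in path.relative_to(base).parts)


print(via_resolve.name, rule_matches(via_resolve))  # prey False
print(via_abspath.name, rule_matches(via_abspath))  # catch_me True
```

Only the path-string comparison happens after the temporary directory is removed, so no filesystem access is needed at that point.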
  2. Add the FOLLOW flag

In wcmatch version 3 and later, globmatch does not follow symlinks unless the FOLLOW flag is set, e.g.:

    if wcmatch.glob.globmatch(
        str_file_path,
        [self.last_matching_rule.pattern_glob],
        flags=wcmatch.glob.DOTGLOB
        | wcmatch.glob.GLOBSTAR
        | wcmatch.glob.FOLLOW,  # follow symlinks (wcmatch 3 and later)
    ):
        ...  # rest of the existing logic unchanged

I believe the FOLLOW flag should be included.

bitranox commented 3 years ago

OK, time to write some tests ;-) I also believe that, at the moment, .gitignore files in a symlinked folder (or symlinked .gitignore files) would not be handled correctly. So a full-blown test matrix is needed...

frthjf commented 3 years ago

Thanks, it's very kind of you to look into this. Let me know if you need any help with it.

ITProKyle commented 3 years ago

I am encountering this issue as well, but with a slightly different use case from the one described.

For my Python projects, I use Poetry to manage virtual environments. I have virtualenvs.in-project=true set in my poetry.toml file, so the virtual environment is created in $PROJECT_DIR/.venv. My $PROJECT_DIR/.gitignore file contains a line with .venv, which causes git to ignore the contents of that directory.

When adding rules to igittigitt.IgnoreParser via .parse_rule_files(), I pass $PROJECT_DIR as my base directory. Using pathlib.Path.rglob("*"), I recursively iterate over the contents of my project directory and all its subdirectories, passing each yielded path to .match() and only returning it if it does not match.

The result looks as I would expect, except that it contains 3 files within the $PROJECT_DIR/.venv directory: $PROJECT_DIR/.venv/bin/python3.9, $PROJECT_DIR/.venv/bin/python3, and $PROJECT_DIR/.venv/bin/python. All three are symlinks to the Python executables on my system. virtualenv creates them along with the virtual environment, so their existence is unavoidable.
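The scenario can be reproduced without igittigitt. In the sketch below, `iter_unignored` mirrors the rglob-and-filter loop described above, and `venv_rule` is a hypothetical stand-in for the `.venv` line in `.gitignore`; the directory layout and the fake system interpreter are assumptions. Assuming match() resolves paths as described in the first comment, the symlinked interpreter resolves to a path outside `.venv` and slips through, while `os.path.abspath` keeps it under `.venv`.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable, Iterator


def iter_unignored(base_dir: Path, is_ignored: Callable[[Path], bool]) -> Iterator[Path]:
    """Walk base_dir recursively and yield every path that is not ignored."""
    for path in base_dir.rglob("*"):
        if not is_ignored(path):
            yield path


def venv_rule(base_dir: Path, normalize: Callable[[Path], str]) -> Callable[[Path], bool]:
    """Hypothetical stand-in for a ".venv" line in base_dir/.gitignore: a path is
    ignored if, after normalize(), it is base_dir/.venv or lies beneath it."""
    venv = os.path.join(str(base_dir), ".venv")

    def is_ignored(path: Path) -> bool:
        p = normalize(path)
        return p == venv or p.startswith(venv + os.sep)

    return is_ignored


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    system_python = root / "usr" / "python3.9"  # stands in for the system interpreter
    system_python.parent.mkdir()
    system_python.write_text("")

    project = root / "project"
    bin_dir = project / ".venv" / "bin"
    bin_dir.mkdir(parents=True)
    (bin_dir / "activate").write_text("")
    (bin_dir / "python").symlink_to(system_python)  # like virtualenv's bin/python
    (project / "main.py").write_text("")

    resolving = venv_rule(project, lambda p: str(p.resolve()))  # resolve first, as reported
    absolute = venv_rule(project, os.path.abspath)              # proposed fix
    leaked = sorted(p.name for p in iter_unignored(project, resolving))
    fixed = sorted(p.name for p in iter_unignored(project, absolute))

print(leaked)  # ['main.py', 'python']
print(fixed)   # ['main.py']
```

With igittigitt itself, `is_ignored` would presumably be `parser.match` after calling `parser.parse_rule_files(project_dir)` as described above.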

My current workaround is to add an additional rule (.add_rule("**/bin/python*", "/")). This is fine on my system while the integration of this library is under development, but it's not something I want to rely on in the final release.

If no one else has time to look into fixing this, I'll circle back to see what I can work out.

bitranox commented 3 years ago

OK, I will look into it, but it will take some time...