BurntSushi / ripgrep

ripgrep recursively searches directories for a regex pattern while respecting your gitignore
The Unlicense

.gitignore rule is matched incorrectly while in a subdir #2778

Open woess opened 6 months ago

woess commented 6 months ago

What version of ripgrep are you using?

ripgrep 14.1.0

How did you install ripgrep?

cargo install

What operating system are you using ripgrep on?

Linux

Describe your bug.

A .gitignore rule of the form /dir/*.ext works correctly when running rg from the repo root, but when running rg from inside dir it incorrectly ignores *.ext files in all subdirectories as well.

What are the steps to reproduce the behavior? / What is the actual behavior?

mkdir /tmp/repro
cd /tmp/repro
git init
mkdir parent
mkdir parent/subdir
echo "/parent/*.txt" > .gitignore
echo "please ignore me" > parent/ignore-me.txt
echo "please don't ignore me" > parent/subdir/dont-ignore-me.txt

While in the git repo root dir (/tmp/repro), everything works as expected:

$ rg ignore
parent/subdir/dont-ignore-me.txt
1:please don't ignore me

But once you cd into ./parent, the rule is unexpectedly applied to the file in subdir too, and nothing is found:

$ cd parent
$ rg ignore
rg: No files were searched, which means ripgrep probably applied a filter you didn't expect.
Running with --debug will show why files are being skipped.

Debug output (argument parsing omitted: "no extra arguments found from configuration file", "heuristic chose to search ./"):

/tmp/repro $ rg ignore
rg: DEBUG|grep_regex::config|/…/grep-regex-0.1.12/src/config.rs:175: assembling HIR from 1 fixed string literals
rg: DEBUG|globset|/…/globset-0.4.14/src/lib.rs:453: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
rg: DEBUG|globset|/…/globset-0.4.14/src/lib.rs:453: built glob set; 0 literals, 0 basenames, 0 extensions, 0 prefixes, 0 suffixes, 1 required extensions, 0 regexes
rg: DEBUG|ignore::walk|/…/ignore-0.4.22/src/walk.rs:1799: ignoring ./.git: Ignore(IgnoreMatch(Hidden))
rg: DEBUG|ignore::walk|/…/ignore-0.4.22/src/walk.rs:1799: ignoring ./.gitignore: Ignore(IgnoreMatch(Hidden))
rg: DEBUG|ignore::walk|/…/ignore-0.4.22/src/walk.rs:1799: ignoring ./parent/ignore-me.txt: Ignore(IgnoreMatch(Gitignore(Glob { from: Some("./.gitignore"), original: "/parent/*.txt", actual: "parent/*.txt", is_whitelist: false, is_only_dir: false })))
parent/subdir/dont-ignore-me.txt
1:please don't ignore me

/tmp/repro/parent $ rg ignore
rg: DEBUG|grep_regex::config|/…/grep-regex-0.1.12/src/config.rs:175: assembling HIR from 1 fixed string literals
rg: DEBUG|globset|/…/globset-0.4.14/src/lib.rs:453: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
rg: DEBUG|globset|/…/globset-0.4.14/src/lib.rs:453: built glob set; 0 literals, 0 basenames, 0 extensions, 0 prefixes, 0 suffixes, 1 required extensions, 0 regexes
rg: DEBUG|ignore::walk|/…/ignore-0.4.22/src/walk.rs:1799: ignoring ./ignore-me.txt: Ignore(IgnoreMatch(Gitignore(Glob { from: Some("/tmp/repro/.gitignore"), original: "/parent/*.txt", actual: "parent/*.txt", is_whitelist: false, is_only_dir: false })))
rg: DEBUG|ignore::walk|/…/ignore-0.4.22/src/walk.rs:1799: ignoring ./subdir/dont-ignore-me.txt: Ignore(IgnoreMatch(Gitignore(Glob { from: Some("/tmp/repro/.gitignore"), original: "/parent/*.txt", actual: "parent/*.txt", is_whitelist: false, is_only_dir: false })))
rg: No files were searched, which means ripgrep probably applied a filter you didn't expect.
Running with --debug will show why files are being skipped.

What is the expected behavior?

/tmp/repro/parent $ rg ignore
subdir/dont-ignore-me.txt
1:please don't ignore me
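
For reference, this expectation follows from gitignore's documented matching rules: a pattern with a leading slash is anchored to the directory that contains the .gitignore file, and * does not match a /. Below is a minimal Python sketch of that rule, not ripgrep's implementation. The function names (anchored_pattern_to_regex, is_ignored) are hypothetical, and the sketch only handles literal text, * and ? in anchored patterns.

```python
import re
from pathlib import PurePosixPath


def anchored_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate an anchored gitignore pattern (leading '/') into a regex.

    Hypothetical helper: supports only literals, '*' and '?'.
    No '**', character classes, negation, or directory-only rules.
    """
    parts = []
    for ch in pattern.lstrip("/"):
        if ch == "*":
            parts.append("[^/]*")  # '*' never crosses a path separator
        elif ch == "?":
            parts.append("[^/]")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def is_ignored(pattern: str, ignore_dir: str, cwd: str, candidate: str) -> bool:
    """Decide whether `candidate` (given relative to `cwd`) matches `pattern`.

    The key step: the candidate is re-expressed relative to the directory
    holding the .gitignore, so the answer does not depend on `cwd`.
    """
    absolute = PurePosixPath(cwd) / candidate
    relative = absolute.relative_to(ignore_dir)
    return anchored_pattern_to_regex(pattern).fullmatch(relative.as_posix()) is not None


rule, root = "/parent/*.txt", "/tmp/repro"

# Searching from the repo root
print(is_ignored(rule, root, "/tmp/repro", "parent/ignore-me.txt"))              # True
print(is_ignored(rule, root, "/tmp/repro", "parent/subdir/dont-ignore-me.txt"))  # False

# Searching from /tmp/repro/parent: same answers are expected
print(is_ignored(rule, root, "/tmp/repro/parent", "./ignore-me.txt"))              # True
print(is_ignored(rule, root, "/tmp/repro/parent", "./subdir/dont-ignore-me.txt"))  # False
```

Under this model the result for subdir/dont-ignore-me.txt is the same from both working directories, which is the behavior requested above.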

h4emp3 commented 3 months ago

I was pretty confused by ripgrep's ignore behaviour, and I think I observed the same problem described here.

Minimal repro I could come up with:

mkdir .git
echo '/folder/file' > .gitignore
mkdir -p folder/sub/folder/sub
echo 'do NOT find me' > folder/file
echo 'find me' > folder/sub/file
# rg --version
ripgrep 14.1.0

features:-simd-accel,+pcre2
simd(compile):+SSE2,-SSSE3,-AVX2
simd(runtime):+SSE2,+SSSE3,+AVX2

PCRE2 10.42 is available (JIT is available)
# rg --debug 'find me' .
rg: DEBUG|rg::flags::parse|crates/core/flags/parse.rs:97: no extra arguments found from configuration file
rg: DEBUG|rg::flags::hiargs|crates/core/flags/hiargs.rs:1260: found hostname for hyperlink configuration: vermeer
rg: DEBUG|rg::flags::hiargs|crates/core/flags/hiargs.rs:1270: hyperlink format: ""
rg: DEBUG|rg::flags::hiargs|crates/core/flags/hiargs.rs:174: using 12 thread(s)
rg: DEBUG|grep_regex::config|/usr/share/cargo/registry/grep-regex-0.1.12/src/config.rs:175: assembling HIR from 1 fixed string literals
rg: DEBUG|globset|/usr/share/cargo/registry/globset-0.4.14/src/lib.rs:453: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
rg: DEBUG|globset|/usr/share/cargo/registry/globset-0.4.14/src/lib.rs:453: built glob set; 1 literals, 0 basenames, 0 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
rg: DEBUG|ignore::walk|/usr/share/cargo/registry/ignore-0.4.22/src/walk.rs:1799: ignoring ./.git: Ignore(IgnoreMatch(Hidden))
rg: DEBUG|ignore::walk|/usr/share/cargo/registry/ignore-0.4.22/src/walk.rs:1799: ignoring ./.gitignore: Ignore(IgnoreMatch(Hidden))
rg: DEBUG|ignore::walk|/usr/share/cargo/registry/ignore-0.4.22/src/walk.rs:1799: ignoring ./folder/file: Ignore(IgnoreMatch(Gitignore(Glob { from: Some("./.gitignore"), original: "/folder/file", actual: "folder/file", is_whitelist: false, is_only_dir: false })))
./folder/sub/file
1:find me
# rg --debug 'find me' folder
rg: DEBUG|rg::flags::parse|crates/core/flags/parse.rs:97: no extra arguments found from configuration file
rg: DEBUG|rg::flags::hiargs|crates/core/flags/hiargs.rs:1260: found hostname for hyperlink configuration: vermeer
rg: DEBUG|rg::flags::hiargs|crates/core/flags/hiargs.rs:1270: hyperlink format: ""
rg: DEBUG|rg::flags::hiargs|crates/core/flags/hiargs.rs:174: using 12 thread(s)
rg: DEBUG|grep_regex::config|/usr/share/cargo/registry/grep-regex-0.1.12/src/config.rs:175: assembling HIR from 1 fixed string literals
rg: DEBUG|globset|/usr/share/cargo/registry/globset-0.4.14/src/lib.rs:453: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
rg: DEBUG|globset|/usr/share/cargo/registry/globset-0.4.14/src/lib.rs:453: built glob set; 1 literals, 0 basenames, 0 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
rg: DEBUG|ignore::walk|/usr/share/cargo/registry/ignore-0.4.22/src/walk.rs:1799: ignoring folder/file: Ignore(IgnoreMatch(Gitignore(Glob { from: Some("/tmp/rg-glob-test/.gitignore"), original: "/folder/file", actual: "folder/file", is_whitelist: false, is_only_dir: false })))
rg: DEBUG|ignore::walk|/usr/share/cargo/registry/ignore-0.4.22/src/walk.rs:1799: ignoring folder/sub/file: Ignore(IgnoreMatch(Gitignore(Glob { from: Some("/tmp/rg-glob-test/.gitignore"), original: "/folder/file", actual: "folder/file", is_whitelist: false, is_only_dir: false })))

I expected the second call to rg to also find the file in the subdirectory.

Perhaps the most notable difference from the original issue description: you don't actually need to cd into the subdirectory to trigger the error. Passing the folder as an argument to rg is enough.

Thanks in advance, I really appreciate your work on rg and your comments and writeups on it and other topics!

ttrei commented 1 month ago

I encountered a similar problem with ignore behavior.

mkdir repro
cd repro
mkdir -p a/b/c
echo "foo" > a/b/c/foo.txt
echo "**/b/c" > .rgignore
# rg --version
ripgrep 14.1.0

features:-simd-accel,+pcre2
simd(compile):+SSE2,-SSSE3,-AVX2
simd(runtime):+SSE2,+SSSE3,+AVX2

PCRE2 10.42 is available (JIT is available)
# rg foo ./ --debug
rg: DEBUG|rg::flags::config|crates/core/flags/config.rs:41: /home/reinis/.ripgreprc: arguments loaded from config file: ["--smart-case", "--follow"]
rg: DEBUG|rg::flags::hiargs|crates/core/flags/hiargs.rs:1260: found hostname for hyperlink configuration: mercury
rg: DEBUG|rg::flags::hiargs|crates/core/flags/hiargs.rs:1264: found wsl_prefix for hyperlink configuration: wsl$/debian-main
rg: DEBUG|rg::flags::hiargs|crates/core/flags/hiargs.rs:1270: hyperlink format: ""
rg: DEBUG|rg::flags::hiargs|crates/core/flags/hiargs.rs:174: using 12 thread(s)
rg: DEBUG|globset|crates/globset/src/lib.rs:453: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
rg: DEBUG|globset|crates/globset/src/lib.rs:453: built glob set; 0 literals, 5 basenames, 1 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
rg: DEBUG|globset|crates/globset/src/lib.rs:453: built glob set; 0 literals, 5 basenames, 1 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
rg: DEBUG|globset|crates/globset/src/lib.rs:453: built glob set; 0 literals, 5 basenames, 1 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
rg: DEBUG|globset|crates/globset/src/lib.rs:453: built glob set; 1 literals, 0 basenames, 0 extensions, 0 prefixes, 1 suffixes, 0 required extensions, 0 regexes
rg: DEBUG|ignore::walk|crates/ignore/src/walk.rs:1799: ignoring ./.rgignore: Ignore(IgnoreMatch(Hidden))
rg: DEBUG|ignore::walk|crates/ignore/src/walk.rs:1799: ignoring ./a/b/c: Ignore(IgnoreMatch(Gitignore(Glob { from: Some("./.rgignore"), original: "**/b/c", actual: "**/b/c", is_whitelist: false, is_only_dir: false })))
# rg foo ./a --debug
rg: DEBUG|rg::flags::config|crates/core/flags/config.rs:41: /home/reinis/.ripgreprc: arguments loaded from config file: ["--smart-case", "--follow"]
rg: DEBUG|rg::flags::hiargs|crates/core/flags/hiargs.rs:1260: found hostname for hyperlink configuration: mercury
rg: DEBUG|rg::flags::hiargs|crates/core/flags/hiargs.rs:1264: found wsl_prefix for hyperlink configuration: wsl$/debian-main
rg: DEBUG|rg::flags::hiargs|crates/core/flags/hiargs.rs:1270: hyperlink format: ""
rg: DEBUG|rg::flags::hiargs|crates/core/flags/hiargs.rs:174: using 12 thread(s)
rg: DEBUG|globset|crates/globset/src/lib.rs:453: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
rg: DEBUG|globset|crates/globset/src/lib.rs:453: built glob set; 0 literals, 5 basenames, 1 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
rg: DEBUG|globset|crates/globset/src/lib.rs:453: built glob set; 0 literals, 5 basenames, 1 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
rg: DEBUG|globset|crates/globset/src/lib.rs:453: built glob set; 0 literals, 5 basenames, 1 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
rg: DEBUG|globset|crates/globset/src/lib.rs:453: built glob set; 1 literals, 0 basenames, 0 extensions, 0 prefixes, 1 suffixes, 0 required extensions, 0 regexes
./a/b/c/foo.txt
1:foo

I expect the second rg invocation to also ignore the b/c subdirectory.
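
To spell out that expectation, here is a minimal Python sketch. It assumes .rgignore patterns follow gitignore semantics, where a leading **/ matches in all directories and a matched directory hides everything below it. It is a model of the expected result only, not ripgrep's code. The names globstar_prefix_to_regex and is_ignored_by_dir_rule are hypothetical, and /repro stands in for wherever the repro directory actually lives.

```python
import re
from pathlib import PurePosixPath


def globstar_prefix_to_regex(pattern: str) -> re.Pattern:
    """Translate a pattern of the form '**/literal/path' into a regex.

    Hypothetical helper: only the leading '**/' is interpreted; the rest
    of the pattern is treated as literal text.
    """
    if not pattern.startswith("**/"):
        raise ValueError("this sketch only handles patterns starting with '**/'")
    # A leading '**/' matches zero or more leading directories.
    return re.compile(r"(?:.*/)?" + re.escape(pattern[3:]))


def is_ignored_by_dir_rule(pattern: str, ignore_dir: str, start: str, candidate: str) -> bool:
    """Check `candidate` (relative to the search root `start`) against `pattern`.

    The path is re-expressed relative to the directory holding the ignore
    file, then every ancestor prefix is tested, because ignoring a
    directory also ignores its contents.
    """
    relative = (PurePosixPath(ignore_dir) / start / candidate).relative_to(ignore_dir)
    rx = globstar_prefix_to_regex(pattern)
    parts = relative.parts
    return any(rx.fullmatch("/".join(parts[:i])) for i in range(1, len(parts) + 1))


rule, root = "**/b/c", "/repro"  # '/repro' is a placeholder location

print(is_ignored_by_dir_rule(rule, root, "./", "a/b/c/foo.txt"))   # True  (rg foo ./)
print(is_ignored_by_dir_rule(rule, root, "./a", "b/c/foo.txt"))    # True  (rg foo ./a)
print(is_ignored_by_dir_rule(rule, root, "./a", "b/d/foo.txt"))    # False (different directory)
```

In this model the search root only changes how the candidate path is spelled, not whether a/b/c is ignored, so both invocations would skip foo.txt.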