Open ddickstein opened 7 years ago
It's any valid regex pattern.
Wasn't working for me. And it's not clear from just PATTERN if it's referring to a regex or a shell pattern, which are different.
I have long had trouble with ignore patterns myself — they never seem to behave quite the way i expect. So i looked into it, and i've found that the behaviour is actually quite complex. I'm not an expert in C but i think this is mostly correct:
Ignore patterns are never applied to file names provided directly on the command line. E.g., if you do ag x foo.c (where foo.c is a regular file), foo.c will never be ignored, regardless of any other options you might supply.
Ignore patterns are never applied if you use -u without -p. I feel like this is a bug. You can work around it like this: ag -up /dev/null --ignore y x foo/
Contrary to what canvural said, under no circumstances are ignore patterns ever treated as regex(7)- or PCRE-style regular expressions.
fnmatch patterns
If the ignore pattern contains one of a handful of meta-characters (including !, *, and ?), it's treated as an fnmatch pattern (i'm calling them that because they pass the is_fnmatch() check). There are several different variations on this type of pattern, which are handled as follows:
If the pattern begins with *. and the following characters contain a . and don't contain any other meta-characters, it's treated as a file-extension pattern. I think there are two issues here:
The requirement to contain a second . doesn't make sense to me. Why should *.min.js be treated as a file extension, but *.js not? Is the logic inverted here?
Even then, the extension-matching behaviour is surprising: it treats everything after the first dot in a file name as the extension. In other words, in the file name foo.bar.min.js, the file extension is bar.min.js. This wouldn't be an issue if the ignored extension was matched against the right side of the file extension, but it's actually just a strcmp(). So if i tell ag to ignore *.min.js, it will not in fact ignore foo.bar.min.js, since bar.min.js doesn't equal min.js.
It's worth noting also that file-extension patterns are matched against directories. I think this is deliberate, but it does seem a little surprising for a special-case file-extension-matching feature.
Anyway, i assume that these patterns are special-cased for performance reasons: ignoring file extensions is a very common use case, and it's faster to do a strcmp() against a set of fixed strings than to perform actual pattern-matching on each file. IMO, though, the fact that it's a very common use case means that this particular ignore functionality, above all others, should Just Work.
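To make the extension issue concrete, here is a minimal Python sketch of the behaviour described above. It is not ag's actual C code; the function names are hypothetical, and the string equality test stands in for the strcmp() mentioned in the text. The last helper shows the suffix comparison that the text suggests would be less surprising.

```python
def extension_after_first_dot(file_name: str) -> str:
    """Return everything after the FIRST dot, or '' if the name has no dot."""
    _, dot, rest = file_name.partition(".")
    return rest if dot else ""


def ignored_by_exact_extension(file_name: str, ignored_ext: str) -> bool:
    """Model of the behaviour described above: exact comparison (like strcmp)."""
    return extension_after_first_dot(file_name) == ignored_ext


def ignored_by_suffix(file_name: str, ignored_ext: str) -> bool:
    """Hypothetical alternative: compare against the right side of the name."""
    return file_name.endswith("." + ignored_ext)


if __name__ == "__main__":
    for name in ("foo.min.js", "foo.bar.min.js"):
        print(name,
              ignored_by_exact_extension(name, "min.js"),
              ignored_by_suffix(name, "min.js"))
```

Under this model, foo.bar.min.js slips past the exact comparison because its "extension" is bar.min.js, while the suffix comparison would catch it.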
If the pattern begins with a /, it's treated as a slash-regex pattern.
Slash-regex patterns are anchored to the beginning of the search path supplied on the command line. So, for example, if you run ag --ignore '/foo*' x foobar/baz/ baz/foobar/, ag will ignore every file under foobar/baz, but not any files under baz/foobar.
After some minor normalisation (stripping the leading slash, &c.), the slash-regex pattern is passed directly to fnmatch() to be compared against the complete file path (relative to and including the search path).
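The anchoring can be sketched roughly as follows. This is only an illustration in Python, not ag's implementation: the function name is hypothetical, and Python's fnmatch module is used as a stand-in for the C fnmatch() call. One assumption to note is that Python's * also matches /, which in C depends on the flags passed to fnmatch(), so the real tool may reach the same result by matching the directory while walking the tree instead.

```python
from fnmatch import fnmatchcase


def slash_pattern_ignores(pattern: str, path: str) -> bool:
    """Match a leading-slash pattern against a whole path.

    `path` is relative to, and includes, the search path given on the
    command line (e.g. 'foobar/baz/x.c').
    """
    if not pattern.startswith("/"):
        raise ValueError("expected a pattern with a leading slash")
    # Normalisation step from the text: strip the leading slash.
    anchored = pattern.lstrip("/")
    # Matching the whole path means the pattern must match from its start.
    return fnmatchcase(path, anchored)


if __name__ == "__main__":
    print(slash_pattern_ignores("/foo*", "foobar/baz/x.c"))
    print(slash_pattern_ignores("/foo*", "baz/foobar/x.c"))
```

Because the whole path is compared, a pattern like /foo* can only match paths whose first component starts with foo.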
If the pattern begins with a !, it's treated as an invert-regex pattern.
Invert-regex patterns are used to 'white-list' files that would otherwise match a standard regex pattern (described below). They can NOT be combined with the slash-regex behaviour, and ignores from slash-regex patterns take precedence over 'un-ignores' from invert-regex patterns. For example:
# Slash-regex pattern '/foo*' wins, will ignore all files
% ag --ignore '/foo*' --ignore '!foo*' x foobar/baz
# Invert-regex pattern '!foo*' wins, no files will be ignored
% ag --ignore 'foo*' --ignore '!foo*' x foobar/baz
Invert-regex patterns are matched against each individual segment of the path. As a result, any invert-regex pattern containing a / is effectively dropped. In other words, you can't use !a*/d* to match a path like /foo/bar/abc/def.
And again, despite the name, these are not actually regex patterns, they're passed to fnmatch().
Any other non-literal pattern is a standard regex pattern.
Like invert-regex patterns, these are matched against each individual segment of the path, and again they are not actually regex patterns, they're passed to fnmatch().
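Per-segment matching, and why a pattern containing a / can never match, can be illustrated with a short Python sketch. The helper names are hypothetical and Python's fnmatch stands in for the C fnmatch() call. The is_ignored function only models the interplay between standard and invert patterns described above; slash-prefixed patterns are deliberately left out.

```python
from fnmatch import fnmatchcase


def matches_any_segment(pattern: str, path: str) -> bool:
    """True if the pattern matches at least one '/'-separated path segment."""
    return any(fnmatchcase(seg, pattern) for seg in path.split("/") if seg)


def is_ignored(path: str, patterns: list[str]) -> bool:
    """Apply standard patterns, letting '!'-prefixed patterns un-ignore."""
    ignores = [p for p in patterns if not p.startswith("!")]
    inverts = [p[1:] for p in patterns if p.startswith("!")]
    if any(matches_any_segment(p, path) for p in inverts):
        return False
    return any(matches_any_segment(p, path) for p in ignores)


if __name__ == "__main__":
    print(matches_any_segment("a*", "/foo/bar/abc/def"))
    # No single segment contains a '/', so this can never match.
    print(matches_any_segment("a*/d*", "/foo/bar/abc/def"))
    print(is_ignored("foobar/baz/x.c", ["foo*", "!foo*"]))
```

Since each segment is tested on its own, the / in a*/d* has nothing to match against, which is why such a pattern is effectively dropped.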
Patterns that don't contain meta-characters are treated as static patterns. There are two variations on these:
Like slash-regex patterns, slash-static patterns are prefixed with a / and are anchored to the beginning of the search path provided on the command line. So ag --ignore '/foo/bar' x foo/bar/ baz/foo/bar/ would ignore all of the files under foo/bar but not any under baz/foo/bar.
Anything else is treated as a standard or name-static pattern. Like regex patterns, these are matched against each individual segment of the path (so bar will match the file foo/bar/baz.c).
Unlike every other kind of pattern, name-static patterns are also matched across path segments, so for example b/c will match a/b/c/d/foo.c (but not a/b/ccc/d/foo.c).
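One way to read "matched across path segments" is that the pattern's segments must appear as a contiguous run of whole segments in the path. That reading is an assumption on my part, and the Python function below (a hypothetical name, not taken from ag) is only a sketch of it, but it reproduces the examples above.

```python
def name_static_matches(pattern: str, path: str) -> bool:
    """True if the pattern's segments occur as a contiguous run in the path."""
    want = [seg for seg in pattern.split("/") if seg]
    have = [seg for seg in path.split("/") if seg]
    n = len(want)
    if n == 0:
        return False
    # Slide a window of n segments along the path and compare whole segments.
    return any(have[i:i + n] == want for i in range(len(have) - n + 1))


if __name__ == "__main__":
    print(name_static_matches("bar", "foo/bar/baz.c"))
    print(name_static_matches("b/c", "a/b/c/d/foo.c"))
    print(name_static_matches("b/c", "a/b/ccc/d/foo.c"))
```

Comparing whole segments, rather than doing a plain substring search, is what keeps b/c from matching a/b/ccc/d/foo.c.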
I hope it's not mean of me to say so but i would call this design sub-optimal. There are so many special cases and exceptions, it's quite difficult to even document the behaviour (as you can see), let alone expect users to remember it.
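To show how many cases there are, the taxonomy above can be condensed into a small classifier. This is a Python sketch of the rules as i've described them, not a port of the C code: the function and category names are hypothetical, and the meta-character set contains only the three characters named above, whereas the real is_fnmatch() check may recognise more.

```python
# Assumption: only the meta-characters named in the text; the real set may be larger.
META_CHARS = set("!*?")


def has_meta(text: str) -> bool:
    return any(ch in META_CHARS for ch in text)


def classify(pattern: str) -> str:
    """Return the kind of ignore pattern, following the description above."""
    if not has_meta(pattern):
        return "slash-static" if pattern.startswith("/") else "name-static"
    if pattern.startswith("*."):
        rest = pattern[2:]
        # As described: needs a further '.' and no other meta-characters.
        if "." in rest and not has_meta(rest):
            return "extension"
    if pattern.startswith("/"):
        return "slash-regex"
    if pattern.startswith("!"):
        return "invert-regex"
    return "regex"


if __name__ == "__main__":
    for p in ("*.min.js", "*.js", "/foo*", "!foo*", "foo*", "/foo/bar", "b/c"):
        print(f"{p!r:12} -> {classify(p)}")
```

Note that under these rules *.js falls through to the generic case, which is the oddity questioned earlier.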
Given that ag
at least nominally supports reading from VCS ignore files, i think the expectation that most users would have is that patterns supplied to --ignore
are treated identically to patterns found in .ignore
or .gitignore
, and those patterns are treated more or less the same way git treats them (described here).
I understand that there are performance considerations and that git has some special features that would have to be re-implemented (like its handling of ** and trailing /), but that's my gut reaction anyway.
Oh interesting, I didn't realize this flag existed when I filed #1138. But given the apparently strange behavior of this flag, maybe --invert-file-search-regex is exactly what's desired here? As implemented in #1150, it's exactly the inverse of the -G/--file-search-regex flag.
With this documentation in mind, I still don't see how to ignore all .so files, including versioned.
/usr/lib64/firefox/libmozsqlite3.so
/usr/lib64/vtk/libvtksqlite.so.1
/usr/lib64/libgdal.so.20.4.2
ag -gl --ignore '*.so*'
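As a quick sanity check of the glob itself (not of ag's behaviour, which may differ), the following Python sketch tests *.so* against the file names from the question. Python's fnmatch is used as a stand-in for the C fnmatch() call, the helper name is hypothetical, and application.ini is a made-up counter-example.

```python
from fnmatch import fnmatchcase
import posixpath

PATHS = [
    "/usr/lib64/firefox/libmozsqlite3.so",
    "/usr/lib64/vtk/libvtksqlite.so.1",
    "/usr/lib64/libgdal.so.20.4.2",
]


def glob_matches_file_name(path: str, pattern: str = "*.so*") -> bool:
    """Test the pattern against the last path segment only."""
    return fnmatchcase(posixpath.basename(path), pattern)


if __name__ == "__main__":
    for p in PATHS:
        print(p, glob_matches_file_name(p))
    # Hypothetical file that should not match.
    print(glob_matches_file_name("/usr/lib64/firefox/application.ini"))
```

One caveat: the trailing * makes the pattern broad, so it would also match unrelated names such as foo.sock.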
Please provide documentation for how to use --ignore. It just says "PATTERN" but gives no information for what kind of pattern it's looking for or any example usages. A quick Google search reveals dozens of different formats for the pattern, most of which do not work. My goal was to follow symlinks but exclude a number of directories, and after trying at least 10 different iterations, none of them seemed to work.