Closed. epa closed this issue 7 years ago.
ack can't be learning about the regex syntax of what it's searching. -w tweaks are coming in ack 3.
Without wanting to go too crazy, feature requests like this might be handled by defining regexp transformations in .ackrc. So it might contain:
custom-regexp-tweak-Foo: (?:\b|\\[a-z])$RE\b
Then --Foo would be recognized as a new option wrapping the regexp ($RE) as specified.
You might even let it be conditionalized on file type -- useful for matching string literals, for example, which have varying syntax in different languages.
This would let people muck around with weird and wonderful matching rules without having to change the ack core. Just a suggestion, I don't know whether you will like it.
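To make the suggestion concrete, here is a rough Python sketch of how such a template could be read and applied. Everything in it is hypothetical: the custom-regexp-tweak- key is only the proposal from this comment, not an existing ack feature, and parse_tweaks and apply_tweak are made-up helper names. The template uses a doubled backslash so that it matches a literal backslash followed by a letter.

```python
import re

# Hypothetical .ackrc fragment; the key name is the proposal above,
# not something ack actually understands.
ACKRC = r"""
# wrap the user's regexp in a relaxed whole-word check
custom-regexp-tweak-Foo: (?:\b|\\[a-z])$RE\b
"""

PREFIX = "custom-regexp-tweak-"


def parse_tweaks(ackrc_text):
    """Collect {option name: template} from lines that start with PREFIX."""
    tweaks = {}
    for raw in ackrc_text.splitlines():
        line = raw.strip()
        if not line.startswith(PREFIX):
            continue
        # Split on the first colon only; the template itself contains colons.
        name, _, template = line[len(PREFIX):].partition(":")
        tweaks[name.strip()] = template.strip()
    return tweaks


def apply_tweak(template, user_regex):
    """Substitute the user's regexp for $RE, grouped so alternations stay contained."""
    return template.replace("$RE", "(?:" + user_regex + ")")


if __name__ == "__main__":
    tweaks = parse_tweaks(ACKRC)
    pattern = re.compile(apply_tweak(tweaks["Foo"], "GOODBYE"))
    for text in (r"/\bGOODBYE\b/", "say GOODBYE", "xGOODBYE"):
        print(repr(text), bool(pattern.search(text)))
```

With this template, a hypothetical --Foo GOODBYE would find GOODBYE both as an ordinary whole word and directly after an escape such as \b, but not inside xGOODBYE.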
Thanks. Don't like it. There's just too much customization going on here.
Create a file with contents
You will recognize that this is Perl regexp syntax. The first line matches HELLO anywhere in the string. The second matches GOODBYE but with word boundaries either side, that is, the whole word only. This is a common idiom for writing regular expressions to match a whole word.
The problem comes when using ack to search the codebase.
ack -w GOODBYE
will not find it. As far as ack is concerned, the whole word mentioned in the above file is bGOODBYE.
Now, I am not saying that ack should solve the halting problem and check all possible cases where some programming language quotes a whole word or pastes it together in some wacky way. But this regexp syntax is common not just to Perl but to many other programming languages that support regular expressions. It would be useful to make ack at least a little bit aware of it, so it can spot whole words even when inside regular expressions. (Another case is /\AGOODBYE\z/ to match the whole string.)
My proposal, then, is to tweak ack's -w flag so that at the front of the word it expects either a word boundary or \x where x is some alphanumeric character. This would make it return strictly more matches than before. Of course, there would be some false positives, particularly for Windows paths (where C:\temp would now match the word emp) and for TeX documents (\box would match ox).
If that is too much, perhaps ack could become a little more programming-language-aware and turn on this enhanced whole-word check only when it is reading a Perl source file? Or a new -W flag would be used for the fancy whole-word checking, with -w keeping its existing semantics (which have proved troublesome enough already; see https://github.com/petdance/ack2/issues/445).
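
For illustration, here is a small Python sketch of the difference between the two checks. It is a simplified model, not ack's actual code: whole_word assumes -w amounts to wrapping the word in \b on both sides, and relaxed_whole_word is a hypothetical name for the proposed check. The sample line assumes the file contains a regexp written as /\bGOODBYE\b/.

```python
import re


def whole_word(word):
    # Simplified model of -w (an assumption): a word boundary on each side.
    return re.compile(r"\b" + re.escape(word) + r"\b")


def relaxed_whole_word(word):
    # Proposed check: at the front, accept either a word boundary or a
    # backslash followed by one alphanumeric character (such as \b or \A).
    return re.compile(r"(?:\b|\\[A-Za-z0-9])" + re.escape(word) + r"\b")


SAMPLES = [
    ("GOODBYE", r"/\bGOODBYE\b/"),  # whole word inside a regexp
    ("GOODBYE", r"/\AGOODBYE\z/"),  # whole-string anchors
    ("emp", r"C:\temp"),            # Windows path: false positive
    ("ox", r"\box"),                # TeX command: false positive
]

if __name__ == "__main__":
    for word, line in SAMPLES:
        strict = bool(whole_word(word).search(line))
        relaxed = bool(relaxed_whole_word(word).search(line))
        print(f"{word!r} in {line!r}: strict={strict} relaxed={relaxed}")
```

Under these assumptions the strict check finds none of the four samples, because the letter after each backslash is itself a word character, so there is no boundary in front of the word. The relaxed check finds all four, including the two false positives mentioned above.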