Syntax-aware grep?

radioneko commented 9 years ago

It would be nice to have an option to exclude comments and/or string literals from being grepped.

ggreer commented 9 years ago

Sorry, but that's pretty hard to build. Doing this correctly would require language detection and actual parsing of each language. It sounds like Exuberant Ctags is the tool you want.

ELLIOTTCABLE commented 8 years ago

So, I'd like to +1 on this; there's prior work on this topic, and it's a desirable feature.

As the interesting thread with the ack team describes, I think there's no need to ~Perfectly Parse All Languages Ever~. ag is primarily a tool for quickly finding something; it's excellent at that, and by far the single biggest problem I have using ag is, invariably, that three-quarters of my results are in comments. I suspect that what they're calling an “80/20” solution is totally possible while remaining within ag's goals; am I super-off-base in thinking so?

My suggestion: implement it, at first, as a --[no-]comments flag, disabled by default, with the simplest possible heuristic: ignore matches that come after // or # on a line. Sure, you might want to both ignore comments and still get matches after the // in "hi // there" in a line of code, but that's already a bit of an edge case, especially for a feature used mostly for quick navigation or hunting down code. From there onwards, only improve the comment detection when somebody asks for it: if somebody complains about that string case, then maybe implement basic single-line string detection, if it's really a problem real people run into often. Similarly, if somebody's using -- in Lua or ! in Fortran, then add those markers (or per-language configurable markers) when somebody actually asks for them.
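To make the heuristic concrete, here is a rough sketch in Python (not ag's C code, and not anything ag actually implements). The names `COMMENT_MARKERS`, `comment_start`, and `match_outside_comment` are made up for illustration. It only shows the "drop everything after the first // or #" idea, including the known failure on markers inside string literals.

```python
import re

# Hypothetical default markers; "--" (Lua) or "!" (Fortran) could be added on request.
COMMENT_MARKERS = ("//", "#")


def comment_start(line, markers=COMMENT_MARKERS):
    """Return the index of the earliest comment marker in line, or None."""
    positions = [i for i in (line.find(m) for m in markers) if i != -1]
    return min(positions) if positions else None


def match_outside_comment(pattern, line, markers=COMMENT_MARKERS):
    """Return True if pattern matches in the part of line before any marker.

    Deliberately naive: a marker inside a string literal still counts as
    the start of a comment, so matches after it are ignored.
    """
    start = comment_start(line, markers)
    code = line if start is None else line[:start]
    return re.search(pattern, code) is not None
```

With this, searching for `bar` in `foo(); // bar` finds nothing, while `foo` still matches. Searching for `there` in `x = "hi // there"` also finds nothing, which is exactly the edge case described above.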

I think just a simple ‘ignore content after // and #’ flag would be tremendously useful, even with all of its failings, for a tremendous chunk of your userbase; and should be a feature that's trivial to initially implement. Progressive enhancement, I say!

Thoughts?

ELLIOTTCABLE commented 8 years ago

Extraneous thought: An additionally-useful application of this might be to leave matching-in-comments enabled by default (i.e. the current behaviour / the --comments behaviour); but when --color is enabled as well, print the // or # marker and the content after it in a subdued terminal colour. This side-steps the problem of accurately recognizing comments, because the content is still displayed by default; but it becomes easier for users to skim the output and skip irrelevant comment contents at a glance. (This obviously needs a --color-comments flag to customize the display thereof, as well.)
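A minimal sketch of the dimming idea, again in Python rather than ag's C, with a hypothetical `dim_comment` helper. It assumes the terminal honours the ANSI "faint" attribute (SGR 2), which not every terminal renders, and it reuses the same naive marker detection.

```python
# ANSI escape sequences: SGR 2 is "faint", SGR 0 resets all attributes.
DIM = "\033[2m"
RESET = "\033[0m"


def dim_comment(line, markers=("//", "#")):
    """Wrap everything from the first comment marker onwards in a dim colour.

    The text itself is left untouched, so a misdetected "comment" is
    still visible, just subdued.
    """
    positions = [i for i in (line.find(m) for m in markers) if i != -1]
    if not positions:
        return line
    start = min(positions)
    return line[:start] + DIM + line[start:] + RESET
```

Since nothing is hidden, a wrong guess about where a comment starts costs only some contrast, not a missed result.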

ggreer / the_silver_searcher

Syntax-aware grep? #543