beyondgrep / ack2

**ack 2 is no longer being maintained. ack 3 is the latest version.**
https://github.com/beyondgrep/ack3/

Support Multiple, Newline-Separated Patterns #646

Open DabeDotCom opened 7 years ago

DabeDotCom commented 7 years ago

Per POSIX 1003.1:

The pattern_list's value shall consist of one or more patterns separated by <newline> characters

So, for example, you could do:

bash% ls -l /etc/ | grep $'passwd\ngroup'
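A minimal sketch of the POSIX behavior quoted above, using grep rather than ack. A single pattern argument that contains a newline is read as two patterns, just as if each had been given with its own -e. The sample input lines are made up for illustration.

```sh
# Sample input standing in for `ls -l /etc/` output (made-up lines).
input=$(printf 'passwd\ngroup\nhosts\n')

# One argument whose value contains a newline: grep reads it as the two
# patterns "passwd" and "group".
nl_list=$(printf 'passwd\ngroup')
printf '%s\n' "$input" | grep -e "$nl_list"

# The same search, spelled with one -e per pattern.
printf '%s\n' "$input" | grep -e passwd -e group
```

Both commands print passwd and group; hosts is filtered out.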

Now you can do the same with ack:

bash% ls -l /etc/ | ack $'passwd\ngroup'
bash% ls -l /etc/ | ack $'p(asswd|rofile)\ns(hells|udoers)'

This is especially handy with command substitution [backticks], as it means you don't have to manually perform any alice|bob|carol shenanigans; the alternation is automatically performed for you:

bash% ack /etc --match "`/bin/ls /home/`" 
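POSIX grep already behaves this way with command substitution, which is what the proposal asks ack to mirror. This sketch creates a throwaway directory standing in for /home/ (the names alice, bob and carol are made up) and passes its listing, one name per line, as the pattern list; no manual | joining is needed.

```sh
# Throwaway directory standing in for /home/ (hypothetical user names).
home=$(mktemp -d)
touch "$home/alice" "$home/bob" "$home/carol"

# When its output is not a terminal, ls prints one name per line; command
# substitution keeps the inner newlines, so grep sees three patterns.
matches=$(printf 'alice:x:1000\ndave:x:1001\ncarol:x:1002\n' |
    grep -e "$(ls "$home")")
printf '%s\n' "$matches"

rm -r "$home"
```

This prints the alice and carol lines; dave matches none of the names.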
DabeDotCom commented 7 years ago

(Personally, I think this is a more useful fix than #522)

petdance commented 7 years ago

Thanks for this, Dabe. Interesting idea that I think is worth talking about. The idea of multiple regexes is not a new one. If we do adopt it, it will go into ack3, not ack2. ack2 is not getting any new development. ack3 is getting ready for its first alpha in the next week or two. If we do add this, it would be a great thing to have on the "new in ack 3!" features list.

Some thoughts, in no particular order:

n1vux commented 7 years ago

Ack already has the ability to do multiple patterns with ls -l /etc/ | ack 'passwd|group'.

We have tried to maintain some degree of plug-compatibility with (f)grep, but since the RE language is different, it can't be exact. An ability to read a bunch of -wQ words from a file would not be bad.

ways a simple join( '|', @pat ) could go wrong

DabeDotCom commented 7 years ago

I didn't even realize there was an ack3. Mea culpa.

I'm all for a more elegant solution than join( '|', @pat )... (I'm sure you could end up in backtracking hell.)

But I haven't looked to see what, e.g., GNU grep does to maximize efficiency.

In terms of measuring speed, though, it's kind of tough, since currently there's NO way to include multiple regexes. (So does that make this "infinitely" faster? «wink»)

petdance commented 7 years ago

I didn't even realize there was an ack3. Mea culpa.

I haven't really announced it, and there's not really a central place to announce it. :-/

petdance commented 7 years ago

@DabeDotCom What prompted you to want this bit of functionality?

DabeDotCom commented 7 years ago

Ack already has the ability to do multiple patterns with ls -l /etc/ | ack 'passwd|group'.

Yup, I acknowledged that.

But this is just gross, IMHO:

ack /etc --match "`/bin/ls /home/ | tr '\n' '|' | sed -e 's/|$//'`"

AND it opens you up to injection bugs: cd /home; touch "|", and now you've included an empty regex.

In the PR, one of the tests in t/split-newline.t verifies that -Q Does What I Mean; it quotes each regex individually, not the "meta" regex characters that glue each pattern together:

ack -Q /etc --match "`/bin/ls /home/`"
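The -Q behavior described above (quote each pattern, but not the | that glues them together) can be approximated in shell for grep -E. This is a sketch, not the PR's code: the function name quote_and_join is hypothetical, and the sed bracket expression escapes POSIX ERE metacharacters, which is not the same set as the Perl quotemeta that ack relies on.

```sh
# quote_and_join (hypothetical helper): read newline-separated literal
# strings on stdin and print one ERE in which each string is escaped,
# while the joining | characters remain alternation operators.
quote_and_join() {
    # Backslash-escape every ERE metacharacter, including | itself.
    sed 's/[][\.*^$+?(){}|]/\\&/g' |
    # Join the escaped lines with | (no trailing separator).
    paste -s -d '|' -
}

# A file literally named "|" is now matched literally.
printf 'alice\n|\nbob\n' | quote_and_join    # prints alice|\||bob
```

Passing the result to grep -E matches alice, a literal |, or bob, and nothing else.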
DabeDotCom commented 7 years ago

@petdance I wanted to be able to do:

ls -ltr | ack --passthru "`grep -l SOME_TAG *`"

to get a list of all files, sorted by date, highlighting the ones that matched.

(Admittedly, there are still some pathological cases, e.g. a filename that contains a newline. But POSIX grep would fail on that, too...)

n1vux commented 7 years ago

We have tried to maintain some degree of plug-compatibility with (f)grep, but since the RE language is different, it can't be exact. In this case, we do it as egrep does, so we needn't do it as grep does.

While not supporting this GNUism may violate least-surprise for folks who discovered this quirk in GNU grep before finding egrep, I don't see that as a driving reason to change.

Encouraging embedding newlines in cmd line arguments doesn't make me happy either.

If anything, if we wish to provide a $pat = join( '|', @pat ) service, then rather than splitting on \n, I could support multiple occurrences of --regex=foo --regex=bar, as that could simplify a script using ack (and give it a choice of doing its own infix join or generating 1 to N "--regex='$pat'" arguments).

(However, an ability to read a bunch of -wQ or -Qw words from a file, as fgrep -f words does, would be very good, and that would also be a join. While we could accept that without an implied -Q, hewing to fgrep -f, and saying they're words and thus -Qw, seems simplest and least-surprise-y to me.)

ways a simple join( '|', @pat ) could go wrong

For one, any | in the @pat needs escaping; is that on the user or on us? (One reason to prefer -Q or -Qw for a hypothetical --file-of-patterns, aka --fgrep-f.)

But this is just gross, IMHO:

ack /etc --match "`/bin/ls /home/ | tr '\n' '|' | sed -e 's/|$//'`"

AND it opens you up to injection bugs: cd /home ; touch "|" — now you've included an empty regex.

That isn't ack doing that; that's the calling script trusting its input. Since we allow '|' in patterns, unless your calling script escapes all metachars in the /bin/ls output, touch '|' will give you two (not one) empty alternatives, even if \n is taken as an alias for OR.
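The "two (not one)" count is easy to check: joining the lines with | without escaping them turns a filename of | into two empty alternatives between alice and bob. The sketch only inspects the joined string, with made-up names; in Perl regexes, which ack uses, an empty alternative matches the empty string, so such a pattern would match every line.

```sh
# Naive join: no escaping, so a name consisting of | leaks into the regex.
joined=$(printf 'alice\n|\nbob\n' | paste -s -d '|' -)
echo "$joined"    # prints alice|||bob

# Split on | to list the alternatives a regex engine would see:
# "alice", "", "", "bob", i.e. two empty alternatives.
printf '%s\n' "$joined" | tr '|' '\n' | grep -c '^$'    # prints 2
```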

In the PR, one of the tests in t/split-newline.t verifies that -Q Does What I Mean; it quotes each regex individually, not the "meta" regex characters that glue each pattern together:

Nice touch, I like that.

DabeDotCom commented 7 years ago

For one, any | in the @pat needs escaping; is that on the user or on us?

@n1vux Actually, it wouldn't. From t/split-newline.t:

MULTIPLE_REGEXES: {
    my @expected = split( /\n/, <<'EOF' );
I was playin' soft while Bobbie sang the blues
From the Kentucky coal mines to the California sun
Bobbie shared the secrets of my soul
Bobbie baby kept me from the cold
One day up near Salinas, Lord, I let her slip away
EOF

    my @files = qw( t/text/me-and-bobbie-mcgee.txt );
    my @results = run_ack( "co(?:ld|al)\nso(?:ft|ul)\nSalinas", @files );

    lists_match( \@results, \@expected, 'Multiple regexes' );
}

There, I WANT the alternation...
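For comparison, POSIX grep -E treats the same newline-separated list the same way: alternation inside each pattern still works, and each newline acts as an outer OR. This sketch reuses the expected lines from the test above plus one made-up non-matching line, and swaps Perl's (?:...) for plain ERE groups, since ERE has no non-capturing syntax.

```sh
# Lines from the test's expected output, plus one made-up non-matching line.
text=$(printf '%s\n' \
    "I was playin' soft while Bobbie sang the blues" \
    "From the Kentucky coal mines to the California sun" \
    "Nothing to find here" \
    "Bobbie shared the secrets of my soul" \
    "Bobbie baby kept me from the cold" \
    "One day up near Salinas, Lord, I let her slip away")

# Three patterns separated by newlines; each keeps its own | alternation.
patterns=$(printf 'co(ld|al)\nso(ft|ul)\nSalinas')

printf '%s\n' "$text" | grep -E -e "$patterns"
```

All five lyric lines are printed; "Nothing to find here" is not.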

I could support multiple occurrences of --regex=foo --regex=bar

I had drafted a different issue for that, too! «grin»

POSIX grep allows for multiple -e flags. (Though that doesn't address the backtick use case.)
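Repeatable -e gives a script the "1 to N patterns" choice without any joining. A sketch that turns a newline-separated list into one -e per pattern; the function name grep_each is hypothetical, it reads the text to search from stdin, and it deliberately skips empty lines, because -e '' would match every line.

```sh
# grep_each (hypothetical): $1 holds newline-separated patterns; each
# non-empty line becomes its own -e argument. Text is read from stdin.
grep_each() {
    pats=$1
    set --    # reuse the positional parameters as grep's argument list
    # IFS= and -r keep each pattern line verbatim.
    while IFS= read -r p; do
        [ -n "$p" ] && set -- "$@" -e "$p"
    done <<EOF
$pats
EOF
    [ "$#" -gt 0 ] || return 1    # no patterns at all: report no match
    grep "$@"
}

# The blank middle line is dropped, so hosts is not matched.
printf 'passwd\ngroup\nhosts\n' | grep_each "$(printf 'passwd\n\ngroup')"
```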

(However, an ability to read a bunch of -wQ or -Qw words from a file, as fgrep -f words does, would be very good, and that would also be a join. While we could accept that without an implied -Q, hewing to fgrep -f, and saying they're words and thus -Qw, seems simplest and least-surprise-y to me.)

Yet another issue I started to open was to include --regex-from=<file> à la grep -f.

This would let me use process substitution to do:

bash% ls -ltr | ack -Q --passthru --regex-from <(grep -l SOME_TAG *)
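Until something like the proposed --regex-from exists, POSIX grep -f already reads one pattern per line from a file. A sketch of this use case with grep -F -f in place of ack; it writes the pattern list to a temporary file instead of using process substitution (a bash feature) so it stays POSIX sh, and the tag and file names are made up. Since grep has no --passthru, it prints only the matching listing lines.

```sh
# Scratch directory with two tagged files and one untagged (made-up names).
dir=$(mktemp -d)
cd "$dir" || exit 1
echo 'SOME_TAG here' > tagged1.txt
echo 'SOME_TAG too'  > tagged2.txt
echo 'nothing'       > plain.txt

# grep -l prints one matching file name per line: a ready-made pattern file.
# (The dotfile is not picked up by * or shown by ls.)
grep -l SOME_TAG -- * > "$dir/.patterns"

# -F: patterns are fixed strings (like ack -Q); -f: read them from a file.
matches=$(ls -ltr | grep -F -f "$dir/.patterns")
printf '%s\n' "$matches"

cd / && rm -r "$dir"
```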
petdance commented 7 years ago

Yet another issue I started to open was to include --regex-from=<file> à la grep -f.

That one I'm much less keen to pursue.

Aside: If you haven't looked at DESIGN.md in the ack3 repo, take a look at that.

n1vux commented 7 years ago

I wanted to be able to do:

ls -ltr | ack --passthrough "`grep -l SOME_TAG *`"

to get a list of all files, sorted by date, highlighting the ones that matched.

Is there a reason GNU grep doesn't work here? It has highlighting now. Does it lack a --passthrough equivalent?

(On my system, (cd /etc; grep -l $USER *) generates a mess of "grep: X11: Is a directory" and "$fn: cannot open file for reading" errors... I hope your real application is in a leaf directory.)

OK, we like compound queries with ls; the ack3 Cookbook has recommended compound queries. Ack has -f, -g, -l, and --passthru modes, but they don't combine to do exactly this.

I know I can list just the matching ones in date order with full ls -lart thusly ...

ack -l $USER /etc 2>/dev/null | xargs ls -lart

(Use ack's --print0 with xargs -0 if afflicted with spaces in path names.) But that doesn't show the matching files highlighted in the context of the non-matching ones.
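The NUL-separated variant matters once names contain spaces. With ack it would be ack -l --print0 $USER /etc 2>/dev/null | xargs -0 ls -lart; the sketch below uses grep's --null option (available in GNU and BSD grep) instead, so it runs without ack installed. The directory, file names and tag are made up.

```sh
# Scratch files, one with a space in its name (made-up names).
dir=$(mktemp -d)
printf 'SOME_TAG\n' > "$dir/has space.txt"
printf 'nothing\n'  > "$dir/plain.txt"

# --null ends each file name with a NUL byte instead of a newline, and
# xargs -0 splits on NUL, so "has space.txt" stays a single argument.
listing=$(grep -l --null SOME_TAG "$dir"/* | xargs -0 ls -lart)
printf '%s\n' "$listing"

rm -r "$dir"
```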

I will take this as a challenge to consider alternative solutions for the ack3 Cookbook that are less ugly than the tr/sed example, as well as variations that get close with elegance.

(Although if GNU grep or some other tool can do some "it" better than ack, we are willing to recommend GNU grep for doing "it" in the Cookbook section.)

n1vux commented 7 years ago

This would let me use process substitution to do:

I love process substitution filehandles in recent bash!

DabeDotCom commented 7 years ago

Yet another issue I started to open was to include --regex-from=<file> à la grep -f.

That one I'm much less keen to pursue. Aside: If you haven't looked at DESIGN.md in the ack3 repo, take a look at that.

Ironically, ignoring grep -f goes against the one-and-only Guiding principle: When deciding on ack's behavior, try to be grep-compatible if possible.

(And BTW — I'm not trying to stir the pot, here! I can't even begin to express how appreciative I am for you guys' hard work!! I'm just one of those pesky users who's always asking for more features than I'm capable of actually implementing myself... «sigh»)

DabeDotCom commented 7 years ago

Is there a reason GNU grep doesn't work here? It has highlighting now. Does it lack a --passthrough equivalent?

Correct.

(And as a very trivial nit: ack spells it --passthru. I was bitten by that earlier, too! ag happens to include an alias for --passthrough, which is a little more "liberal in what you accept" ... Postel's Law)

n1vux commented 7 years ago

"liberal in what you accept" ... Postel's Law

Amen. RIP Jon, and also his acolyte, my mentor MAP.

So, to summarize: no complete solutions spotted in a quick scan of the beyondgrep.com Other Tools tab:

Not that I'm convinced this is a use case that must be possible without writing some real code, but it is so tantalizingly close to what we can do with just ack/grep/ls/bash, while being a little outlandish, that as a Cookbook challenge I'm game to explore how close I can get!

Thanks for the example. I expect I can do it somewhat less inelegantly than your tr/sed bad example (that exact sort of ugliness is why I learned AWK and then Perl 4 and 5 when I first got on Unix back in %DECADE%CENSORED%); it'll be a lovely bit of chrome for the ack3 Cookbook. Stay tuned: I'll try to post my improvement back here, but it should be in the ack3 docs when released.

n1vux commented 6 years ago

The best I've got so far is ....

Ack doesn't have "--fgrep-f", nor does it accept newlines as OR, as newer grep does. But grep has no "--passthru". The requester would like to view the whole files but highlight any of several words in each, which needs both. The workaround is ugly:

ack /etc --match "`/bin/ls /home/ | tr '\n' '|' | sed -e 's/|$//'`"

Longer but more readable: use $() instead of backticks, and Perl instead of tr and sed, which lets us insert | only between items as needed, without an extra trailing | to be removed:

ack /etc --match "$(/bin/ls /home/ | perl -E '@u=<>; chomp for @u; say join q(|), @u')"

or invert the "ls",

ack /etc --match "$(perl -E '@u=`ls /home/`; chomp for @u; say join q(|), @u')"

or keep it in one process,

ack /etc --match "$(perl -E 'chdir q(/home/); @u=<*>; chomp for @u; say join q(|), @u')"
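One more variant in the same spirit, for systems without Perl: POSIX paste -s -d '|' joins lines with | and, like the Perl versions, never leaves a trailing | to strip. It does no escaping either, so the caveats about | in names still apply. Because ack may not be installed everywhere, the runnable part checks the joined pattern with grep -E; the user names are made up.

```sh
# With ack, the cookbook line would read:
#   ack /etc --match "$(/bin/ls /home/ | paste -s -d '|' -)"
# Self-contained check of the join, with made-up names standing in for /home/.
users=$(printf 'alice\nbob\ncarol\n' | paste -s -d '|' -)
echo "$users"    # prints alice|bob|carol

# The joined string works as an ERE alternation.
printf 'alice:1\ndave:2\ncarol:3\n' | grep -E "$users"
```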