beyondgrep / ack3

ack is a grep-like search tool optimized for source code.
https://beyondgrep.com/

Allow regexes to support \Q (and friends) #323

Open rkleemann opened 3 years ago

rkleemann commented 3 years ago

I've run into an issue where I could use \Q support, but when trying to use it I get the following error:

$ ack '\s+\Q'"$MYVAR"'\E\s+'
ack: Invalid regex '\s+\QFoo!\E\s+'
Regex: \s+\QFoo!\E\s+
           ^---HERE Unrecognized escape \Q passed through in regex

In this situation, -Q will not work, as I need some regex around $MYVAR to get exactly what I'm looking for.

rkleemann commented 3 years ago

As a workaround, I've used another call to Perl to do what I wanted:

$ ack '\s+'"$( perl -e 'print quotemeta shift' "$MYVAR" )"'\s+'

But that's definitely a hack.

petdance commented 3 years ago

PCRE allows \Q and \E

https://www.pcre.org/original/doc/html/pcrepattern.html

petdance commented 3 years ago

Implementation note: If we allow \Q and \E we will have to update is_lowercase as well.

rkleemann commented 3 years ago

It appears that this might be easier said than done, as \Q (and friends) are done at compile time, as mentioned in perlop: https://perldoc.perl.org/perlop#Gory-details-of-parsing-quoted-constructs

But the section "parsing regular expressions" says the following (emphasis mine):

Previous steps were performed during the compilation of Perl code, but this one happens at run time
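The distinction can be seen in a few lines of Perl. This is only a sketch: the variable names are illustrative, and the run-time string stands in for a pattern that a tool such as ack receives from the command line.

```perl
use strict;
use warnings;

my $var = 'Foo!';

# Literal pattern: Perl's parser handles \Q...\E while it interpolates $var,
# so the regex engine only ever sees the already-quoted text.
my $literal = qr/\s+\Q$var\E\s+/;
print "literal pattern: $literal\n";

# Run-time string: single quotes do no escape processing, so \Q and \E
# reach the regex compiler, which does not recognize them.
my $string = '\s+\QFoo!\E\s+';
my @warnings;
my $runtime = do {
    local $SIG{__WARN__} = sub { push @warnings, @_ };   # collect instead of printing
    qr/$string/;
};
print "run-time warnings:\n", @warnings;
```

The first pattern matches " Foo! " as intended, while compiling the second produces the same "Unrecognized escape" warnings shown in the original report.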

petdance commented 3 years ago

That compile-time vs. run-time explains everything. I wonder if doing the qr// in an eval block would be "compile-time" enough. (Not that we want to use eval here after we worked so hard to get rid of it after ack 2)

n1vux commented 3 years ago

qr// is compile time for RE things but not for qq() interpolation things, which are Perl language and effectively pre-processor for qr// compile.

\Q\E kinda cross the streams because they protect RE language chars at Perl-lang qq compile time.

Not supporting \Q\E found at qr// compile time is arguably a spec-level bug in Perl, as it fails to DWIM.

PCRE chooses to treat \Q\E as if part of RE-lang as a convenience to embedding languages since even if they have something like \Q\E theirs won't be clued to what PCRE needs quoted.

n1vux commented 3 years ago

OP Bob provides minimalist test case

$ perl -wE 'my $re = q!\Qfoo\E!; say "foo" =~ $re;'
Unrecognized escape \Q passed through in regex; marked by <-- HERE in m/\Q <-- HERE foo\E/ at -e line 1.
Unrecognized escape \E passed through in regex; marked by <-- HERE in m/\Qfoo\E <-- HERE / at -e line 1.

-w aha. ack forces that by escalating warnings to die, and catching die. (Which Andy very nicely then reformats to be readable and useful.)
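For readers unfamiliar with the pattern, escalating warnings to die and catching the die can be sketched as below. This is not ack's actual code; `compile_strictly` is a hypothetical helper written only to illustrate the idea.

```perl
use strict;
use warnings;

# Hypothetical helper, not ack's actual code: compile a user-supplied
# pattern and treat any warning raised during compilation as an error.
sub compile_strictly {
    my ($pattern) = @_;
    my $re = eval {
        local $SIG{__WARN__} = sub { die @_ };   # promote warnings to exceptions
        qr/$pattern/;
    };
    # On failure $re is undef and $@ holds the warning text;
    # on success $@ is the empty string.
    return ( $re, $@ );
}

my ( $re, $error ) = compile_strictly('\Qfoo\E');
print defined $re ? "compiled: $re\n" : "rejected: $error";
```

A caller can then reformat the captured message before showing it to the user, which is the behavior described above.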

n1vux commented 3 years ago

That the \Q\E escapes, which understand RE-special characters, are executed at the same Perl-lang compile time as \L\F\U (eval "" but not eval {}), and not at RE-lang compile time (qr{}), is arguably a layering fudge in the Perl5 spec & implementation.

It would be a DWIM / Principle of Least Surprise feature for Ack to make at least \Q\E work as if we were doing eval "" (which we of course eschew for security reasons), and if we're going to do that, we might as well DWIM \L\F\U as well.
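One way to get that behavior without eval "" is to expand \Q...\E by hand before the pattern reaches qr//. The following is a minimal sketch under stated assumptions: `expand_quote_escapes` is a hypothetical name, nothing here is taken from ack's source, everything between \Q and the next \E is treated as literal text in the PCRE style, and \L\F\U and the is_lowercase question are left untouched.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of ack: rewrite \Q...\E spans in a pattern
# string by applying quotemeta to the enclosed text. No string eval is used.
sub expand_quote_escapes {
    my ($pattern) = @_;
    my $out = '';
    pos($pattern) = 0;
    while ( pos($pattern) < length $pattern ) {
        if ( $pattern =~ /\G\\Q(.*?)(?:\\E|\z)/gcs ) {
            # Quote everything up to the next \E, or to the end of the pattern.
            $out .= quotemeta $1;
        }
        elsif ( $pattern =~ /\G\\E/gc ) {
            # A stray \E with no open \Q is dropped.
        }
        elsif ( $pattern =~ /\G(\\.|.)/gcs ) {
            # Copy any other escape pair or single character unchanged,
            # so an escaped backslash followed by Q is not misread as \Q.
            $out .= $1;
        }
    }
    return $out;
}

my $expanded = expand_quote_escapes('\s+\QFoo!\E\s+');
my $re       = qr/$expanded/;
print "expanded: $expanded\n";
print "matches\n" if " Foo! " =~ $re;
```

Scanning token by token, instead of running one global substitution, keeps an escaped backslash that happens to precede a Q from being mistaken for the start of a quoted span.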

It would be ironic for Ack to do this to make Ack and Perl more PCRE compatible :-D but if it's the right thing, so be it.