Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

FileCheck .* is too greedy #29175

Open Quuxplusone opened 8 years ago

Quuxplusone commented 8 years ago
Bugzilla Link PR30198
Status NEW
Importance P normal
Reported by Paul Robinson (paul_robinson@playstation.sony.com)
Reported on 2016-08-30 12:40:00 -0700
Last modified on 2019-08-06 06:16:01 -0700
Version trunk
Hardware PC Windows NT
CC daniel@zuster.org, hans@chromium.org, llvm-bugs@lists.llvm.org, modocache@gmail.com, rnk@google.com, thomas.preudhomme@celest.fr
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also
In r280057, Richard Smith made this change to a test:

--- cfe/trunk/test/Driver/modules-ts.cpp (original)
+++ cfe/trunk/test/Driver/modules-ts.cpp Tue Aug 30 00:14:38 2016
@@ -23,8 +23,7 @@
 // CHECK-USE: -cc1
 // CHECK-USE-SAME: -emit-obj
 // CHECK-USE-SAME: -fmodule-file={{.*}}.pcm
-// CHECK-USE-SAME: -o {{.*}}.o
-// CHECK-USE-SAME: -x c++
+// CHECK-USE-SAME: -o {{.*}}.o {{.*}}-x c++
 // CHECK-USE-SAME: modules-ts.cpp

The reason is that the {{.*}}.o was matching too much text, and
indeed the following test case fails:

$ cat foo.txt
CHECK:  -o {{.*}}.o
CHECK-SAME: -x c++
$ echo '-o blah.o -x c++ llvm.org' | FileCheck foo.txt
foo.txt:2:13: error: expected string not found in input
CHECK-SAME: -x c++
            ^
<stdin>:1:24: note: scanning from here
-o blah.o -x c++ llvm.org
                       ^

I think FileCheck's '*' regex operator should not be greedy,
i.e. match the minimum necessary not the maximum.
Quuxplusone commented 8 years ago
* is greedy by default in all regex variants I'm aware of, so making it non-
greedy might create more confusion than it removes though.
Quuxplusone commented 8 years ago
(In reply to comment #1)
> * is greedy by default in all regex variants I'm aware of, so making it
> non-greedy might create more confusion than it removes though.

Agreed, I came here to say this.

I could imagine combining all CHECK-SAME directives into a single regex, and
solving the problem that way, but it would hurt test debug-ability.
Quuxplusone commented 8 years ago
The problem with greedy .* is that it's easy to write check patterns
that will match some environmental thing that you're not expecting.
(You can ask jlebar about inadvertent matches on 'bar' sometime.)
Which is exactly what Richard ran into.

Currently CHECK-SAME is done the same way as CHECK-NEXT, i.e. run the
check and then count newlines since the previous match (expecting
exactly one for -NEXT and zero for -SAME).  Collecting them up to form
a mega-pattern (pasted together with {{.*}}) would better match the
intended purpose, which is just to break up long check lines to form
more human-friendly sequences.  I'm not able to find a regex that
exactly models the behavior of CHECK-SAME, which I think is a problem.
Quuxplusone commented 5 years ago
(In reply to Paul Robinson from comment #3)
> The problem with greedy .* is that it's easy to write check patterns
> that will match some environmental thing that you're not expecting.
> (You can ask jlebar about inadvertent matches on 'bar' sometime.)
> Which is exactly what Richard ran into.
>
> Currently CHECK-SAME is done the same way as CHECK-NEXT, i.e. run the
> check and then count newlines since the previous match (expecting
> exactly one for -NEXT and zero for -SAME).  Collecting them up to form
> a mega-pattern (pasted together with {{.*}}) would better match the
> intended purpose, which is just to break up long check lines to form
> more human-friendly sequences.  I'm not able to find a regex that
> exactly models the behavior of CHECK-SAME, which I think is a problem.

Wouldn't doing the following work and provide ok debuggability:

combined_pattern=""
Optional<unsigned> match_line = None
FOR pattern FROM <CHECK> TO last consecutive <CHECK-SAME> do:
  combined_pattern.append(pattern);
  if Regex(combined_pattern, &MatchInfo)
    return error;
  IF match_line AND match_line != <match line for last match>
    return error;

It would fail on the first CHECK-SAME that either does not match or leads the
match to be on a different line.