Unable to match pattern

harendra-kumar commented 3 years ago

I have a benchmark named All.Unicode.Stream/o-1-space.ungroup-group.US.unlines . S.splitOnSuffix ([Word8]) (1/10). I am able to substring match the benchmark as follows:

Unicode.Stream -p "/All.Unicode.Stream\/o-1-space.ungroup-group.US.unlines . S.splitOnSuffix ([Word8]) (1\/10)/"
All
  Unicode.Stream/o-1-space
    ungroup-group
      US.unlines . S.splitOnSuffix ([Word8]) (1/10): OK (0.55s)
        182 ms ±  17 ms

All 1 tests passed (0.55s)

But I need an exact match instead of substring match so I use an awk pattern like this:

Unicode.Stream -p '$0 == "All.Unicode.Stream\/o-1-space.ungroup-group.US.unlines . S.splitOnSuffix ([Word8]) (1\/10)"'
option -p: Could not parse pattern

Usage: Unicode.Stream [-p|--pattern PATTERN] [-t|--timeout DURATION]
                      [-l|--list-tests] [-j|--num-threads NUMBER] [-q|--quiet]
                      [--hide-successes] [--color never|always|auto]
                      [--ansi-tricks ARG] [--baseline ARG] [--csv ARG]
                      [--svg ARG] [--stdev ARG] [--fail-if-slower ARG]
                      [--fail-if-faster ARG]

It fails with option -p: Could not parse pattern. Can someone tell me what's wrong with this? Is there any way some debug information can be printed which can tell why it could not parse pattern?

Bodigrim commented 3 years ago

No need to escape slashes:

> MYBENCH=`cabal list-bin streamly-benchmarks:Unicode.Stream`
> $MYBENCH -p '$0 == "All.Unicode.Stream/o-1-space.ungroup-group.unlines . splitOnSuffix ([Word8]) (1/10)"'
Using small input file: benchmark-tmp/in-10MB.txt
Using big input file: benchmark-tmp/in-100MB.txt
Using output file: benchmark-tmp/out.txt
All
  Unicode.Stream/o-1-space
    ungroup-group
      unlines . splitOnSuffix ([Word8]) (1/10): OK (2.13s)
        299 ms ±  26 ms

In a more complex cases a sed incantation for correct escaping could be useful:

$MYBENCH -l | sed -e 's/[\"]/\\\\\\&/g' | while read name; do $MYBENCH -p '$0 == "'$name'"'; done

Parsing patterns is completely up to tasty. It would be nice to get a more specific parsing error indeed.

harendra-kumar commented 3 years ago

It works if the benchmark name is specified directly on the command line as you did. But it does not work if the benchmark name is expanded from a shell variable. For example:

$ MYBENCH=`cabal list-bin streamly-benchmarks:Unicode.Stream`
$ BENCH_NAME="All.Unicode.Stream/o-1-space.ungroup-group.unlines . splitOnSuffix ([Word8]) (1/10)"
$ $MYBENCH -p '$0 == "'$BENCH_NAME'"'
Using small input file: benchmark-tmp/in-10MB.txt
Using big input file: benchmark-tmp/in-100MB.txt
Using output file: benchmark-tmp/out.txt
option -p: Could not parse pattern

Usage: Unicode.Stream [-p|--pattern PATTERN] [-t|--timeout DURATION]
                      [-l|--list-tests] [-j|--num-threads NUMBER] [-q|--quiet]
                      [--hide-successes] [--color never|always|auto]
                      [--ansi-tricks ARG] [--baseline ARG] [--csv ARG]
                      [--svg ARG] [--stdev ARG] [--fail-if-slower ARG]
                      [--fail-if-faster ARG]

I tried escaping as well but it does not work. On trial and error I found that if there is a space char in the benchmark name then the "Could not parse pattern" error comes.

harendra-kumar commented 3 years ago

If I remove spaces from the benchmark names then it seems to work fine:

$ BENCH_NAME='All.Unicode.Stream/o-1-space.ungroup-group.US.unlines.S.splitOnSuffix([Word8])(1/10)'
$ $MYBENCH -p '$0 == "'$BENCH_NAME'"'
Using small input file: benchmark-tmp/in-10MB.txt
Using big input file: benchmark-tmp/in-100MB.txt
Using output file: benchmark-tmp/out.txt
All
  Unicode.Stream/o-1-space
    ungroup-group
      US.unlines.S.splitOnSuffix([Word8])(1/10): OK (0.61s)
        192 ms ± 6.5 ms

All 1 tests passed (0.61s)

But this means we will have to change names of hundreds of benchmarks. Is this a bug? Is there a workaround for this?

harendra-kumar commented 3 years ago

Ok, shell escaping is really hard to understand. This one worked even with spaces in the benchmark name:

$ $MYBENCH -p '$0 == "'"$BENCH_NAME"'"'
Using small input file: benchmark-tmp/in-10MB.txt
Using big input file: benchmark-tmp/in-100MB.txt
Using output file: benchmark-tmp/out.txt
All
  Unicode.Stream/o-1-space
    ungroup-group
      US.unlines.S.splitOnSuffix([Word8]) (1/10): OK (0.60s)
        191 ms ± 4.6 ms

All 1 tests passed (0.60s)

harendra-kumar commented 3 years ago

@Bodigrim you may want to change your sed incantation example so that it works even with spaces in the name.

Bodigrim commented 3 years ago

Ah, interesting, https://github.com/Bodigrim/tasty-bench/issues/18#issuecomment-841743779 works in my shell as is, because I happened to use zsh. If I switch to bash, it fails with a parse error. Shell escaping is hard indeed.

Bodigrim commented 3 years ago

Thanks, updated escaping in ab49d86e06c26aa417fe58067d43f748f8bdc0c2.

harendra-kumar commented 3 years ago

Benchmark names containing double quotes are still giving me trouble. It gives the same error "Could not parse pattern". I tried escaping double quotes with a backslash, but did not work.

harendra-kumar commented 3 years ago

Ok, escaping double quotes actually worked, earlier I actually did not use the escaped shell value, my bad.

harendra-kumar commented 3 years ago

The last issue was with benchmark names using backslashes e.g. "splitOn \n". I had to use the shell read -r instead of read (otherwise it eats the backslashes) and then escape the backslashes using another backslash in the pattern.

You may want to use read -r in your example to preserve the backslashes from the input.

Bodigrim commented 3 years ago

Ha, running ShellCheck on my original snippet spots both issues:

In shell.sh line 4:
$MYBENCH -l | sed -e 's/[\"]/\\\\\\&/g' | while read name; do $MYBENCH -p '$0 == "'$name'"'; done
                                                ^--^ SC2162: read without -r will mangle backslashes.
                                                                          ^-------^ SC2016: Expressions don't expand in single quotes, use double quotes for that.
                                                                                   ^---^ SC2086: Double quote to prevent globbing and word splitting.

Did you mean:
$MYBENCH -l | sed -e 's/[\"]/\\\\\\&/g' | while read name; do $MYBENCH -p '$0 == "'"$name"'"'; done

harendra-kumar commented 3 years ago

Nice! I forgot about shellcheck, could have saved me some time.

Bodigrim / tasty-bench

Unable to match pattern #18