PCRE2Project / pcre2

PCRE2 development is now based here.

A\z matches on A<invalid-UTF8>B with PCRE2_UTF|PCRE2_MATCH_INVALID_UTF and no-JIT #343

Closed: stephane-chazelas closed this issue 1 year ago

stephane-chazelas commented 1 year ago
$ printf 'A\200B\n' | pcre2grep --no-jit -qU 'A\z' && echo match
match

This is unexpected: the subject doesn't end in A, so A\z should not match.

That doesn't happen when pcre2grep is built with JIT support (--enable-jit) and run without --no-jit.

It doesn't happen with $ in place of \z.
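
For reference, the expected result can be checked independently of PCRE2. The following is only a sketch using Python's `re` module on bytes, on the assumption that Python's `\Z` is a fair stand-in for PCRE2's `\z` (both match only at the very end of the subject). It does not exercise PCRE2 or PCRE2_MATCH_INVALID_UTF; it only confirms that the subject is invalid UTF-8 and does not end in A.

```python
import re

# Same bytes that printf 'A\200B\n' sends to pcre2grep (octal 200 == 0x80).
subject = b"A\x80B\n"

# 0x80 is a lone continuation byte, so strict UTF-8 decoding fails.
try:
    subject.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# The subject plainly does not end in "A".
ends_in_a = subject.endswith(b"A")

# Python's \Z (like PCRE2's \z) matches only at the very end of the subject.
strict_end_match = re.search(rb"A\Z", subject)

# $ also fails here: "A" is followed by 0x80, not by end of subject or a final newline.
dollar_match = re.search(rb"A$", subject)

print(valid_utf8, ends_in_a, strict_end_match, dollar_match)
```

Neither pattern should match this subject, which is the behaviour the JIT build shows.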

Initially found in https://github.com/raforg/rawhide/issues/2

Reproduced with pcre2 10.42 on Debian GNU/Linux amd64 sid/unstable.

PhilipHazel commented 1 year ago

Thanks for the report. Fixed in 05206d6.

addisoncrump commented 1 year ago

This change is incomplete for noteol: with that option set, the JIT and the interpreter give different results.

$ ./pcre2test -jit
PCRE2 version 10.43-DEV 2023-04-14 (8-bit)
  re> /a\z/
data> a
 0: a
data> a\=noteol
 0: a
data> a\=no_jit
 0: a
data> a\=no_jit,noteol
No match

Maybe a JIT issue?

zherczeg commented 1 year ago

Why? pcre2.txt says:

PCRE2_NOTEOL This option affects only the behaviour of the dollar metacharacter. It does not affect \Z or \z.
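
As a rough analogy for why the dollar metacharacter and \z are treated separately, here is a small sketch in Python's `re` module rather than PCRE2. It assumes Python's `\Z` corresponds to PCRE2's `\z`; Python has no equivalent of PCRE2_NOTEOL, so this only illustrates that the two assertions are distinct, not the option itself.

```python
import re

# A subject with a trailing newline separates the two assertions.
subject = "a\n"

# $ matches at the very end, and also just before a final newline.
dollar_match = re.search(r"a$", subject)

# \Z (Python's counterpart of PCRE2's \z) matches only at the very end.
strict_end_match = re.search(r"a\Z", subject)

print(dollar_match is not None, strict_end_match is not None)
```

Because \z asserts only the true end of the subject, an option documented as affecting `$` alone has no reason to change its result.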

PhilipHazel commented 1 year ago

You are quite right. I should have read my own documentation! I will fix this.