Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

Simple pattern causes perl panic #22094

Closed PhilipHazel closed 1 month ago

PhilipHazel commented 1 month ago
$ perl -e 'if ("xx" =~ /[^\S\W]{6}/) { print "yes >$&<\n"; } else { print "no \n"; }'
panic: regrepeat() called with unrecognized node type 99='OPFAIL' at -e line 1.

**Description**
I think the above output says it all.

**Steps to Reproduce**
See above.

**Expected behavior**
It should output "no".

**Perl configuration**
```
Summary of my perl5 (revision 5 version 38 subversion 2) configuration:

  Platform:
    osname=linux
    osvers=5.12.15-arch1-1
    archname=x86_64-linux-thread-multi
    uname='archlinux'
    config_args='-des -Dusethreads -Duseshrplib -Doptimize=-march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection -g -ffile-prefix-map=/build/perl/src=/usr/src/debug/perl -flto=auto -Dprefix=/usr -Dvendorprefix=/usr -Dprivlib=/usr/share/perl5/core_perl -Darchlib=/usr/lib/perl5/5.38/core_perl -Dsitelib=/usr/share/perl5/site_perl -Dsitearch=/usr/lib/perl5/5.38/site_perl -Dvendorlib=/usr/share/perl5/vendor_perl -Dvendorarch=/usr/lib/perl5/5.38/vendor_perl -Dscriptdir=/usr/bin/core_perl -Dsitescript=/usr/bin/site_perl -Dvendorscript=/usr/bin/vendor_perl -Dinc_version_list=none -Dman1ext=1perl -Dman3ext=3perl -Dlddlflags=-shared -Wl,-O1 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now          -Wl,-z,pack-relative-relocs -flto=auto -Dldflags=-Wl,-O1 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now          -Wl,-z,pack-relative-relocs -flto=auto -Dloclibpth=/usr/lib/db5.3 -Dlocincpth=/usr/include/db5.3'
    hint=recommended
    useposix=true
    d_sigaction=define
    useithreads=define
    usemultiplicity=define
    use64bitint=define
    use64bitall=define
    uselongdouble=undef
    usemymalloc=n
    default_inc_excludes_dot=define
  Compiler:
    cc='cc'
    ccflags ='-D_REENTRANT -D_GNU_SOURCE -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/include/db5.3 -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'
    optimize='-march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security -fstack-clash-protection -fcf-protection -g -ffile-prefix-map=/build/perl/src=/usr/src/debug/perl -flto=auto'

    cppflags='-D_REENTRANT -D_GNU_SOURCE -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/include/db5.3'
    ccversion=''
    gccversion='13.2.1 20230801'
    gccosandvers=''
    intsize=4
    longsize=8
    ptrsize=8
    doublesize=8
    byteorder=12345678
    doublekind=3
    d_longlong=define
    longlongsize=8
    d_longdbl=define
    longdblsize=16
    longdblkind=3
    ivtype='long'
    ivsize=8
    nvtype='double'
    nvsize=8
    Off_t='off_t'
    lseeksize=8
    alignbytes=8
    prototype=define
  Linker and Libraries:
    ld='cc'
    ldflags ='-Wl,-O1 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,-z,pack-relative-relocs -flto=auto -fstack-protector-strong -L/usr/lib/db5.3'
    libpth=/usr/local/lib /usr/lib /usr/lib/db5.3
    libs=-lpthread -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc -lgdbm_compat
    perllibs=-lpthread -ldl -lm -lcrypt -lutil -lc
    libc=/lib/../lib/libc.so.6
    so=so
    useshrplib=true
    libperl=libperl.so
    gnulibc_version='2.39'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs
    dlext=so
    d_dlsymun=undef
    ccdlflags='-Wl,-E -Wl,-rpath,/usr/lib/perl5/5.38/core_perl/CORE'
    cccdlflags='-fPIC'
    lddlflags='-shared -Wl,-O1 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,-z,pack-relative-relocs -flto=auto -L/usr/lib/db5.3 -fstack-protector-strong'
Characteristics of this binary (from libperl): 
  Compile-time options:
    HAS_LONG_DOUBLE
    HAS_STRTOLD
    HAS_TIMES
    MULTIPLICITY
    PERLIO_LAYERS
    PERL_COPY_ON_WRITE
    PERL_DONT_CREATE_GVSV
    PERL_HASH_FUNC_SIPHASH13
    PERL_HASH_USE_SBOX32
    PERL_MALLOC_WRAP
    PERL_OP_PARENT
    PERL_PRESERVE_IVUV
    PERL_USE_SAFE_PUTENV
    USE_64_BIT_ALL
    USE_64_BIT_INT
    USE_ITHREADS
    USE_LARGE_FILES
    USE_LOCALE
    USE_LOCALE_COLLATE
    USE_LOCALE_CTYPE
    USE_LOCALE_NUMERIC
    USE_LOCALE_TIME
    USE_PERLIO
    USE_PERL_ATOF
    USE_REENTRANT_API
    USE_THREAD_SAFE_LOCALE
  Built under linux
  Compiled at Feb 11 2024 19:15:41
  %ENV:
    PERLBREW_ROOT="/source/PerlBlead"
  @INC:
    /usr/lib/perl5/5.38/site_perl
    /usr/share/perl5/site_perl
    /usr/lib/perl5/5.38/vendor_perl
    /usr/share/perl5/vendor_perl
    /usr/lib/perl5/5.38/core_perl
    /usr/share/perl5/core_perl
```

leonerd commented 1 month ago

FYI, it changed behavior somewhere between 5.32 and 5.34:

  --- perl5.30.3 --- 
no 

  --- perl5.32.1 --- 
no 

  --- perl5.34.3 --- 
panic: regrepeat() called with unrecognized node type 98='OPFAIL' at -e line 1.

  --- perl5.36.3 --- 
panic: regrepeat() called with unrecognized node type 98='OPFAIL' at -e line 1.

  --- perl5.38.2 --- 
panic: regrepeat() called with unrecognized node type 99='OPFAIL' at -e line 1.

  --- perl5.39.9 --- 
panic: regrepeat() called with unrecognized node type 99='OPFAIL' at -e line 1.

mauke commented 1 month ago

It bisects to 4f0d304ec835f478a4dd9b4ab7af01f5b826c6d7.

bad - non-zero exit from ./perl -Ilib -e "" =~ /[^\S\W]{6}/
4f0d304ec835f478a4dd9b4ab7af01f5b826c6d7 is the first bad commit
commit 4f0d304ec835f478a4dd9b4ab7af01f5b826c6d7
Author: Hugo van der Sanden <hv@crypt.org>
Date:   Tue Apr 21 11:50:18 2020 +0100

    regexec: disallow zero-width nodes in regrepeat

    GH #17594: the logic here expects the node to have width 1 (except for
    LNBREAK), it is not expected to do the right thing on zero-width nodes.

 regexec.c | 19 -------------------
 1 file changed, 19 deletions(-)

demerphq commented 1 month ago

This is very weird. There shouldn't be a zero width node from this pattern.

mauke commented 1 month ago

I think this is the bug (from regcomp.c):

    /* All possible optimizations below still have these characteristics.
     * (Multi-char folds aren't SIMPLE, but they don't get this far in this
     * routine) */
    *flagp |= HASWIDTH|SIMPLE;

[^\S\W] is an empty set, so the optimizer rewrites it to OPFAIL, which regrepeat no longer accepts as a SIMPLE node, yet the flags above still mark it HASWIDTH|SIMPLE.

demerphq commented 1 month ago

Oh, I see, it is the empty set because not-space includes word chars, and not-word includes space chars. So [\S\W] includes all codepoints, thus the inverse contains none. Nice. I didn't catch on to that at first; I was wondering why it doesn't match "word and space chars", e.g., why it wasn't the same as /[\s\w]/.
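
The set argument above can be checked mechanically. The following is only a sketch in C, restricted to ASCII and using `<ctype.h>` as an approximation of Perl's `\s` and `\w`; the helper names (`is_word`, `is_space`, `in_negated_class`, `count_matches_ascii`) are made up for illustration and have nothing to do with Perl's internals.

```c
#include <ctype.h>
#include <stdbool.h>

/* ASCII approximation of Perl's \w: alphanumerics plus underscore. */
static bool is_word(int c) { return isalnum(c) || c == '_'; }

/* ASCII approximation of Perl's \s, via the C locale's isspace(). */
static bool is_space(int c) { return isspace(c) != 0; }

/* [^\S\W] accepts c only if c is in neither \S nor \W,
 * i.e. only if c is both a space character and a word character. */
static bool in_negated_class(int c) {
    bool in_S = !is_space(c);   /* \S: anything that is not a space */
    bool in_W = !is_word(c);    /* \W: anything that is not a word char */
    return !(in_S || in_W);
}

/* Count the ASCII code points that [^\S\W] would accept. */
int count_matches_ascii(void) {
    int n = 0;
    for (int c = 0; c < 128; c++) {
        if (in_negated_class(c))
            n++;
    }
    return n;
}
```

Since no ASCII character is both a space and a word character, `count_matches_ascii()` finds nothing to accept, which is consistent with the optimizer treating the class as always failing.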

So I guess the question here is, should OPFAIL be handled specially in regrepeat? It is not zero-width in the same way that most other zero-width patterns are: it always fails, so it wouldn't matter that it doesn't match 1 character.

I almost think that treating OPFAIL as simple is fine, as you can say that it could match anything (including 1 character) if it were to match, but it always fails so it never matches anything. (Yes, that is a twisted thought, but it also makes sense at the same time.)

I suspect we should just tweak 4f0d304ec835f478a4dd9b4ab7af01f5b826c6d7 by keeping OPFAIL in the list. It is not the same as a truly zero-width assertion like \b or ^ or $ or whatnot.
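
To make the idea of "keeping OPFAIL in the list" concrete, here is a toy model in C. It is not the code from regexec.c: the enum, its values, and `toy_regrepeat` are hypothetical names invented for this sketch, and returning -1 merely stands in for the panic. It only illustrates that an always-failing node can sit in a repeat dispatcher and report zero repetitions.

```c
/* Hypothetical node kinds for this sketch; not Perl's real regnode numbering. */
enum toy_node { TOY_ANY, TOY_DIGIT, TOY_OPFAIL, TOY_BOUND };

/* Returns how many consecutive characters at s the node matches, up to max.
 * Returns -1 for node kinds the repeat loop refuses to handle
 * (a stand-in for the "unrecognized node type" panic). */
int toy_regrepeat(enum toy_node node, const char *s, int max) {
    int n = 0;
    switch (node) {
    case TOY_ANY:
        /* Any character matches; stop at the end of the string or at max. */
        while (n < max && s[n] != '\0')
            n++;
        return n;
    case TOY_DIGIT:
        /* Width-1 class: count leading ASCII digits. */
        while (n < max && s[n] >= '0' && s[n] <= '9')
            n++;
        return n;
    case TOY_OPFAIL:
        /* Never matches, so it repeats zero times regardless of input. */
        return 0;
    default:
        /* Truly zero-width nodes (e.g. a boundary assertion) stay rejected. */
        return -1;
    }
}
```

In this model, a quantifier such as `{6}` applied to `TOY_OPFAIL` sees zero repetitions, falls short of its minimum, and the match fails cleanly, which corresponds to the expected "no" output rather than a panic.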