Perl / perl5

\N{} incompatibility in 5.12+ #10367

Closed p5pRT closed 14 years ago

p5pRT commented 14 years ago

Migrated from rt.perl.org#74978 (status was 'resolved')

Searchable as RT74978$

p5pRT commented 14 years ago

From tokuhirom@gpath.example.org

Created by tokuhirom@gpath.example.org

The following one-liner fails with perl 5.12.0.

perl -e 'use charnames ":full"; /\N{FULLWIDTH LEFT PARENTHESIS}./;print "ok\n";'

Invalid hexadecimal number in \N{U+...} in regex; marked by <-- HERE in m/\N{U+FF08} <-- HERE ./ at -e line 1.

Perl Info
```
Flags:
    category=core
    severity=medium

Site configuration information for perl 5.12.0:

Configured by tokuhirom at Wed Apr 28 17:18:47 JST 2010.

Summary of my perl5 (revision 5 version 12 subversion 0) configuration:

  Platform:
    osname=linux, osvers=2.6.31-17-server, archname=x86_64-linux
    uname='linux gpath 2.6.31-17-server #54-ubuntu smp thu dec 10 18:06:56 utc 2009 x86_64 gnulinux '
    config_args='-d -Dprefix=/usr/local/app/perl-5.12.0/ -Duse64bitint'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=undef, usemultiplicity=undef
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2',
    cppflags='-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
    ccversion='', gccversion='4.4.1', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -fstack-protector -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib /lib64 /usr/lib64
    libs=-lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc -lgdbm_compat
    perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc
    libc=/lib/libc-2.10.1.so, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.10.1'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -O2 -L/usr/local/lib -fstack-protector'

Locally applied patches:

@INC for perl 5.12.0:
    /usr/local/app/perl-5.12.0/lib/site_perl/5.12.0/x86_64-linux
    /usr/local/app/perl-5.12.0/lib/site_perl/5.12.0
    /usr/local/app/perl-5.12.0/lib/5.12.0/x86_64-linux
    /usr/local/app/perl-5.12.0/lib/5.12.0
    .

Environment for perl 5.12.0:
    HOME=/home/tokuhirom
    LANG=ja_JP.UTF-8
    LANGUAGE (unset)
    LC_DATE=C
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/tokuhirom/bin:/home/tokuhirom/local/bin/:/usr/local/bin/:/usr/local/app/perl-5.12.0/bin/:/usr/local/app/perl/bin/:/usr/local/mysql/bin/:/usr/local/bin/:/home/tokuhirom/bin:/home/tokuhirom/local/bin/:/usr/local/bin/:/usr/local/app/perl-5.12.0/bin/:/usr/local/app/perl/bin/:/usr/local/mysql/bin/:/usr/local/bin/:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/home/tokuhirom/share/dotfiles/local/bin/:/home/tokuhirom/share/dotfiles/local/bin/
    PERL_AUTOINSTALL=--defaultdeps
    PERL_BADLANG=0
    PERL_CPANM_DEV=1
    SHELL=/bin/zsh
```
p5pRT commented 14 years ago

From @tokuhirom

Created by tokuhirom@gmail.com

The following one-liner works in perl 5.10.0, but it fails with perl 5.12.0:

% perl -e 'use charnames ":full"; /\N{FULLWIDTH LEFT PARENTHESIS}./;print "ok\n";'

Invalid hexadecimal number in \N{U+...} in regex; marked by <-- HERE in m/\N{U+FF08} <-- HERE ./ at -e line 1.

p5pRT commented 14 years ago

From @khwilliamson

Tokuhiro Matsuno (via RT) wrote:

# New Ticket Created by "Tokuhiro Matsuno"
# Please include the string: [perl #74982]
# in the subject line of all future correspondence about this issue.
# <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=74982 >

This is a bug report for perl from tokuhirom@gmail.com, generated with the help of perlbug 1.39 running under perl 5.12.0.

-----------------------------------------------------------------
[Please describe your issue here]

following one liner works in perl5.10.0, but it fails with perl 5.12.0

% perl -e 'use charnames ":full"; /\N{FULLWIDTH LEFT PARENTHESIS}./;print "ok\n";'
Invalid hexadecimal number in \N{U+...} in regex; marked by <-- HERE in m/\N{U+FF08} <-- HERE ./ at -e line 1.

[Please do not change anything below this line]
-----------------------------------------------------------------

Thanks for the bug report. I was the one who introduced the bug. I'm sorry. I will have a patch available today. In the meantime, the problem turns out to be the period just after the '}'. If you remove that, it will work.

p5pRT commented 14 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 14 years ago

From @obra

Thanks for the bug report. I was the one who introduced the bug. I'm sorry. I will have a patch available today. In the meantime, the problem turns out to be the period just after the '}'. If you remove that, it will work.

I'm going to hold 5.12.1 RC1 for this.

Best,
Jesse

p5pRT commented 14 years ago

From @khwilliamson

Attached is a minimal patch to fix this. There are two other commits that add comments to a .t file so that someone later won't have to work as hard as I did at finding where to put the tests for something similar.

p5pRT commented 14 years ago

From @khwilliamson

0001-Comment-where-to-find-file-s-format.patch
```diff
From ce65c312b89d6f851ca46d24719e07bce288ee99 Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Sat, 8 May 2010 13:12:53 -0600
Subject: [PATCH] Comment where to find file's format

---
 t/re/re_tests |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/t/re/re_tests b/t/re/re_tests
index 1807ffc..b7471d9 100644
--- a/t/re/re_tests
+++ b/t/re/re_tests
@@ -1,5 +1,5 @@
 # This stops me getting screenfulls of syntax errors every time I accidentally
-# run this file via a shell glob
+# run this file via a shell glob. Format of this file is given in regexp.t
 __END__
 abc abc y $& abc
 abc abc y $-[0] 0
--
1.5.6.3
```
p5pRT commented 14 years ago

From @khwilliamson

0002-Note-in-comment-that-many-N-.-tests-won-t-work-h.patch
```diff
From 50e44d09a829eed4eeabf9ce78d3374a5f785d4f Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Sat, 8 May 2010 13:38:27 -0600
Subject: [PATCH] Note in comment that many \N{...} tests won't work here

---
 t/re/re_tests |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/t/re/re_tests b/t/re/re_tests
index b7471d9..c550b5a 100644
--- a/t/re/re_tests
+++ b/t/re/re_tests
@@ -1,5 +1,7 @@
 # This stops me getting screenfulls of syntax errors every time I accidentally
 # run this file via a shell glob. Format of this file is given in regexp.t
+# Can't use \N{VALID NAME TEST} here because need 'use charnames'; but can use
+# \N{U+valid} here.
 __END__
 abc abc y $& abc
 abc abc y $-[0] 0
--
1.5.6.3
```
p5pRT commented 14 years ago

From @khwilliamson

0003-PATCH-perl-74978-dot-after-breaks-N.patch
```diff
From 1bb86a94fea493dd6213e60ed8e19b51b8ceea0c Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Sat, 8 May 2010 14:06:10 -0600
Subject: [PATCH] PATCH [perl #74978] dot after } breaks \N{}

The problem is that a dot can come between the braces in \N{foo.bar}, but
when searching for it, I didn't stop looking at the right brace, so it
generated an error inappropriately.

This is essentially a minimum patch; efficiency could be improved slightly
with a little more work.
---
 regcomp.c  |    8 +++-----
 t/re/pat.t |    8 +++++++-
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/regcomp.c b/regcomp.c
index f665f0b..be5acdb 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -6762,11 +6762,10 @@ S_reg_namedseq(pTHX_ RExC_state_t *pRExC_state, UV *valuep, I32 *flagp)
             | PERL_SCAN_DISALLOW_PREFIX
             | (SIZE_ONLY ? PERL_SCAN_SILENT_ILLDIGIT : 0);

-        char * endchar = strchr(RExC_parse, '.');
-        if (endchar) {
+        char * endchar = RExC_parse + strcspn(RExC_parse, ".}");
+        if (endchar < endbrace) {
             ckWARNreg(endchar, "Using just the first character returned by \\N{} in character class");
         }
-        else endchar = endbrace;

         length_of_hex = (STRLEN)(endchar - RExC_parse);
         *valuep = grok_hex(RExC_parse, &length_of_hex, &flags, NULL);
@@ -6817,8 +6816,7 @@ S_reg_namedseq(pTHX_ RExC_state_t *pRExC_state, UV *valuep, I32 *flagp)

             /* Code points are separated by dots. If none, there is only one
              * code point, and is terminated by the brace */
-            endchar = strchr(RExC_parse, '.');
-            if (! endchar) endchar = endbrace;
+            endchar = RExC_parse + strcspn(RExC_parse, ".}");

             /* The values are Unicode even on EBCDIC machines */
             length_of_hex = (STRLEN)(endchar - RExC_parse);
diff --git a/t/re/pat.t b/t/re/pat.t
index 40ae52e..7b9594c 100644
--- a/t/re/pat.t
+++ b/t/re/pat.t
@@ -23,7 +23,7 @@ BEGIN {
 }


-plan tests => 297; # Update this when adding/deleting tests.
+plan tests => 299; # Update this when adding/deleting tests.

 run_tests() unless caller;

@@ -987,6 +987,12 @@ sub run_tests {
         ok "abbbbc" =~ m/\N{3,4}/ && $& eq "abbb", '"abbbbc" =~ m/\N{3,4}/ && $& eq "abbb"';
     }

+    {
+        use charnames ":full";
+        local $Message = '[perl #74982] Period coming after \N{}';
+        ok "\x{ff08}." =~ m/\N{FULLWIDTH LEFT PARENTHESIS}./ && $& eq "\x{ff08}.";
+        ok "\x{ff08}." =~ m/[\N{FULLWIDTH LEFT PARENTHESIS}]./ && $& eq "\x{ff08}.";
+    }

 } # End of sub run_tests

--
1.5.6.3
```
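For anyone reading the patch later, here is a minimal standalone sketch of the search change it describes. It is not the regcomp.c code: the function names `first_cp_len_old` and `first_cp_len_new` are hypothetical, and the input is assumed to be the text just after `\N{U+`, always containing a closing `}`. Each function returns how many characters would be treated as the first code point's hex digits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration of the pre-patch search: look for a '.' anywhere
 * in the rest of the pattern, and fall back to the closing brace only when
 * no dot exists at all. A dot that comes AFTER the brace is wrongly used. */
size_t first_cp_len_old(const char *parse)
{
    assert(parse != NULL);
    const char *endbrace = strchr(parse, '}');
    assert(endbrace != NULL);              /* assumption: brace is present */

    const char *endchar = strchr(parse, '.');
    if (!endchar)
        endchar = endbrace;
    return (size_t)(endchar - parse);
}

/* Hypothetical illustration of the post-patch search: stop at whichever of
 * '.' or '}' comes first, so the scan never runs past the closing brace. */
size_t first_cp_len_new(const char *parse)
{
    assert(parse != NULL);
    return strcspn(parse, ".}");
}
```

With the input `"FF08}."` (the tail of the failing pattern), `first_cp_len_old` returns 5, so the span `FF08}` would be treated as hex digits, which fits the "Invalid hexadecimal number" error in the report. `first_cp_len_new` returns 4. For a genuine multi-code-point sequence such as `"FF08.FF09}"`, both return 4.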
p5pRT commented 14 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 14 years ago

From @obra

On Sat, May 08, 2010 at 02:18:33PM -0600, karl williamson wrote:

Attached is a minimal patch to fix this. There are two other commits that add comments to a .t file so that someone later won't have to work as hard as I did at finding where to put the tests for something similar.

Thanks. Applied. +1 to backport the code patch for .1.

-Jesse

p5pRT commented 14 years ago

From @xdg

On Sat, May 8, 2010 at 5:34 PM, Jesse Vincent <jesse@fsck.com> wrote:

On Sat, May 08, 2010 at 02:18:33PM -0600, karl williamson wrote:

Attached is a minimal patch to fix this.  There are two other commits that add comments to a .t file so that someone later won't have to work as hard as I did at finding where to put the tests for something similar.

Thanks. Applied.  +1 to backport the code patch for .1.

-Jesse

Agreed. +1 to backport.

p5pRT commented 14 years ago

@rgs - Status changed from 'open' to 'resolved'