Regexp failure with utf8-flagged string and byte-flagged pattern #9034

Closed p5pRT closed 16 years ago

p5pRT commented 16 years ago

Migrated from rt.perl.org#45605 (status was 'resolved')

Searchable as RT45605$

p5pRT commented 16 years ago

From srezic@cpan.org

This is a bug report for perl from srezic@cpan.org, generated with the help of perlbug 1.36 running under perl 5.10.0.


The script below works as expected up to and including perl 5.8.8 (i.e. it prints "1"). With perl 5.10.0 the pattern no longer matches.

Regards,
    Slaven

    #!perl
    $string = 'Öschel';
    utf8::upgrade($string);
    warn $string =~ m{(?:Ö|&Ouml;)schel};
    __END__



Flags:
    category=core
    severity=high


Site configuration information for perl 5.10.0:

Configured by eserte at Wed Sep 19 23:41:00 CEST 2007.

Summary of my perl5 (revision 5 version 10 subversion 0 patch 31894) configuration:
  Platform:
    osname=freebsd, osvers=6.2-release, archname=amd64-freebsd
    uname='freebsd biokovo-amd64.herceg.de 6.2-release freebsd 6.2-release #0: fri jan 12 08:32:24 utc 2007 root@portnoy.cse.buffalo.edu:usrobjusrsrcsysgeneric amd64 '
    config_args='-Dprefix=/usr/perl5.10.0 -D cc=ccache cc -Dgccansipedantic -de'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=undef, usemultiplicity=undef
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='ccache cc', ccflags ='-DHAS_FPSETMASK -DHAS_FLOATINGPOINT_H -fno-strict-aliasing -pipe -I/usr/local/include',
    optimize='-O2 -pipe',
    cppflags='-DHAS_FPSETMASK -DHAS_FLOATINGPOINT_H -fno-strict-aliasing -pipe -I/usr/local/include'
    ccversion='', gccversion='3.4.6 [FreeBSD] 20060305', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='ccache cc', ldflags ='-Wl,-E -L/usr/local/lib'
    libpth=/usr/lib /usr/local/lib
    libs=-lgdbm -lm -lcrypt -lutil -lc
    perllibs=-lm -lcrypt -lutil -lc
    libc=, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags=' '
    cccdlflags='-DPIC -fPIC', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:
    DEVEL


@INC for perl 5.10.0:
    /usr/perl5.10.0/lib/5.10.0/amd64-freebsd
    /usr/perl5.10.0/lib/5.10.0
    /usr/perl5.10.0/lib/site_perl/5.10.0/amd64-freebsd
    /usr/perl5.10.0/lib/site_perl/5.10.0
    .


Environment for perl 5.10.0:
    HOME=/home/e/eserte
    LANG (unset)
    LANGUAGE (unset)
    LC_ALL=de_DE.ISO8859-1
    LC_CTYPE=de_DE.ISO8859-1
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/usr/X11R6/bin:/usr/X11/bin:/usr/local/bin:/usr/bin:/bin:/usr/gnu/bin:/usr/TeX/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/local/pilot/bin:/home/e/eserte/bin/FreeBSD:/home/e/eserte/bin/sh:/home/e/eserte/bin:/usr/X386/bin:/usr/games:/home/e/eserte/devel
    PERL_BADLANG (unset)
    PERL_HTML_DISPLAY_CLASS=HTML::Display::Mozilla
    SHELL=/bin/tcsh

p5pRT commented 16 years ago

From @demerphq

On 9/20/07, via RT srezic @ cpan. org <perlbug-followup@perl.org> wrote:

[snip]

I don't have a blead handy right now to test with. Could someone please send me the output of this with a

    use re Debug=>'ALL';

right before the warn statement?

Cheers,
Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 16 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 16 years ago

From nospam-abuse@bloodgate.com

Moin,

On Thursday 20 September 2007 23:44:46 srezic@cpan.org wrote:

[snip]

I don't see "use utf8;" in your example, so, in what encoding is the script? Likewise, that means, in what encoding is the $string and in what is the regexp?

All the best,

Tels

-- Signed on Fri Sep 21 12:24:58 2007 with key 0x93B84C15.
View my photo gallery: http://bloodgate.com/photos
PGP key on http://bloodgate.com/tels.asc or per email.

"Most people, I think, don't even know what a rootkit is, so why should they care about it?"

  -- Thomas Hesse, President of Sony BMG's global digital business division, 2005.

p5pRT commented 16 years ago

From @tux

On Fri, 21 Sep 2007 12:26:07 +0200, demerphq <demerphq@gmail.com> wrote:

> [snip]
>
> I dont have a blead handy right now to test with, could someone please send me the output of this with a
>
>     use re Debug=>'ALL';
>
> right before the warn statement.

http://www.xs4all.nl/~hmbrand/xx.tgz

Ping me on IRC when you have it, then I can clean up again.

-- H.Merijn Brand Amsterdam Perl Mongers (http://amsterdam.pm.org/)
using & porting perl 5.6.2, 5.8.x, 5.9.x on HP-UX 10.20, 11.00, 11.11, & 11.23, SuSE 10.0 & 10.2, AIX 4.3 & 5.2, and Cygwin.
http://qa.perl.org http://mirrors.develooper.com/hpux/
http://www.test-smoke.org http://www.goldmark.org/jeff/stupid-disclaimers/

p5pRT commented 16 years ago

From @eserte

> Moin,
>
> [snip]
>
> I don't see "use utf8;" in your example, so, in what encoding is the script?

If "use utf8" is missing, then the script encoding is usually iso-8859-1.

> Likewise, that means, in what encoding is the $string and in what is the regexp?

The "Ö" is in both cases the byte 0xd6.

Regards,
    Slaven

p5pRT commented 16 years ago

From @eserte

> [snip]
>
> I dont have a blead handy right now to test with, could someone please send me the output of this with a
>
>     use re Debug=>'ALL';
>
> right before the warn statement.

See the attachment.

Regards,
    Slaven

p5pRT commented 16 years ago

From @eserte

    Compiling REx "(?:%326|&Ouml;)schel"
    Starting first pass (sizing)

    (?:Ö|&Ouml...   |   1| reg
                    |    | brnc
                    |    | piec
                    |    | atom
    ?:Ö|&Ouml;...   |    | reg
    Ö|&Ouml;)s...   |    | brnc
                    |    | piec
                    |    | atom
    |&Ouml;)sc...   |   3| inst - BRANCH
    &Ouml;)sch...   |   4| brnc
                    |   5| piec
                    |    | atom
    schel<          |   9| piec
                    |    | atom
    Required size 12 nodes
    Starting second pass (creation)
    (?:Ö|&Ouml...   |   1| reg
                    |    | brnc
                    |    | piec
                    |    | atom
    ?:Ö|&Ouml;...   |    | reg
    Ö|&Ouml;)s...   |    | brnc
                    |    | piec
                    |    | atom
    |&Ouml;)sc...   |   3| inst - BRANCH
    &Ouml;)sch...   |   4| brnc
                    |   5| piec
                    |    | atom
    )schel<         |   8| tail~ BRANCH (1) -> BRANCH
                    |   9| tail~ BRANCH (4) -> TAIL
                    |    | tsdy~ EXACT <\326> (2) -> EXACT
                    |    | ~ attach to TAIL (8) offset to 6
                    |    | tsdy~ EXACT <&Ouml;> (5) -> EXACT
                    |    | ~ attach to TAIL (8) offset to 3
    schel<          |    | piec
                    |    | atom
    <               |  12| tail~ BRANCH (1)
                    |    | ~ BRANCH (4)
                    |    | ~ TAIL (8) -> EXACT
                    |  13| tail~ BRANCH (1)
                    |    | ~ BRANCH (4)
                    |    | ~ TAIL (8)
                    |    | ~ EXACT <schel> (9) -> END
    first:> 1: BRANCH (4)
    first at 1
    Peep:Pos:0/0 Flags: 0x0 Whilem_c: 0 Lcp: 0 Last:'' 0:0/0 *Fixed:'' @ 0 Float: '' @ 0/0
    Peep> 1: BRANCH (4)
    commit: Pos:0/0 Flags: 0x0 Whilem_c: 0 Lcp: 0 Last:'' -1:0/0 *Fixed:'' @ 0 Float: '' @ 0/0
      Peep:Pos:0/0 Flags: 0x0 Whilem_c: 0 Lcp: 0
      Peep> 2: EXACT <\326> (8)
      join> 2: EXACT <\326> (8)
      skip:> 8: TAIL (9)
      pre-fin:Pos:0/0 Flags: 0x0 Whilem_c: 0 Lcp: 0
      post-fin:Pos:0/0 Flags: 0x0 Whilem_c: 0 Lcp: 0
      Peep:Pos:0/0 Flags: 0x0 Whilem_c: 0 Lcp: 0
      Peep> 5: EXACT <&Ouml;> (8)
      join> 5: EXACT <&Ouml;> (8)
      skip:> 8: TAIL (9)
      pre-fin:Pos:0/0 Flags: 0x0 Whilem_c: 0 Lcp: 0
      post-fin:Pos:0/0 Flags: 0x0 Whilem_c: 0 Lcp: 0
      Looking for TRIE'able sequences. Tail node is: EXACT <schel>
      - BRANCH (1) -> EXACT <\326> => EXACT <schel> (First==-1,Last==-1,Cur==1)
      - BRANCH (4) -> EXACT <&Ouml;> => EXACT <schel> (First==1,Last==-1,Cur==4)
      - TAIL (8) \
      make_trie start==1, first==1, last==8, tail==9 depth=1
      TRIE(NATIVE): W:2 C:7 Uq:7 Min:1 Max:6
      Compiling trie using table compiler
      Char : \326   &   O   u   m   l   ;
      State+-----------------------------
         1 :    2   3   .   .   .   .   . ( 2)
         2 :    .   .   .   .   .   .   . ( 0) W 1
         3 :    .   .   4   .   .   .   . ( 1)
         4 :    .   .   .   5   .   .   . ( 1)
         5 :    .   .   .   .   6   .   . ( 1)
         6 :    .   .   .   .   .   7   . ( 1)
         7 :    .   .   .   .   .   .   8 ( 1)
         8 :    .   .   .   .   .   .   . ( 0) W 2
      Alloc: 57 Orig: 57 elements, Final:7. Savings of %87.72
      Statecount:9 Lasttrans:8
      Char : Match Base Ofs \326   &   O   u   m   l   ;
      State|---------------------------------------------------
      #   1|       @   7 + 0[    2   3   .   .   .   .   .]
      #   2| W   1 @   0
      #   3|       @   7 + 2[    .   .   4   .   .   .   .]
      #   4|       @   7 + 3[    .   .   .   5   .   .   .]
      #   5|       @   7 + 4[    .   .   .   .   6   .   .]
      #   6|       @   7 + 5[    .   .   .   .   .   7   .]
      #   7|       @   7 + 6[    .   .   .   .   .   .   8]
      #   8| W   2 @   0
      MJD offset:4 MJD length:1
    Peep:Pos:1/5 Flags: 0x0 Whilem_c: 0 Lcp: 0 Last:'' -1:0/0 Fixed:'' @ 0 *Float: '' @ 0/0
    Peep> 8: TAIL (9)
    Peep:Pos:1/5 Flags: 0x0 Whilem_c: 0 Lcp: 0 Last:'' -1:0/0 Fixed:'' @ 0 *Float: '' @ 0/0
    Peep> 9: EXACT <schel> (12)
      join> 9: EXACT <schel> (12)
    pre-fin:Pos:6/5 Flags: 0x0 Whilem_c: 0 Lcp: 0 Last:'schel' 6:1/6 Fixed:'' @ 0 *Float: '' @ 0/0
    post-fin:Pos:6/5 Flags: 0x4000 Whilem_c: 0 Lcp: 0 Last:'schel' 6:1/6 Fixed:'' @ 0 *Float: '' @ 0/0
    Restudying
    first:> 1: TRIE-EXACT<S:1/8 W:2 L:1/6 C:7/7>[&\326] (9)
    Stclass Failtable (9 states): 0, 0, 1, 1, 1, 1, 1, 1, 1
    Peep:Pos:0/0 Flags: 0x0 Whilem_c: 0 Lcp: 0 Last:'' 0:0/0 *Fixed:'' @ 0 Float: '' @ 0/0
    Peep> 1: TRIE-EXACT<S:1/8 W:2 L:1/6 C:7/7>[&\326] (9)
    commit: Pos:0/0 Flags: 0x0 Whilem_c: 0 Lcp: 0 Last:'' -1:0/0 *Fixed:'' @ 0 Float: '' @ 0/0
    Peep:Pos:1/5 Flags: 0x0 Whilem_c: 0 Lcp: 0 Last:'' -1:0/0 Fixed:'' @ 0 *Float: '' @ 0/0
    Peep> 9: EXACT <schel> (12)
      join> 9: EXACT <schel> (12)
    pre-fin:Pos:6/5 Flags: 0x0 Whilem_c: 0 Lcp: 0 Last:'schel' 6:1/6 Fixed:'' @ 0 *Float: '' @ 0/0
    post-fin:Pos:6/5 Flags: 0x0 Whilem_c: 0 Lcp: 0 Last:'schel' 6:1/6 Fixed:'' @ 0 *Float: '' @ 0/0
    commit: Pos:6/5 Flags: 0x0 Whilem_c: 0 Lcp: 0 Last:'schel' -1:1/6 Fixed:'' @ 0 *Float: 'schel' @ 1/6
    minlen: 6 r->minlen:0
    Final program:
       1: TRIE-EXACT<S:1/8 W:2 L:1/6 C:7/7>[&\326] (9)
          <\326>
          <&Ouml;>
       9: EXACT <schel> (12)
      12: END (0)
    floating "schel" at 1..6 (checking floating) stclass AHOCORASICK-EXACT<S:1/8 W:2 L:1/6 C:7/7>[&\326] minlen 6
    r->extflags: USE_INTUIT_NOML USE_INTUIT_ML
    Matching REx "(?:%326|&Ouml;)schel" against "%326schel"
    UTF-8 string...
      Setting an EVAL scope, savestack=3
    regmatch start
       0 <> <%326schel>  |  1:TRIE-EXACT<S:1/8 W:2 L:1/6 C:7/7>[&\326](9)
                            failed to match trie start class...
    Match failed
    Warning: something's wrong at /tmp/rx.pl line 6.
    Freeing REx: "(?:%326|&Ouml;)schel"

p5pRT commented 16 years ago

From nospam-abuse@bloodgate.com

Moin,

On Friday 21 September 2007 16:55:59 slaven@rezic.de wrote:

>> Moin,

[snip]

>> I don't see "use utf8;" in your example, so, in what encoding is the script?
>
> If "use utf8" is missing, then the script encoding is usually iso-8859-1.

Ah. (doh!)

>> Likewise, that means, in what encoding is the $string and in what is the regexp?
>
> The "Ö" is in both cases the byte 0xd6.

Ah. But your email says:

    Return-Path: <perl5-porters-return-128928-nospam-abuse=bloodgate.com@perl.org>
    [snip]
    MIME-Version: 1.0
    Content-Type: text/plain;
      charset="utf-8"
    Content-Transfer-Encoding: 8bit
    X-RT-Original-Encoding: utf-8

Note the mail encodings. If one just copy&pastes your example, one ends up with a UTF-8 encoded script :)

All the best,

Tels

-- Signed on Fri Sep 21 17:16:24 2007 with key 0x93B84C15.
Get one of my photo posters: http://bloodgate.com/posters
PGP key on http://bloodgate.com/tels.asc or per email.

"A Thaum is the basic unit of magical strength. It has been universally established as the amount of magic needed to create one small white pigeon or three normal-sized billiard balls."

  -- Terry Pratchett

p5pRT commented 16 years ago

From @eserte


The encoding was added by RT. I copied the script into the vi session opened by perlbug and hoped that the result would be reasonable.

To avoid further confusion, here is the script as an *attachment*, including the debug output proposed by Yves.

Regards,
    Slaven

p5pRT commented 16 years ago

From @eserte

rx.pl

p5pRT commented 16 years ago

From @demerphq

On 9/21/07, slaven@rezic.de <slaven@rezic.de> wrote:

> See the attachment.

Thanks to you and Merijn I can say with pretty good certainty what the problem is.

The trie code builds a char class during its construction phase, and it is not storing the first byte of the UTF-8 representation of codepoints between 128 and 255.

The fix should be fairly straightforward, but I don't have access to the tools to do it myself just at the moment.

But we need to make sure this is fixed before 5.10 is released.

Yves -- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 16 years ago

From @demerphq


Just to expand on this: somewhere in or around the make_trie code is some logic that turns on a bit in a bit vector for every start byte in the trie. In the branch for handling non-unicode data it needs to do something like the following pseudo code.

    /* store first byte of utf8 representation of codepoints in the 127 < cp < 256 range */
    if (127 < cp && cp < 192) {
        SETBIT(CHARCLASS,194)
    } else if (191 < cp && cp < 256) {
        SETBIT(CHARCLASS,195)
    }

Anyway, if somebody feels like figuring out where this code would go and adjusting it correctly, that would be cool. Once you find where it is, it will be obvious how to correct it; I'm pretty sure it will be in a utility macro defined just before the routine. Otherwise this will have to wait until my desktops are unpacked and set up. (I just moved apartment and haven't finished unpacking yet.)

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 16 years ago

From nospam-abuse@bloodgate.com

Moin\,

On Friday 21 September 2007 23:56:56 demerphq wrote:

[snip]

Neither SETBIT nor "vector" appears in the source. In the end, grepping for "bitfield" leads to line 1392, which looks like:

    if ( set_bit ) /* bitmap only alloced when !(UTF&&Folding) */
        TRIE_BITMAP_SET(trie,*uc); /* store the raw first byte
                                      regardless of encoding */

    for ( ; uc < e ; uc += len ) {
        TRIE_CHARCOUNT(trie)++;
        TRIE_READ_CHAR;
        chars++;
        if ( uvc < 256 ) {
            if ( !trie->charmap[ uvc ] ) {
                trie->charmap[ uvc ]=( ++trie->uniquecharcount );
                if ( folder )
                    trie->charmap[ folder[ uvc ] ] = trie->charmap[ uvc ];
                TRIE_STORE_REVCHAR;
            }
            if ( set_bit ) {
                /* store the codepoint in the bitmap, and if its ascii
                   also store its folded equivelent. */
                TRIE_BITMAP_SET(trie,uvc);
                if ( folder ) TRIE_BITMAP_SET(trie,folder[ uvc ]);
                set_bit = 0; /* We've done our bit :-) */
            }
        } else {
            SV** svpp;
            if ( !widecharmap )
                widecharmap = newHV();

            svpp = hv_fetch( widecharmap, (char*)&uvc, sizeof( UV ), 1 );

            if ( !svpp )
                Perl_croak( aTHX_ "error creating/fetching widecharmap entry for 0x%"UVXf, uvc );

            if ( !SvTRUE( *svpp ) ) {
                sv_setiv( *svpp, ++trie->uniquecharcount );
                TRIE_STORE_REVCHAR;
            }
        }

and I believe the modification needs to be made in the first branch. However, I am not sure what to insert where.

All the best,

Tels

-- Signed on Sat Sep 22 09:58:36 2007 with key 0x93B84C15.
View my photo gallery: http://bloodgate.com/photos
PGP key on http://bloodgate.com/tels.asc or per email.

"I am soo clumsy today." *crash*

p5pRT commented 16 years ago

From @demerphq

On 9/22/07, Tels <nospam-abuse@bloodgate.com> wrote:

> [snip]
>
> Neither SETBIT nor "vector" appears in the source. In the end, grepping for "bitfield" leads to line 1392, which looks like:
>
>     if ( set_bit ) /* bitmap only alloced when !(UTF&&Folding) */
>         TRIE_BITMAP_SET(trie,*uc); /* store the raw first byte
>                                       regardless of encoding */
>
>     for ( ; uc < e ; uc += len ) {
>         TRIE_CHARCOUNT(trie)++;
>         TRIE_READ_CHAR;
>         chars++;
>         if ( uvc < 256 ) {
>             if ( !trie->charmap[ uvc ] ) {
>                 trie->charmap[ uvc ]=( ++trie->uniquecharcount );
>                 if ( folder )
>                     trie->charmap[ folder[ uvc ] ] = trie->charmap[ uvc ];
>                 TRIE_STORE_REVCHAR;
>             }
>             if ( set_bit ) {
>                 /* store the codepoint in the bitmap, and if its ascii
>                    also store its folded equivelent. */
>                 TRIE_BITMAP_SET(trie,uvc);
>                 if ( folder ) TRIE_BITMAP_SET(trie,folder[ uvc ]);

Right there. The line that says

    if ( folder ) TRIE_BITMAP_SET(trie,folder[ uvc ]);

should probably read

    if ( folder ) { /* folder only true when pattern is not utf8 */
        TRIE_BITMAP_SET(trie,folder[ uvc ]); /* store the folded codepoint */
        /* store first byte of utf8 representation of
           codepoints in the 127 < uvc < 256 range */
        if (127 < uvc && uvc < 192) {
            TRIE_BITMAP_SET(trie,194);
        } else if (191 < uvc ) { /* && uvc < 256 -- we know uvc is < 256 already */
            TRIE_BITMAP_SET(trie,195);
        }
    }

>                 set_bit = 0; /* We've done our bit :-) */
>             }
>         } else {
>             SV** svpp;
>             if ( !widecharmap )
>                 widecharmap = newHV();
>
>             svpp = hv_fetch( widecharmap, (char*)&uvc, sizeof( UV ), 1 );
>
>             if ( !svpp )
>                 Perl_croak( aTHX_ "error creating/fetching widecharmap entry for 0x%"UVXf, uvc );
>
>             if ( !SvTRUE( *svpp ) ) {
>                 sv_setiv( *svpp, ++trie->uniquecharcount );
>                 TRIE_STORE_REVCHAR;
>             }
>         }
>
> and I believe in the first branch the modification needs to be done. However, I am not sure what to insert where.

Thanks a lot for digging that out; it's exactly what I needed to see.

Can you try the code as I've indicated above and let me know if it solves the problem?

Cheers,
Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 16 years ago

From nospam-abuse@bloodgate.com

Moin\,

On Saturday 22 September 2007 11:50:37 demerphq wrote:

> On 9/22/07, Tels <nospam-abuse@bloodgate.com> wrote:

[snip]

>> Neither SETBIT nor "vector" appear in the source. In the end greppign for "bitfield" leads to line 1392 which looks like:

I meant "bitmap" and it is in regcomp.c - just for the record :)

>>     if ( folder ) TRIE_BITMAP_SET(trie,folder[ uvc ]);
>
> Right there. The line that says
>
>     if ( folder ) TRIE_BITMAP_SET(trie,folder[ uvc ]);
>
> should probably read
>
>     if ( folder ) { /* folder only true when pattern is not utf8 */
>         TRIE_BITMAP_SET(trie,folder[ uvc ]); /* store the folded codepoint */
>         /* store first byte of utf8 representation of
>            codepoints in the 127 < uvc < 256 range */
>         if (127 < uvc && uvc < 192) {
>             TRIE_BITMAP_SET(trie,194)
>         } else if (191 < uvc ) { /* && uvc < 256 -- we know uvc is < 256 already */
>             TRIE_BITMAP_SET(trie,195)
>         }
>     }

I had to do this in a VMWare player, and this involved shuffling your text via my server. Then make complained about stray "194" characters etc., so I ended up completely retyping the new code anyway. Ugh, aren't encodings fun? :-)

make && make test is still running; I will report in probably 25 mins how it goes.

All the best,

Tels

-- Signed on Sat Sep 22 12:23:59 2007 with key 0x93B84C15.
View my photo gallery: http://bloodgate.com/photos
PGP key on http://bloodgate.com/tels.asc or per email.

"We're confident that DNF will be one of the greatest, if not the greatest, game of 1998. And this confidence is not misplaced."

  -- Scott Miller, 1997 (http://tinyurl.com/6m8nh)

p5pRT commented 16 years ago

From nospam-abuse@bloodgate.com

Moin\,

On Saturday 22 September 2007 12:25:52 Tels wrote:

> Moin,

[snip]

> make && make test still runs, I will report in probably 25 mins how it goes.

Attached is the patch I used. Unfortunately, it doesn't seem to work, as you can see from the output also attached :(

The "Warning: something's wrong..." isn't very useful, either.

All the best,

Tels

-- Signed on Sat Sep 22 13:00:52 2007 with key 0x93B84C15.
View my photo gallery: http://bloodgate.com/photos
PGP key on http://bloodgate.com/tels.asc or per email.

"In computer science, we stand on each other's feet."

  -- Brian K. Reid

p5pRT commented 16 years ago

From nospam-abuse@bloodgate.com

Inline Patch

```diff
diff -ruN blead/regcomp.c blead_trie/regcomp.c
--- blead/regcomp.c	2007-09-15 00:02:23.000000000 +0200
+++ blead_trie/regcomp.c	2007-09-22 12:08:52.000000000 +0200
@@ -1405,7 +1405,19 @@
                     /* store the codepoint in the bitmap, and if its ascii
                        also store its folded equivelent. */
                     TRIE_BITMAP_SET(trie,uvc);
-                    if ( folder ) TRIE_BITMAP_SET(trie,folder[ uvc ]);
+
+                    if ( folder ) { /* folder only true when pattern is not utf8 */
+                        /* store the folded codepoint */
+                        TRIE_BITMAP_SET(trie,folder[ uvc ]);
+                        /* store first byte of utf8 representation of
+                           codepoints in the 127 < uvc < 256 range */
+                        if (127 < uvc && uvc < 192) {
+                            TRIE_BITMAP_SET(trie,194);
+                        } else if (191 < uvc ) {
+                            TRIE_BITMAP_SET(trie,195);
+                        /* && uvc < 256 -- we know uvc is < 256 already */
+                        }
+                    }
                     set_bit = 0; /* We've done our bit :-) */
                 }
             } else {
```

p5pRT commented 16 years ago

From nospam-abuse@bloodgate.com

    guest@localhost:~/src/blead_trie.test> ./perl -Ilib ../utf8.txt
    5.010000
    1 at ../utf8.txt line 5.
    guest@localhost:~/src/blead_trie.test> ./perl -Ilib ../iso88591.txt
    5.010000
    Warning: something's wrong at ../iso88591.txt line 5.
    guest@localhost:~/src/blead_trie.test>
    guest@localhost:~/src/blead_trie.test> perl ../utf8.txt
    5.008007
    1 at ../utf8.txt line 5.
    guest@localhost:~/src/blead_trie.test> perl ../iso88591.txt
    5.008007
    1 at ../iso88591.txt line 5.
    guest@localhost:~/src/blead_trie.test

p5pRT commented 16 years ago

From @demerphq

On 9/22/07, Tels <nospam-abuse@bloodgate.com> wrote:

> Attached is the patch I used. Unfortunately, it doesn't seem to work, as you can see from the output also attached :(

Dang. I guess it'll have to wait until I have the time and circumstances to look into this further.

I am much obliged for the assistance, Tels. Thank you.

cheers,
Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 16 years ago

From nospam-abuse@bloodgate.com

Moin\,

On Saturday 22 September 2007 13:19:48 demerphq wrote:

> Dang. I guess itll have to wait until i have the time and circumstances to look into this further.

Do you have any hints on what the "something's wrong" warning means or where it comes from?

I might have a try with "DEBUG => ALL" and see what I can glean from it. In any case, Remote Debugging Via Proxy[tm] isn't easy :-)

All the best,

Tels

-- Signed on Sat Sep 22 13:36:55 2007 with key 0x93B84C15.
Get one of my photo posters: http://bloodgate.com/posters
PGP key on http://bloodgate.com/tels.asc or per email.

"For such a bitrate reduction I have to reconfigure the transcoder so that it uses larger quantization coefficients for the MPEG matrices, Captain" - "Get to work, Mr. LaForge."

  -- Jens Baumeister in http://tinyurl.com/oomb

p5pRT commented 16 years ago

From @demerphq

On 9/22/07\, Tels \nospam\-abuse@&#8203;bloodgate\.com wrote​:

Moin\,

On Saturday 22 September 2007 13​:19​:48 demerphq wrote​:

On 9/22/07\, Tels \nospam\-abuse@&#8203;bloodgate\.com wrote​:

Moin\,

On Saturday 22 September 2007 12​:25​:52 Tels wrote​:

Moin\,

[snip]

make && make test still runs\, I will report in probably 25 mins how it goes.

Attached is the patch I used. Unfortunately\, it doesn't seem to work\, as you can see from the output also attached :(

Dang. I guess itll have to wait until i have the time and circumstances to look into this further.

> Do you have any hints on what the "something's wrong" warning means or where it comes from?

It's the normal warning produced by 'warn' when called with no arguments.

    d:\sync-clone>perl -e"warn"
    Warning: something's wrong at -e line 1.

> I might have a try with "DEBUG => ALL" and see what I can glean from it. In any case, Remote Debugging Via Proxy[tm] isn't easy :-)

Hmm. Actually I just realized that I was being dumb: folder is only true when !UTF and we are doing a case-insensitive match. Change the patch as follows and I think it should work. (IOW, instead of replacing the if (folder) line, insert the new logic after it, with the right test):

          /* store the codepoint in the bitmap, and if its ascii
             also store its folded equivelent. */
          TRIE_BITMAP_SET(trie,uvc);
          if ( folder ) TRIE_BITMAP_SET(trie,folder[ uvc ]);
    +
    +     if ( !UTF ) {
    +         /* store first byte of utf8 representation of
    +            codepoints in the 127 < uvc < 256 range */
    +         if (127 < uvc && uvc < 192) {
    +             TRIE_BITMAP_SET(trie,194);
    +         } else if (191 < uvc ) {
    +             TRIE_BITMAP_SET(trie,195);
    +         /* && uvc < 256 -- we know uvc is < 256 already */
    +         }
    +     }
          set_bit = 0; /* We've done our bit :-) */
          }
      } else {
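The constants 194 and 195 in the snippet above are the only two UTF-8 lead bytes that code points 128..255 can produce. A minimal sketch of that arithmetic follows; `utf8_lead_byte_latin1` is a hypothetical helper written for illustration and does not exist in regcomp.c.

```c
/* Hypothetical helper (not part of the Perl source): return the first
 * byte of the UTF-8 encoding of a code point below 256. */
static unsigned char utf8_lead_byte_latin1(unsigned int cp)
{
    if (cp < 0x80)
        return (unsigned char)cp;   /* ASCII is encoded as itself */

    /* Two-byte form is 110xxxxx 10xxxxxx. The lead byte carries the
     * bits above the low six, so 128..191 gives 0xC2 (194) and
     * 192..255 gives 0xC3 (195). */
    return (unsigned char)(0xC0 | (cp >> 6));
}
```

For the character in the bug report, `utf8_lead_byte_latin1(0xD6)` yields 195 (0xC3), which is the byte a utf8-flagged "Öschel" actually starts with.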

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 16 years ago

From @demerphq


I meant to say earlier that the win32/Makefile has some targets defined that make it easier to make minor changes to the regex engine. If you have a look at the 'reonly' and 'test-reonly' targets, you should be able to put them in your *nix Makefile and use them. Then, instead of doing a full test right off, you can do a

    make test-reonly

and make sure that all regex-related tests pass, and only do a full make test cycle once everything is working OK. I've been muttering about getting these targets added to the "normal" makefile for a while, but I've not got around to it yet and nobody else has either.

Also, the test file that needs to be updated for this bug is t/op/pat.t. Note that the test count is at the BOTTOM of the file; the new test should go about a page above the bottom (tests that have caused SEGVs in the past are kept last).

Cheers and thanks a lot for the help.

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 16 years ago

From nospam-abuse@bloodgate.com

Moin,

On Saturday 22 September 2007 13:48:48 demerphq wrote:

> On 9/22/07, Tels <nospam-abuse@bloodgate.com> wrote:
>
> > Moin, [snip] Do you have any hints on what the "something's wrong" warning means or where it comes from?
>
> It's the normal warning produced by 'warn' when called with no arguments.

I think it would be useful to change this to "warn() called without arguments at line X".

>     d:\sync-clone>perl -e"warn"
>     Warning: something's wrong at -e line 1.
>
> > I might have a try with "DEBUG => ALL" and see what I can glean from it. In any case, Remote Debugging Via Proxy[tm] isn't easy :-)
>
> Hmm. Actually I just realized that I was being dumb: folder is only true when !UTF and we are doing a case-insensitive match. Change the patch as follows and I think it should work. (IOW, instead of replacing the if (folder) line, insert the new logic after it, with the right test):

Heh, I hadn't even powered up the VMware session when you responded :)

However, it doesn't work. Now miniperl crashes with a segmentation fault after a full:

    make realclean; rm Config.sh; rm Policy.sh; ./Configure -des && make

Hohum. OK, never mind, I screwed it up. Let's try again:

make currently running. Gimme 25 minutes or so. (I knew I should have installed the dual-core CPU by now... it's gathering dust on my shelf...)

All the best\,

Tels

-- Signed on Sat Sep 22 13:57:08 2007 with key 0x93B84C15. View my photo gallery: http://bloodgate.com/photos PGP key on http://bloodgate.com/tels.asc or per email.

"One man in a thousand is a leader of men, the other 999 follow women"

  -- Groucho Marx

p5pRT commented 16 years ago

From nospam-abuse@bloodgate.com

Moin\,

On Saturday 22 September 2007 13​:57​:05 demerphq wrote​:


> I meant to say earlier that the win32/Makefile has some targets defined that make it easier to make minor changes to the regex engine. If you have a look at the 'reonly' and 'test-reonly' targets, you should be able to put them in your *nix Makefile and use them. Then, instead of doing a full test right off, you can do a
>
>     make test-reonly
>
> and make sure that all regex-related tests pass, and only do a full make test cycle once everything is working OK. I've been muttering about getting these targets added to the "normal" makefile for a while, but I've not got around to it yet and nobody else has either.

Ah, I might look into this.

> Also, the test file that needs to be updated for this bug is t/op/pat.t. Note that the test count is at the BOTTOM of the file; the new test should go about a page above the bottom (tests that have caused SEGVs in the past are kept last).

And this, too. After it compiled fully and works, of course :-P

All the best\,

Tels

-- Signed on Sat Sep 22 14:08:26 2007 with key 0x93B84C15. Get one of my photo posters: http://bloodgate.com/posters PGP key on http://bloodgate.com/tels.asc or per email.

"Remember​: If the game let's you do it\, it's not cheating."

  -- Xarax

p5pRT commented 16 years ago

From nospam-abuse@bloodgate.com

Moin,

On Saturday 22 September 2007 14:09:11 Tels wrote:

> Moin,
>
> > I meant to say earlier that the win32/Makefile has some targets defined that make it easier to make minor changes to the regex engine. If you have a look at the 'reonly' and 'test-reonly' targets, you should be able to put them in your *nix Makefile and use them. Then, instead of doing a full test right off, you can do a
> >
> >     make test-reonly
> >
> > and make sure that all regex-related tests pass, and only do a full make test cycle once everything is working OK. I've been muttering about getting these targets added to the "normal" makefile for a while, but I've not got around to it yet and nobody else has either.
>
> Ah, I might look into this.

Sorry, developing a headache so skipping this for now.

> > Also, the test file that needs to be updated for this bug is t/op/pat.t. Note that the test count is at the BOTTOM of the file; the new test should go about a page above the bottom (tests that have caused SEGVs in the past are kept last).
>
> And this, too. After it compiled fully and works, of course :-P

Attached is a patch that does what Yves suggested, passes the test from the bug report, as well as the test I added to t/op/pat.t - however, I am *not* sure that the test I added really tests what it should - as the file t/op/pat.t is in UTF-8 according to `file`, so I had to use \xd6 and I just hope that's OK :)

In any event, that should resolve this issue.

All the best\,

Tels

-- Signed on Sat Sep 22 14:25:19 2007 with key 0x93B84C15. Get one of my photo posters: http://bloodgate.com/posters PGP key on http://bloodgate.com/tels.asc or per email.

Miko: "Detect Evil!" Belkar, holding up check-warding sheet of lead: "Too slow, sister."

  -- The Order of The Stick

p5pRT commented 16 years ago

From nospam-abuse@bloodgate.com

Inline Patch

```diff
diff -ruN blead/regcomp.c blead_trie/regcomp.c
--- blead/regcomp.c	2007-09-15 00:02:23.000000000 +0200
+++ blead_trie/regcomp.c	2007-09-22 13:59:02.000000000 +0200
@@ -1405,7 +1405,20 @@
                     /* store the codepoint in the bitmap, and if its ascii
                        also store its folded equivelent. */
                     TRIE_BITMAP_SET(trie,uvc);
-                    if ( folder ) TRIE_BITMAP_SET(trie,folder[ uvc ]);
+
+                    /* store the folded codepoint */
+                    if ( folder ) TRIE_BITMAP_SET(trie,folder[ uvc ]);
+
+                    if ( !UTF ) {
+                        /* store first byte of utf8 representation of
+                           codepoints in the 127 < uvc < 256 range */
+                        if (127 < uvc && uvc < 192) {
+                            TRIE_BITMAP_SET(trie,194);
+                        } else if (191 < uvc ) {
+                            TRIE_BITMAP_SET(trie,195);
+                        /* && uvc < 256 -- we know uvc is < 256 already */
+                        }
+                    }
                     set_bit = 0; /* We've done our bit :-) */
                 }
             } else {
diff -ruN blead/t/op/pat.t blead_trie/t/op/pat.t
--- blead/t/op/pat.t	2007-09-15 00:02:23.000000000 +0200
+++ blead_trie/t/op/pat.t	2007-09-22 14:08:42.000000000 +0200
@@ -4478,6 +4478,14 @@
     }
     iseq(length($str),"0","Trie scope error, string should be empty");
 }
+{
+# [perl #45605] Regexp failure with utf8-flagged and byte-flagged string
+
+    my $utf_8 = "\xd6schel";
+    utf8::upgrade($utf_8);
+    $utf_8 =~ m{(\xd6|Ö)schel};
+    iseq($1,"\xd6","#45605");
+}
 
 # Test counter is at bottom of file. Put new tests above here.
 #-------------------------------------------------------------------
@@ -4537,6 +4545,6 @@
 iseq(0+$::test,$::TestCount,"Got the right number of tests!");
 # Don't forget to update this!
 BEGIN {
-    $::TestCount = 1964;
+    $::TestCount = 1965;
     print "1..$::TestCount\n";
 }
```
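As far as the patch comments suggest, the trie bitmap records which bytes a match may start with, and a utf8-flagged subject is checked by its first encoded byte. The sketch below models that idea in isolation. Everything in it (`start_bitmap`, `add_start_codepoint`, `may_start_match`, the `patched` switch) is hypothetical and only loosely mirrors what TRIE_BITMAP_SET does in regcomp.c; it is not the engine's actual data structure.

```c
/* Hypothetical 256-bit start-byte bitmap, for illustration only. */
typedef struct { unsigned char bits[32]; } start_bitmap;

static void bitmap_set(start_bitmap *bm, unsigned char b)
{
    bm->bits[b >> 3] |= (unsigned char)(1u << (b & 7));
}

static int bitmap_test(const start_bitmap *bm, unsigned char b)
{
    return (bm->bits[b >> 3] >> (b & 7)) & 1;
}

/* Record the possible first bytes for one code point (assumed < 256)
 * of a byte-flagged pattern. With patched != 0, also record the UTF-8
 * lead byte, mirroring the "if ( !UTF )" branch of the patch. */
static void add_start_codepoint(start_bitmap *bm, unsigned int uvc, int patched)
{
    bitmap_set(bm, (unsigned char)uvc);
    if (patched && uvc > 127)
        bitmap_set(bm, (unsigned char)(uvc < 192 ? 194 : 195));
}

/* Return 1 if the subject's first byte passes the bitmap pre-check. */
static int may_start_match(const start_bitmap *bm, const char *subject)
{
    return bitmap_test(bm, (unsigned char)subject[0]);
}
```

With only 0xD6 recorded, the upgraded subject "\xC3\x96schel" fails the pre-check because its first byte is 0xC3. Once 195 is recorded as well, the same subject passes, which is the behaviour the new pat.t test exercises.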
p5pRT commented 16 years ago

From @demerphq

On 9/22/07, Tels <nospam-abuse@bloodgate.com> wrote:

> Moin,
>
> On Saturday 22 September 2007 14:09:11 Tels wrote:
>
> > Moin,
> >
> > > I meant to say earlier that the win32/Makefile has some targets defined that make it easier to make minor changes to the regex engine. If you have a look at the 'reonly' and 'test-reonly' targets, you should be able to put them in your *nix Makefile and use them. Then, instead of doing a full test right off, you can do a
> > >
> > >     make test-reonly
> > >
> > > and make sure that all regex-related tests pass, and only do a full make test cycle once everything is working OK. I've been muttering about getting these targets added to the "normal" makefile for a while, but I've not got around to it yet and nobody else has either.
> >
> > Ah, I might look into this.
>
> Sorry, developing a headache so skipping this for now.

Sorry to hear that. Hope you feel better.

> > > Also, the test file that needs to be updated for this bug is t/op/pat.t. Note that the test count is at the BOTTOM of the file; the new test should go about a page above the bottom (tests that have caused SEGVs in the past are kept last).
> >
> > And this, too. After it compiled fully and works, of course :-P
>
> Attached is a patch that does what Yves suggested, passes the test from the bug report, as well as the test I added to t/op/pat.t - however, I am *not* sure that the test I added really tests what it should - as the file t/op/pat.t is in UTF-8 according to `file`

Hmm, that's a little surprising. Wouldn't have predicted that at all.

> so I had to use \xd6 and I just hope that's OK :)

That looks fine to me. So does the patch.

> In any event, that should resolve this issue.

Nice one, Tels. Thanks.

Yves -- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 16 years ago

From nospam-abuse@bloodgate.com

Moin,

On Saturday 22 September 2007 14:36:56 demerphq wrote:

> On 9/22/07, Tels <nospam-abuse@bloodgate.com> wrote:
>
> > Moin, [snip]
> >
> > > Ah, I might look into this.
> >
> > Sorry, developing a headache so skipping this for now.
>
> Sorry to hear that. Hope you feel better.

What a short walk in the sun, two pieces of cake and two large mugs of coffee latte can fix is amazing :)

Although I will still skip munging the makefiles.

> > > > Also, the test file that needs to be updated for this bug is t/op/pat.t. Note that the test count is at the BOTTOM of the file; the new test should go about a page above the bottom (tests that have caused SEGVs in the past are kept last).
> > >
> > > And this, too. After it compiled fully and works, of course :-P
> >
> > Attached is a patch that does what Yves suggested, passes the test from the bug report, as well as the test I added to t/op/pat.t - however, I am *not* sure that the test I added really tests what it should - as the file t/op/pat.t is in UTF-8 according to `file`
>
> Hmm, that's a little surprising. Wouldn't have predicted that at all.

It might have been because I edited on my UTF-8 system. Or it might be that `file` just reports "utf-8" on a UTF-8 system if the file contains only ASCII chars. `file` isn't really reliable on that, I think.

> > so I had to use \xd6 and I just hope that's OK :)
>
> That looks fine to me. So does the patch.

Cool :)

All the best\,

Tels

-- Signed on Sat Sep 22 15:32:59 2007 with key 0x93B84C15. Get one of my photo posters: http://bloodgate.com/posters PGP key on http://bloodgate.com/tels.asc or per email.

"I know what I don't know, and to this day I don't know technology and I don't know accounting and finance."

  -- Bernie Ebbers, former WorldCom Inc. CEO, speaking in his defense during the WorldCom fraud trial.

p5pRT commented 16 years ago

From @eserte

Tels <nospam-abuse@bloodgate.com> writes:

> Moin,
>
> On Saturday 22 September 2007 14:09:11 Tels wrote:
>
> > Moin,
> >
> > > I meant to say earlier that the win32/Makefile has some targets defined that make it easier to make minor changes to the regex engine. If you have a look at the 'reonly' and 'test-reonly' targets, you should be able to put them in your *nix Makefile and use them. Then, instead of doing a full test right off, you can do a
> > >
> > >     make test-reonly
> > >
> > > and make sure that all regex-related tests pass, and only do a full make test cycle once everything is working OK. I've been muttering about getting these targets added to the "normal" makefile for a while, but I've not got around to it yet and nobody else has either.
> >
> > Ah, I might look into this.
>
> Sorry, developing a headache so skipping this for now.
>
> > > Also, the test file that needs to be updated for this bug is t/op/pat.t. Note that the test count is at the BOTTOM of the file; the new test should go about a page above the bottom (tests that have caused SEGVs in the past are kept last).
> >
> > And this, too. After it compiled fully and works, of course :-P
>
> Attached is a patch that does what Yves suggested, passes the test from the bug report, as well as the test I added to t/op/pat.t - however, I am *not* sure that the test I added really tests what it should - as the file t/op/pat.t is in UTF-8 according to `file`, so I had to use \xd6 and I just hope that's OK :)

The test just has byte sequences which form valid UTF-8. Most of the script is really Latin-1.

And if you want to be sure, Dump() from Devel::Peek is your friend.
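The remark about byte sequences that "form valid utf-8" can be made concrete with a well-formedness check. The following is a sketch only: `is_valid_utf8` is a hypothetical helper, not a Perl API, and it makes no claim about the heuristics file(1) really applies.

```c
#include <stddef.h>

/* Hypothetical helper: return 1 if s[0..len) is well-formed UTF-8,
 * 0 otherwise. Rejects overlong forms, surrogates and values above
 * U+10FFFF. */
static int is_valid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        size_t need;        /* continuation bytes expected */
        unsigned long cp;   /* decoded code point */
        size_t k;

        if (c < 0x80) { i++; continue; }
        else if ((c & 0xE0) == 0xC0) { need = 1; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { need = 2; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { need = 3; cp = c & 0x07; }
        else return 0;      /* stray continuation or invalid lead byte */

        if (len - i <= need)
            return 0;       /* sequence is truncated */

        for (k = 1; k <= need; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;   /* not a continuation byte */
            cp = (cp << 6) | (unsigned long)(s[i + k] & 0x3F);
        }

        if ((need == 1 && cp < 0x80) ||
            (need == 2 && cp < 0x800) ||
            (need == 3 && cp < 0x10000))
            return 0;       /* overlong encoding */
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;       /* out of range or surrogate */

        i += need + 1;
    }
    return 1;
}
```

A raw Latin-1 "Öschel" (byte 0xD6 followed by ASCII) fails this check, while the upgraded form (0xC3 0x96 followed by ASCII) passes. Writing the character as the escape \xd6 in the test keeps the source file free of either raw form.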

Regards, Slaven

-- Slaven Rezic - slaven \ rezic \ de

  tksm - Perl/Tk program for searching and replacing in multiple files   http://ptktools.sourceforge.net/#tksm

p5pRT commented 16 years ago

From @rgs

On 22/09/2007, Tels <nospam-abuse@bloodgate.com> wrote:

> Attached is a patch that does what Yves suggested, passes the test from the bug report, as well as the test I added to t/op/pat.t - however, I am *not* sure that the test I added really tests what it should - as the file t/op/pat.t is in UTF-8 according to `file`, so I had to use \xd6 and I just hope that's OK :)
>
> In any event, that should resolve this issue.

Thanks, applied as #31961.

p5pRT commented 16 years ago

@rgs - Status changed from 'open' to 'resolved'