Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/
Other
1.97k stars 560 forks source link

Regex replace looses chars #6996

Closed p5pRT closed 18 years ago

p5pRT commented 20 years ago

Migrated from rt.perl.org#24704 (status was 'resolved')

Searchable as RT24704$

p5pRT commented 20 years ago

From perl-5.8.0@ton.iguana.be

Created by perl-5.8.0@ton.iguana.be

perl -le '$_="65x"; print; s/65/chr/e; print; print unpack("H*"\, $_)' 65x Ax 4178

This is normal.

perl -le '$_="x65x"; s/x//; print; s/65/chr/e; print; print unpack("H*"\, $_)' 65x A 4100

Where did the second x go ????

Perl Info ``` Flags: category=core severity=low Site configuration information for perl v5.8.0: Configured by ton at Tue Nov 12 01:56:18 CET 2002. Summary of my perl5 (revision 5.0 version 8 subversion 0) configuration: Platform: osname=linux, osvers=2.4.19, archname=i686-linux-thread-multi-64int-ld uname='linux quasar 2.4.19 #5 wed oct 2 02:34:25 cest 2002 i686 unknown ' config_args='' hint=recommended, useposix=true, d_sigaction=define usethreads=define use5005threads=undef useithreads=define usemultiplicity=define useperlio=define d_sfio=undef uselargefiles=define usesocks=undef use64bitint=define use64bitall=undef uselongdouble=define usemymalloc=y, bincompat5005=undef Compiler: cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64', optimize='-O2 -fomit-frame-pointer', cppflags='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -I/usr/local/include' ccversion='', gccversion='2.95.3 20010315 (release)', gccosandvers='' intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=12345678 d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12 ivtype='long long', ivsize=8, nvtype='long double', nvsize=12, Off_t='off_t', lseeksize=8 alignbytes=4, prototype=define Linker and Libraries: ld='cc', ldflags =' -L/usr/local/lib' libpth=/usr/local/lib /lib /usr/lib libs=-lnsl -lndbm -ldb -ldl -lm -lpthread -lc -lposix -lcrypt -lutil perllibs=-lnsl -ldl -lm -lpthread -lc -lposix -lcrypt -lutil libc=/lib/libc-2.2.4.so, so=so, useshrplib=false, libperl=libperl.a gnulibc_version='2.2.4' Dynamic Linking: dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic' cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib' Locally applied patches: @INC for perl v5.8.0: /usr/lib/perl5/5.8.0/i686-linux-thread-multi-64int-ld /usr/lib/perl5/5.8.0 /usr/lib/perl5/site_perl/5.8.0/i686-linux-thread-multi-64int-ld /usr/lib/perl5/site_perl/5.8.0 /usr/lib/perl5/site_perl . Environment for perl v5.8.0: HOME=/home/ton LANG (unset) LANGUAGE (unset) LD_LIBRARY_PATH (unset) LOGDIR (unset) PATH=/home/ton/bin.Linux:/home/ton/bin:/home/ton/bin.SampleSetup:/usr/local/bin:/usr/local/sbin:/usr/local/jre/bin:/home/oracle/product/9.0.1/bin:/usr/local/ar/bin:/usr/games/bin:/usr/X11R6/bin:/usr/share/bin:/usr/bin:/usr/sbin:/bin:/sbin:. PERL_BADLANG (unset) SHELL=/bin/bash ```
p5pRT commented 20 years ago

From perl_dummy@bloodgate.com

-----BEGIN PGP SIGNED MESSAGE-----

Moin\,

[Please enter your report here]

perl -le '$_="65x"; print; s/65/chr/e; print; print unpack("H*"\, $_)' 65x Ax 4178

This is normal.

perl -le '$_="x65x"; s/x//; print; s/65/chr/e; print; print unpack("H*"\, $_)' 65x A 4100

Where did the second x go ????

Confirmed on v5.8.1 and v5.8.2.

Cheers\,

Tels

- -- Signed on Sat Dec 20 21​:55​:18 2003 with key 0x93B84C15. Visit my photo gallery at http​://bloodgate.com/photos/ PGP key on http​://bloodgate.com/tels.asc or per email.

"Call me Justin\, Justin Case."

-----BEGIN PGP SIGNATURE----- Version​: GnuPG v1.2.2-rc1-SuSE (GNU/Linux) Comment​: When cryptography is outlawed\, bayl bhgynjf jvyy unir cevinpl.

iQEVAwUBP+S3encLPEOTuEwVAQGE7Af+PdAqt3NEBJHj3JXMYeXy6Abjs21BPNQM g0VWYWXn+8zXi5qvpLEPFP1cUOqVTVaqC4C/T4gLI60utdNdr9JUZT47nftJk9Bz 8Ct47wf5tiOSzJt4qjFKJ8bhIkA0GcDNwiEGLXkoCM1GdtyAbeGKs9KfZSyuLews 4CcPrSiViPdK7WPr/aV5CX3fF3mfQ7sIQ/hI3hU+EKQmoOlpH0AI246IwHw9MhF5 5xm8ikikGAh3nIyDt65Da6m2IKPaT0SZNEv06hr7mkDG8jO8EkdvvbqhZVZ/kwAn PXm1C8QbCotslQ5R/x7OA7EEnXmmqebmdBmpzP62SOoO6Hyqyi9rMw== =2GmU -----END PGP SIGNATURE-----

p5pRT commented 20 years ago

From BQW10602@nifty.com

----------------------------------------------------------------- [Please enter your report here]

perl -le '$_="65x"; print; s/65/chr/e; print; print unpack("H*"\, $_)' 65x Ax 4178

This is normal.

perl -le '$_="x65x"; s/x//; print; s/65/chr/e; print; print unpack("H*"\, $_)' 65x A 4100

Where did the second x go ????

[Please do not change anything below this line]

Instead of s/x///\, substr($_\,0\,1\,"") causes the same result. It seems to be related to an offset OK string.

perl -le '$_="x65x"; substr($_\,0\,1\,""); print; s/65/chr/e; print; print unpack("H*"\, $_)' 65x A 4100

I tested with various numbers of bytes before and after "65"\, where initial value of $_ would be "ab65z"\, "abc65xyz"\, and so on.

The offset value is set to $numOfCharsBefore65. In this case\, if $numOfCharsBefore65 >= $numOfCharsAfter65\, this problem occurs.

#!perl for my $numOfCharsBefore65 (1..5) {   for my $numOfCharsAfter65 (1..5) {   my $prefix = substr("abcdefghijklm"\, 0\, $numOfCharsBefore65);   my $suffix = substr("nopqrstuvwxyz"\, -$numOfCharsAfter65);   print "$numOfCharsBefore65\,$numOfCharsAfter65\t";

  $_ = $prefix.'65'.$suffix;   substr($_\,0\,$numOfCharsBefore65\,''); # remove $prefix   print "$_\t"; # should be '65'.$suffix   s/65/chr/e;   print "$_\t";   print unpack("H*"\, $_)\, "\n";   } } __END__

1\,1 65z A 4100 1\,2 65yz Az 417a00 1\,3 65xyz Ayz 41797a00 1\,4 65wxyz Axyz 4178797a00 1\,5 65vwxyz Awxyz 417778797a00 2\,1 65z Az 417a 2\,2 65yz A z 41007a 2\,3 65xyz Az z 417a007a 2\,4 65wxyz Ayz z 41797a007a 2\,5 65vwxyz Axyz z 4178797a007a 3\,1 65z Az 417a 3\,2 65yz Ayz 41797a 3\,3 65xyz A yz 4100797a 3\,4 65wxyz Az yz 417a00797a 3\,5 65vwxyz Ayz yz 41797a00797a 4\,1 65z Az 417a 4\,2 65yz Ayz 41797a 4\,3 65xyz Axyz 4178797a 4\,4 65wxyz A xyz 410078797a 4\,5 65vwxyz Az xyz 417a0078797a 5\,1 65z Az 417a 5\,2 65yz Ayz 41797a 5\,3 65xyz Axyz 4178797a 5\,4 65wxyz Awxyz 417778797a 5\,5 65vwxyz A wxyz 41007778797a


Regards\, SADAHIRO Tomoyuki

p5pRT commented 20 years ago

From BQW10602@nifty.com

On Sun\, 21 Dec 2003 14​:14​:53 +0900 SADAHIRO Tomoyuki \bqw10602@​nifty\.com wrote​:

I tested with various numbers of bytes before and after "65"\, where initial value of $_ would be "ab65z"\, "abc65xyz"\, and so on.

The offset value is set to $numOfCharsBefore65. In this case\, if $numOfCharsBefore65 >= $numOfCharsAfter65\, this problem occurs.

Sorry\, "if $numOfCharsBefore65 \<= $numOfCharsAfter65" is correct.   ~~~~

#!perl for my $numOfCharsBefore65 (1..5) { for my $numOfCharsAfter65 (1..5) { my $prefix = substr("abcdefghijklm"\, 0\, $numOfCharsBefore65); my $suffix = substr("nopqrstuvwxyz"\, -$numOfCharsAfter65); print "$numOfCharsBefore65\,$numOfCharsAfter65\t";

    $\_ = $prefix\.'65'\.$suffix;
    substr\($\_\,0\,$numOfCharsBefore65\,''\); \# remove $prefix
    print "$\_\\t"; \# should be '65'\.$suffix
    s/65/chr/e;
    print "$\_\\t";
    print unpack\("H\*"\, $\_\)\, "\\n";
\}

} __END__

1\,1 65z A 4100 1\,2 65yz Az 417a00 1\,3 65xyz Ayz 41797a00 1\,4 65wxyz Axyz 4178797a00 1\,5 65vwxyz Awxyz 417778797a00 2\,1 65z Az 417a 2\,2 65yz A z 41007a 2\,3 65xyz Az z 417a007a 2\,4 65wxyz Ayz z 41797a007a 2\,5 65vwxyz Axyz z 4178797a007a 3\,1 65z Az 417a 3\,2 65yz Ayz 41797a 3\,3 65xyz A yz 4100797a 3\,4 65wxyz Az yz 417a00797a 3\,5 65vwxyz Ayz yz 41797a00797a 4\,1 65z Az 417a 4\,2 65yz Ayz 41797a 4\,3 65xyz Axyz 4178797a 4\,4 65wxyz A xyz 410078797a 4\,5 65vwxyz Az xyz 417a0078797a 5\,1 65z Az 417a 5\,2 65yz Ayz 41797a 5\,3 65xyz Axyz 4178797a 5\,4 65wxyz Awxyz 417778797a 5\,5 65vwxyz A wxyz 41007778797a


Regards\, SADAHIRO Tomoyuki

p5pRT commented 20 years ago

From marty+p5p@kasei.com

The problem here is that the subst context sb_s is pointing to parts of the PV\, but the PVIV upgrade moves the text in the PV\, so the subst context sb_s is still pointing to where it used to be.

A simple (but not great) fix is to remove the OOK before we mess with the sv.

Inline Patch ```diff --- pp_hot.c.orig 2003-12-21 03:35:10.000000000 +0000 +++ pp_hot.c 2003-12-21 22:32:56.000000000 +0000 @@ -2172,6 +2172,7 @@ RETURN; } + SvOOK_off(TARG); if (CALLREGEXEC(aTHX_ rx, s, strend, orig, 0, TARG, NULL, r_flags | REXEC_CHECKED)) { ```

If you're feeling paranoid, a safer place to unOOK is much closer to the start of pp_subst\, just before we take the first pointer to the inside of the PV.

Inline Patch ```diff --- pp_hot.c.orig 2003-12-21 03:35:10.000000000 +0000 +++ pp_hot.c 2003-12-21 22:34:10.000000000 +0000 @@ -1981,6 +1981,7 @@ DIE(aTHX_ PL_no_modify); PUTBACK; + SvOOK_off(TARG); s = SvPV(TARG, len); if (!SvPOKp(TARG) || SvTYPE(TARG) == SVt_PVGV) force_on_match = 1; -- ```

Marty

p5pRT commented 20 years ago

From @nwc10

On Sun\, Dec 21\, 2003 at 10​:39​:50PM +0000\, Marty Pauley wrote​:

If you're feeling paranoid\, a safer place to unOOK is much closer to the start of pp_subst\, just before we take the first pointer to the inside of the PV.

What do I do if I'm feeling really paranoid\, and wondering if there are a whole class of bugs like these?

[taking pointers to the insides of SVs that are then moved around because of upgrading. Hack the core to create all scalars as OOK\, and see which tests fail?]

Nicholas Clark

p5pRT commented 20 years ago

From marty+p5p@kasei.com

On Sun Dec 21 22​:57​:34 2003\, Nicholas Clark wrote​:

On Sun\, Dec 21\, 2003 at 10​:39​:50PM +0000\, Marty Pauley wrote​:

If you're feeling paranoid\, a safer place to unOOK is much closer to the start of pp_subst\, just before we take the first pointer to the inside of the PV.

What do I do if I'm feeling really paranoid\, and wondering if there are a whole class of bugs like these?

I think there might be a few more like this. We could also get similar problems when we grow the sv and cause the pv to change altogether. We should get a segmentation fault in that case.

The thing that worried me the most about this one was that there were cases that hit the bug but still 'worked'​: when the offset was larger than the remainder of the string\, the move wouldn't overwrite the old part of the string.

[taking pointers to the insides of SVs that are then moved around because of upgrading. Hack the core to create all scalars as OOK\, and see which tests fail?]

That might be a good start. :-)

If we had a version of Move that filled the source with rubbish\, we might be able to find some more bugs. And if we changed sv_backoff and sv_grow to release memory when they could\, we might get some nice segmentation faults to indicate bugs.

-- Marty

p5pRT commented 20 years ago

From @nwc10

On Sun\, Dec 21\, 2003 at 11​:40​:10PM +0000\, Marty Pauley wrote​:

On Sun Dec 21 22​:57​:34 2003\, Nicholas Clark wrote​: If we had a version of Move that filled the source with rubbish\, we might be able to find some more bugs. And if we changed sv_backoff and sv_grow to release memory when they could\, we might get some nice segmentation faults to indicate bugs.

Strictly we'd want a version of Copy that filled the source with rubbish (or bit flipped it it or something)\, as Copy is for non-overlapping cases. Move would have to be more clever and selectively trash only the non-overlapping parts.

But it's a suitably evil idea. If no-one gets there first I'll try it\, once I acquire some copious free time.

Nicholas Clark

p5pRT commented 20 years ago

From @andk

If I rewrite the test to not use chr() but something that existed in perl4\, say

  $_="65x"; s/65/$_-65/e; $res1 = $_;   $_="x65x"; s/x//; s/65/$_-65/e; $res2 = $_;   print $res1 eq $res2 ? "ok\n" : "not ok res1[$res1]res2[$res2]\n";

then I see the same result in perl4 as in bleed​:

  not ok res1[0x]res2[0]

-- andreas

p5pRT commented 18 years ago

From @smpeters

[perl-5.8.0@​ton.iguana.be - Fri Dec 19 13​:16​:25 2003]​:

This is a bug report for perl from perl-5.8.0@​ton.iguana.be\, generated with the help of perlbug 1.34 running under perl v5.8.0.

----------------------------------------------------------------- [Please enter your report here]

perl -le '$_="65x"; print; s/65/chr/e; print; print unpack("H*"\, $_)' 65x Ax 4178

This is normal.

perl -le '$_="x65x"; s/x//; print; s/65/chr/e; print; print unpack("H*"\, $_)' 65x A 4100

Where did the second x go ????

It looks like somewhere in Dave's modifications\, this bug has been fixed.

perl -le '$_="x65x"; s/x//; print; s/65/chr/e; print; print unpack("H*"\, $_)' 65x A 4100 ./perl -le '$_="x65x"; s/x//; print; s/65/chr/e; print; print unpack("H*"\, $_)' 65x Ax 4178

p5pRT commented 18 years ago

@smpeters - Status changed from 'open' to 'resolved'

p5pRT commented 18 years ago

From @iabyn

On Mon\, Apr 17\, 2006 at 10​:41​:27AM -0700\, Steve Peters via RT wrote​:

[perl-5.8.0@​ton.iguana.be - Fri Dec 19 13​:16​:25 2003]​:

This is a bug report for perl from perl-5.8.0@​ton.iguana.be\, generated with the help of perlbug 1.34 running under perl v5.8.0.

----------------------------------------------------------------- [Please enter your report here]

perl -le '$_="65x"; print; s/65/chr/e; print; print unpack("H*"\, $_)' 65x Ax 4178

This is normal.

perl -le '$_="x65x"; s/x//; print; s/65/chr/e; print; print unpack("H*"\, $_)' 65x A 4100

Where did the second x go ????

It looks like somewhere in Dave's modifications\, this bug has been fixed.

Nothing to do with me! It got fixed somewhere in one of these​:

Change 27276 on 2006/02/23 by nicholas@​profane 'Fix the copy sizes for PVFM and' Change 27275 on 2006/02/22 by nicholas@​entropy 'In Perl_sv_dup\, merge the copy ' Change 27274 on 2006/02/22 by nicholas@​profane 'Fix typos and a missing bracket' Change 27273 on 2006/02/22 by nicholas@​profane 'Avoid C\ing any strings t' Change 27272 on 2006/02/22 by rgs@​marais 'Mark some new study tests as TO' Change 27271 on 2006/02/22 by rgs@​marais 'Add a test for study() on tied ' Change 27270 on 2006/02/22 by rgs@​marais 'Turn on match string copying wh' Change 27269 on 2006/02/22 by nicholas@​entropy 'sv_dump should report the PV fo' Change 27268 on 2006/02/22 by nicholas@​entropy 'Test warnings for converting gl' Change 27267 on 2006/02/22 by nicholas@​nicholas-fangorn 'Change 24643 made the mistake o' Change 27266 on 2006/02/21 by nicholas@​entropy 'Integrate​: [ 27030] Subject​: [p' Change 27265 on 2006/02/21 by nicholas@​nicholas-saigo 'Can merge the two arms of Perl_' Change 27264 on 2006/02/21 by nicholas@​nicholas-saigo 'Reorder the SV flags so that SV' Change 27263 on 2006/02/21 by stevep@​stevep-mccoy 'Additional hints needed to buil'

-- Technology is dominated by two types of people​: those who understand what they do not manage\, and those who manage what they do not understand.

p5pRT commented 18 years ago

From @nwc10

On Mon\, Apr 17\, 2006 at 07​:23​:26PM +0100\, Dave Mitchell wrote​:

On Mon\, Apr 17\, 2006 at 10​:41​:27AM -0700\, Steve Peters via RT wrote​:

[perl-5.8.0@​ton.iguana.be - Fri Dec 19 13​:16​:25 2003]​:

This is a bug report for perl from perl-5.8.0@​ton.iguana.be\, generated with the help of perlbug 1.34 running under perl v5.8.0.

----------------------------------------------------------------- [Please enter your report here]

perl -le '$_="65x"; print; s/65/chr/e; print; print unpack("H*"\, $_)' 65x Ax 4178

This is normal.

perl -le '$_="x65x"; s/x//; print; s/65/chr/e; print; print unpack("H*"\, $_)' 65x A 4100

Where did the second x go ????

It looks like somewhere in Dave's modifications\, this bug has been fixed.

Nothing to do with me! It got fixed somewhere in one of these​:

It's this one​:

Change 27270 on 2006/02/22 by rgs@​marais 'Turn on match string copying wh'

Change 27270 by rgs@​marais on 2006/02/22 20​:28​:41

  Turn on match string copying when /e flag is set on a substitution.

  Subject​: [PATCH] dodge a valgrind error (for maint or blead)   From​: Jarkko Hietaniemi \jhietaniemi@&#8203;gmail\.com   Date​: Feb 22\, 2006 8​:16 PM   Message-ID​: \43FCB896\.7060106@&#8203;gmail\.com

Affected files ...

... //depot/perl/pp_hot.c#455 edit

Differences ...

==== //depot/perl/pp_hot.c#455 (text) ====

@​@​ -1335\,7 +1335\,7 @​@​   }   }   if ((!global && rx->nparens) - || SvTEMP(TARG) || PL_sawampersand) + || SvTEMP(TARG) || PL_sawampersand || (pm->op_pmflags & PMf_EVAL))   r_flags |= REXEC_COPY_STR;   if (SvSCREAM(TARG))   r_flags |= REXEC_SCREAM; @​@​ -2101\,7 +2101\,8 @​@​   pm = PL_curpm;   rx = PM_GETRE(pm);   } - r_flags = (rx->nparens || SvTEMP(TARG) || PL_sawampersand) + r_flags = (rx->nparens || SvTEMP(TARG) || PL_sawampersand + || (pm->op_pmflags & PMf_EVAL))   ? REXEC_COPY_STR : 0;   if (SvSCREAM(TARG))   r_flags |= REXEC_SCREAM;

I'm not sure why that fixes it. For the example code\, valgrind makes no complaint with or without the change.

Nicholas Clark