Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/
Other
1.9k stars 540 forks source link

Text::Wrap::wrap() generates a segfault with Cyrillic characters when the utf8 flag is turned on #9266

Closed p5pRT closed 16 years ago

p5pRT commented 16 years ago

Migrated from rt.perl.org#52104 (status was 'resolved')

Searchable as RT52104$

p5pRT commented 16 years ago

From lpsolit@gmail.com

As described at https://bugzilla.mozilla.org/show_bug.cgi?id=423439, Text​::Wrap​::wrap() generates a segfault with Cyrillic characters when the utf8 flag is turned on. The testcase given in the bug\, https://bugzilla.mozilla.org/attachment.cgi?id=311526 (a simple Perl script to run from the shell) shows this very clearly. Due to this bug\, all pages containing such strings are left blank\, which is a real problem for webapps such as Bugzilla.


Flags​:   category=core   severity=high


Site configuration information for perl v5.8.8​:

Configured by Mandriva at Mon Nov 5 17​:08​:57 EST 2007.

Summary of my perl5 (revision 5 version 8 subversion 8) configuration​:   Platform​:   osname=linux\, osvers=2.6.12-31mdksmp\, archname=i386-linux   uname='linux mercury.mandriva.com 2.6.12-31mdksmp #1 smp fri mar 2 08​:52​:38 mst 2007 i686 intel(r) pentium(r) 4 cpu 3.00ghz gnulinux '   config_args='-des -Dinc_version_list=5.8.7 5.8.7/i386-linux 5.8.6 5.8.6/i386-linux 5.8.5 5.8.4 5.8.3 5.8.2 5.8.1 5.8.0 5.6.1 5.6.0 -Darchname=i386-linux -Dcc=gcc -Doptimize=-O2 -pipe -Wp\,-D_FORTIFY_SOURCE=2 -fstack-protector --param=ssp-buffer-size=4 -fexceptions -fomit-frame-pointer -march=i586 -mtune=generic -fasynchronous-unwind-tables -Dprefix=/usr -Dvendorprefix=/usr -Dsiteprefix=/usr -Dsitebin=/usr/local/bin -Dsiteman1dir=/usr/local/share/man/man1 -Dsiteman3dir=/usr/local/share/man/man3 -Dman3ext=3pm -Dcf_by=Mandriva -Dmyhostname=localhost -Dperladmin=root@​localhost -Dcf_email=root@​localhost -Dd_dosuid -Ud_csh -Duseshrplib'   hint=recommended\, useposix=true\, d_sigaction=define   usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef   useperlio=define d_sfio=undef uselargefiles=define usesocks=undef   use64bitint=undef use64bitall=undef uselongdouble=undef   usemymalloc=n\, bincompat5005=undef   Compiler​:   cc='gcc'\, ccflags ='-fno-strict-aliasing -pipe -Wdeclaration-after-statement -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm'\,   optimize='-O2 -pipe -Wp\,-D_FORTIFY_SOURCE=2 -fstack-protector --param=ssp-buffer-size=4 -fexceptions -fomit-frame-pointer -march=i586 -mtune=generic -fasynchronous-unwind-tables'\,   cppflags='-fno-strict-aliasing -pipe -Wdeclaration-after-statement -I/usr/local/include -I/usr/include/gdbm'   ccversion=''\, gccversion='4.2.2 20070909 (prerelease) (4.2.2-0.RC.1mdv2008.0)'\, gccosandvers=''   intsize=4\, longsize=4\, ptrsize=4\, doublesize=8\, byteorder=1234   d_longlong=define\, longlongsize=8\, d_longdbl=define\, longdblsize=12   ivtype='long'\, ivsize=4\, nvtype='double'\, nvsize=8\, Off_t='off_t'\, lseeksize=8   alignbytes=4\, prototype=define   Linker and Libraries​:   ld='gcc'\, ldflags =' -L/usr/local/lib'   libpth=/usr/local/lib /lib /usr/lib   libs=-lnsl -lndbm -lgdbm -ldl -lm -lcrypt -lutil -lc   perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc   libc=/lib/libc-2.6.1.so\, so=so\, useshrplib=true\, libperl=libperl.so   gnulibc_version='2.6.1'   Dynamic Linking​:   dlsrc=dl_dlopen.xs\, dlext=so\, d_dlsymun=undef\, ccdlflags='-Wl\,-E -Wl\,-rpath\,/usr/lib/perl5/5.8.8/i386-linux/CORE'   cccdlflags='-fPIC'\, lddlflags='-shared -L/usr/local/lib'

Locally applied patches​:   Mandriva Linux patches


@​INC for perl v5.8.8​:   /usr/lib/perl5/5.8.8/i386-linux   /usr/lib/perl5/5.8.8   /usr/lib/perl5/site_perl/5.8.8/i386-linux   /usr/lib/perl5/site_perl/5.8.8   /usr/lib/perl5/site_perl   /usr/lib/perl5/vendor_perl/5.8.8/i386-linux   /usr/lib/perl5/vendor_perl/5.8.8   /usr/lib/perl5/vendor_perl   .


Environment for perl v5.8.8​:   HOME=/root   LANG=fr_CH.UTF-8   LANGUAGE=fr_CH.UTF-8​:fr   LC_ADDRESS=fr_CH.UTF-8   LC_COLLATE=fr_CH.UTF-8   LC_CTYPE=fr_CH.UTF-8   LC_IDENTIFICATION=fr_CH.UTF-8   LC_MEASUREMENT=fr_CH.UTF-8   LC_MESSAGES=fr_CH.UTF-8   LC_MONETARY=fr_CH.UTF-8   LC_NAME=fr_CH.UTF-8   LC_NUMERIC=fr_CH.UTF-8   LC_PAPER=fr_CH.UTF-8   LC_SOURCED=1   LC_TELEPHONE=fr_CH.UTF-8   LC_TIME=fr_CH.UTF-8   LD_LIBRARY_PATH (unset)   LOGDIR (unset)

PATH=/sbin​:/usr/sbin​:/bin​:/usr/bin​:/usr/X11R6/bin​:/usr/local/bin​:/usr/local/sbin​:/root/bin   PERL_BADLANG (unset)   SHELL=/bin/bash

p5pRT commented 16 years ago

From @nwc10

On Tue\, Mar 25\, 2008 at 05​:08​:10PM -0700\, Frdric Buclin wrote​:

As described at https://bugzilla.mozilla.org/show_bug.cgi?id=423439, Text​::Wrap​::wrap() generates a segfault with Cyrillic characters when the utf8 flag is turned on. The testcase given in the bug\, https://bugzilla.mozilla.org/attachment.cgi?id=311526 (a simple Perl script to run from the shell) shows this very clearly. Due to this bug\, all pages containing such strings are left blank\, which is a real problem for webapps such as Bugzilla.

The bug seems to be caused by a regexp using pos() inside a substitution\, and can be reduced to something like this​:

$ cat 52104.pl use strict; use warnings;

$_ = chr(0x410) . "N";

s/N/ pos(); "" /e;

use Devel​::Peek; Dump ($_);

__END__ $ ./perl -Ilib 52104.pl Malformed UTF-8 character (unexpected end of string) in match position at 52104.pl line 6. Malformed UTF-8 character (unexpected end of string) in match position at 52104.pl line 6. Malformed UTF-8 character (unexpected end of string) in match position at 52104.pl line 6. SV = PVMG(0x930958) at 0x8ce940   REFCNT = 1   FLAGS = (SMG\,POK\,pPOK\,UTF8)   IV = 0   NV = 0   PV = 0x8fe668 "\320\220"\0 [UTF8 "\x{410}"]   CUR = 2   LEN = 8   MAGIC = 0x8ebd68   MG_VIRTUAL = &PL_vtbl_utf8   MG_TYPE = PERL_MAGIC_utf8(w)   MG_LEN = -1   MAGIC = 0x8ef338   MG_VIRTUAL = &PL_vtbl_mglob   MG_TYPE = PERL_MAGIC_regex_global(g)   MG_LEN = -1

It's still present in blead. It's not Cyrillic specific\, but I happened to stick to the same Cyrillic character in the test case.

I don't know what the cause is.

Nicholas Clark

p5pRT commented 16 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 16 years ago

From @nwc10

On Wed\, Mar 26\, 2008 at 12​:50​:55PM +0000\, Nicholas Clark wrote​:

On Tue\, Mar 25\, 2008 at 05​:08​:10PM -0700\, Frdric Buclin wrote​:

As described at https://bugzilla.mozilla.org/show_bug.cgi?id=423439, Text​::Wrap​::wrap() generates a segfault with Cyrillic characters when the utf8 flag is turned on. The testcase given in the bug\, https://bugzilla.mozilla.org/attachment.cgi?id=311526 (a simple Perl script to run from the shell) shows this very clearly. Due to this bug\, all pages containing such strings are left blank\, which is a real problem for webapps such as Bugzilla.

The bug seems to be caused by a regexp using pos() inside a substitution\, and can be reduced to something like this​:

I don't know what the cause is.

It should all be fixed by the appended change\, which I expect will be in 5.8.9 soon (RC1 within weeks).

Nicholas Clark

Change 33580 by nicholas@​nicholas-saigo on 2008/03/26 21​:05​:20

  The offset for pos is stored as bytes\, and converted to (Unicode)   character position when read\, if needed. The code for setting pos   inside subst was incorrectly converting to character position before   storing the value. This code appears to have been buggy since it was   added in 2000 in change 7562.

Affected files ...

... //depot/perl/pp_ctl.c#688 edit ... //depot/perl/t/op/subst.t#50 edit

Differences ...

==== //depot/perl/pp_ctl.c#688 (text) ====

@​@​ -298\,7 +298\,6 @​@​   { /* Update the pos() information. */   SV * const sv = cx->sb_targ;   MAGIC *mg; - I32 i;   SvUPGRADE(sv\, SVt_PVMG);   if (!(mg = mg_find(sv\, PERL_MAGIC_regex_global))) { #ifdef PERL_OLD_COPY_ON_WRITE @​@​ -308\,10 +307\,7 @​@​   mg = sv_magicext(sv\, NULL\, PERL_MAGIC_regex_global\, &PL_vtbl_mglob\,   NULL\, 0);   } - i = m - orig; - if (DO_UTF8(sv)) - sv_pos_b2u(sv\, &i); - mg->mg_len = i; + mg->mg_len = m - orig;   }   if (old != rx)   (void)ReREFCNT_inc(rx);

==== //depot/perl/t/op/subst.t#50 (xtext) ====

@​@​ -7\,7 +7\,7 @​@​ }

require './test.pl'; -plan( tests => 136 ); +plan( tests => 139 );

$x = 'foo'; $_ = "x"; @​@​ -583\,3 +583\,11 @​@​   is($want\,$_\,"RT#17542"); }

+{ + my @​tests = ('ABC'\, "\xA3\xA4\xA5"\, "\x{410}\x{411}\x{412}"); + foreach (@​tests) { + my $id = ord $_; + s/./pos/ge; + is($_\, "012"\, "RT#52104​: $id"); + } +}

p5pRT commented 16 years ago

From lpsolit@gmail.com

For all installations which do not have the not-yet-released Perl 5.8.9\, which workaround(s) do they have? On critical installations\, it's hard to upgrade Perl as it's a pretty heavy change.

p5pRT commented 16 years ago

From @nwc10

On Thu\, Mar 27\, 2008 at 10​:41​:51PM +0100\, Frdric Buclin wrote​:

For all installations which do not have the not-yet-released Perl 5.8.9\, which workaround(s) do they have? On critical installations\, it's hard to upgrade Perl as it's a pretty heavy change.

Given that the bug means that using pos inside the replacement in s///e isn't going to work with Unicode\, it unfortunately is as simple and restrictive as "don't do that then". Which\, I guess\, in this case means not triggering the replacement in Text​::Tabs​::expand()\, which does just that.

For Bugzilla\, as it's generating HTML for web pages\, if it's not text in \

tags\, am I right in assuming that it doesn't matter in the HTML source whether
it's "\t" or " "\, as whitespace is equivalent and folded? If so\, can it be
worked round by translating all "\t" to " " before the call to Text​::Wrap?

Or is this explicitly for fixed width text\, where formatting of tabs at tabstops is an issue?

Nicholas Clark

p5pRT commented 16 years ago

From Robin.Barker@npl.co.uk

From​: Fr��d��ric Buclin [mailto​:lpsolit@​gmail.com]

For all installations which do not have the not-yet-released Perl 5.8.9\, which workaround(s) do they have? On critical installations\, it's hard to upgrade Perl as it's a pretty heavy change.

I have rewritten Text​::Tabs to avoid s//pos/ The code is at work and I haven't had a chance to test it. From memory\, the code for C\ is below.

Robin

sub expand {   my @​l;   for ( @​_ ) {   my $s = '';   for (split(/^/m\, $_\, -1)) {   my @​tabs = split(/\t/\, $_\, -1);   my $line = shift @​tabs;   for (@​tabs) {   my $pad = $tabstop - (length $line) % $tabstop;   $line .= " " x $pad;   $line .= $_;   }   $s .= $line;   }   push(@​l\, $s);   }   return @​l if wantarray;   return $l[0]; }


This e-mail and any attachments may contain confidential and/or privileged material; it is for the intended addressee(s) only. If you are not a named addressee\, you must not use\, retain or disclose such information.

NPL Management Ltd cannot guarantee that the e-mail or any attachments are free from viruses.

NPL Management Ltd. Registered in England and Wales. No​: 2937881 Registered Office​: Serco House\, 16 Bartley Wood Business Park\,   Hook\, Hampshire\, United Kingdom RG27 9UY


p5pRT commented 16 years ago

From @nwc10

On Fri\, Mar 28\, 2008 at 10​:58​:25PM -0000\, Robin Barker wrote​:

From​: Frédéric Buclin [mailto​:lpsolit@​gmail.com]

For all installations which do not have the not-yet-released Perl 5.8.9\, which workaround(s) do they have? On critical installations\, it's hard to upgrade Perl as it's a pretty heavy change.

I have rewritten Text​::Tabs to avoid s//pos/ The code is at work and I haven't had a chance to test it.

From memory\, the code for C\ is below.

Robin

sub expand { my @​l; for ( @​_ ) { my $s = ''; for (split(/^/m\, $_\, -1)) { my @​tabs = split(/\t/\, $_\, -1); my $line = shift @​tabs; for (@​tabs) { my $pad = $tabstop - (length $line) % $tabstop; $line .= " " x $pad; $line .= $_; } $s .= $line; } push(@​l\, $s); } return @​l if wantarray; return $l[0]; }

Although that's not a huge win unless the work around code goes into a release on CPAN\, which is not something that p5p controls.

Nicholas Clark

p5pRT commented 16 years ago

From @ap

* Robin Barker \Robin\.Barker@​npl\.co\.uk [2008-03-29 00​:00]​:

From​: Frédéric Buclin [mailto​:lpsolit@​gmail.com]

For all installations which do not have the not-yet-released Perl 5.8.9\, which workaround(s) do they have? On critical installations\, it's hard to upgrade Perl as it's a pretty heavy change.

I have rewritten Text​::Tabs to avoid s//pos/ The code is at work and I haven't had a chance to test it. From memory\, the code for C\ is below.

I wrote the version of `expand` with s//pos/e; it is the way it is for maximum performance. The old code ran in O(n^2) but was fast for short strings; my goal was O(n) without significantly higher overhead. I had some variants I liked better\, code quality wise\, but they were slower.

Your version is O(n) too\, but if you benchmark it you’ll find it is measurably slower than mine in all cases\, and much slower than the old Text​::Tabs code for short strings.

There have been some fixes in Text​::Wrap since my performance patch\, but none in Text​::Tabs\, so if you have a buggy version of perl I suggest you simply downgrade Text​::Tabs to the version from the Text-Tabs+Wrap-2001.0929 distribution.

Regards\, -- Aristotle Pagaltzis // \<http​://plasmasturm.org/>

p5pRT commented 16 years ago

@smpeters - Status changed from 'open' to 'resolved'