Possible memory-related bug when using sprintf with a UTF-8 encoded format string and ISO-8859-1 encoded string variables

p5pRT commented 18 years ago

Migrated from rt.perl.org#39126 (status was 'resolved')

Searchable as RT39126$

p5pRT commented 18 years ago

From willem@lunatech.com

This is a bug report for perl from willem@lunatech.com, generated with the help of perlbug 1.35 running under perl v5.8.8.

Dear maintainers,

For a project I'm working on, I've run into an issue that is difficult to understand. The code translates EDI messages that may have various encodings.

In some cases perl crashes when formatting a translated string, an operation that normally works. The error message returned is:

    *** glibc detected *** realloc(): invalid next size: 0x081fac98 ***
    Aborted

I've tried to pin down the root cause of the problem and managed to write two simple scripts that differ only slightly. One of them crashes as above, while the other runs normally.

Some extra testing of the crashing test script with valgrind shows the following:

    Invalid write of size 1
       at 0x80D33E2: Perl_sv_vcatpvfn (in /usr/bin/perl)
       by 0x8107EB2: Perl_do_sprintf (in /usr/bin/perl)
     Address 0x651FBB4 is 0 bytes after a block of size 52 alloc'd

To my eyes this looks like a buffer overrun.

I'm not sure whether I can attach files with 'perlbug', and I certainly do not know how, so you'll find the two test scripts below. The first one crashes; the second one does not.

The first test script is:

    === sprintf-bug.pl ===
    use strict;
    use warnings;
    use Encode;

    my $format = decode("utf-8", encode('utf-8', "%5s%-10s%-35s%-35s"));
    my @records = ('', '', "\344\345", "\326");
    my $line = sprintf($format, @records);

    print STDOUT "$line\n";
    === /sprintf-bug.pl ===

And the second test script is:

    === sprintf-nonbug.pl ===
    use strict;
    use warnings;
    use Encode;

    my $format = decode("utf-8", encode('utf-8', "%5s%-35s%-35s"));
    my @records = ('', "\344\345", "\326");
    my $line = sprintf($format, @records);

    print STDOUT "$line\n";
    === /sprintf-nonbug.pl ===

I hope this provides you with enough information to identify the actual bug and write a fix for it. I know that the code may seem odd in having the format string in UTF-8, but I don't think that it should result in a crash.

In any event, thanks for taking the time to look into it. If you need any assistance, please just let me know.

Kindest regards,

Willem-Jan Veen
The Netherlands

Flags: category=core severity=high

Site configuration information for perl v5.8.8:

Configured by Debian Project at Tue Apr 4 22:34:25 UTC 2006.

    Summary of my perl5 (revision 5 version 8 subversion 8) configuration:
      Platform:
        osname=linux, osvers=2.6.15.4, archname=i486-linux-gnu-thread-multi
        uname='linux ninsei 2.6.15.4 #1 smp preempt mon feb 20 09:48:53 pst 2006 i686 gnulinux '
        config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -Dcccdlflags=-fPIC -Darchname=i486-linux-gnu -Dprefix=/usr -Dprivlib=/usr/share/perl/5.8 -Darchlib=/usr/lib/perl/5.8 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.8.8 -Dsitearch=/usr/local/lib/perl/5.8.8 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Uusesfio -Uusenm -Duseshrplib -Dlibperl=libperl.so.5.8.8 -Dd_dosuid -des'
        hint=recommended, useposix=true, d_sigaction=define
        usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
        useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
        use64bitint=undef use64bitall=undef uselongdouble=undef
        usemymalloc=n, bincompat5005=undef
      Compiler:
        cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBIAN -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
        optimize='-O2',
        cppflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBIAN -fno-strict-aliasing -pipe -I/usr/local/include'
        ccversion='', gccversion='4.0.3 (Debian 4.0.3-1)', gccosandvers=''
        intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
        d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
        ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
        alignbytes=4, prototype=define
      Linker and Libraries:
        ld='cc', ldflags =' -L/usr/local/lib'
        libpth=/usr/local/lib /lib /usr/lib
        libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
        perllibs=-ldl -lm -lpthread -lc -lcrypt
        libc=/lib/libc-2.3.6.so, so=so, useshrplib=true, libperl=libperl.so.5.8.8
        gnulibc_version='2.3.6'
      Dynamic Linking:
        dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
        cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:

    @INC for perl v5.8.8:
        /etc/perl
        /usr/local/lib/perl/5.8.8
        /usr/local/share/perl/5.8.8
        /usr/lib/perl5
        /usr/share/perl5
        /usr/lib/perl/5.8
        /usr/share/perl/5.8
        /usr/local/lib/site_perl
        .

    Environment for perl v5.8.8:
        HOME=/home/willem
        LANG (unset)
        LANGUAGE (unset)
        LC_CTYPE=en_US
        LD_LIBRARY_PATH (unset)
        LOGDIR (unset)
        PATH=/home/willem/bin:/usr/local/vmware/bin:/opt/trn/bin:/opt/sbin:/usr/local/bin:/usr/local/sbin:/usr/bin:/bin:/usr/bin/X11:/usr/sbin:/sbin:/usr/games:/opt/ssl/bin:/opt/bin:/opt/xemacs/bin:/usr/local/solid/bin:.:/home/willem/.local/bin
        PERL_BADLANG (unset)
        SHELL=/bin/bash

p5pRT commented 18 years ago

From shouldbedomo@mac.com

On 2006-05-11, at 11:58, willem@lunatech.com (via RT) wrote:

> you'll find the two mentioned test scripts below. The first one crashes, the second one not.
> ...
> Summary of my perl5 (revision 5 version 8 subversion 8) configuration:
> Platform: osname=linux, osvers=2.6.15.4, archname=i486-linux-gnu-thread-multi

FWIW, I can't get either of these to crash _under normal circumstances_ on Mac OS X with any of these perls:

This is perl, v5.9.4 DEVEL28156 built for darwin-thread-multi-2level

This is perl, v5.8.8 built for darwin-2level (Sorry, don't have a threaded build to hand)

This is perl, v5.8.6 built for darwin-thread-multi-2level

However, if I use libgmalloc.dylib, "an aggressive debugging malloc library", I do get a crash for the first script. Here's the trace:

    Host Name:      Tullamore
    Date/Time:      2006-05-11 20:17:43.941 +0200
    OS Version:     10.4.6 (Build 8I127)
    Report Version: 4

    Command: perl
    Path:    ./perl
    Parent:  bash [290]

    Version: ??? (???)

    PID:    4525
    Thread: 0

    Exception:  EXC_BAD_ACCESS (0x0001)
    Codes:      KERN_PROTECTION_FAILURE (0x0002) at 0xb954b000

    Thread 0 Crashed:
    0   perl    0x000f0298 Perl_sv_vcatpvfn + 13036 (sv.c:9415)
    1   perl    0x000ecd98 Perl_sv_vsetpvfn + 108 (sv.c:8291)
    2   perl    0x0003e58c Perl_do_sprintf + 344 (doop.c:719)
    3   perl    0x00070224 Perl_pp_sprintf + 96 (pp.c:3305)
    4   perl    0x0013df04 Perl_runops_debug + 332 (dump.c:1734)
    5   perl    0x000289dc S_run_body + 524 (perl.c:2396)
    6   perl    0x000284ac perl_run + 192 (perl.c:2318)
    7   perl    0x00002bdc main + 232 (perlmain.c:105)
    8   perl    0x0000234c _start + 340 (crt.c:272)
    9   perl    0x000021f4 start + 60

    Thread 0 crashed with PPC Thread State 64:
      srr0: 0x00000000000f0298 srr1: 0x100000000200d030 vrsave: 0x0000000000000000
        cr: 0x48000404          xer: 0x0000000000000000     lr: 0x00000000000f0230  ctr: 0x0000000000000000
        r0: 0x0000000000000000   r1: 0x00000000bffff270     r2: 0x0000000000000001   r3: 0x00000000b954afdf
        r4: 0x0000000000000021   r5: 0xffffffff46ab5021     r6: 0x00000000c3a4c3a5   r7: 0x0000000020202020
        r8: 0x00000000b954afff   r9: 0x00000000b954afcc    r10: 0x00000000aa5660d4  r11: 0x000000000019384c
       r12: 0x0000000090129ea0  r13: 0x000000000016d984    r14: 0x0000000000000001  r15: 0x00000000b40291e0
       r16: 0x0000000000000000  r17: 0x0000000000000000    r18: 0x0000000000000000  r19: 0x00000000b9530ffe
       r20: 0x0000000000000000  r21: 0x0000000000000000    r22: 0x0000000000000003  r23: 0x0000000000000000
       r24: 0x00000000b0002390  r25: 0x00000000b4ad7050    r26: 0x00000000b9548ff8  r27: 0x0000000000000000
       r28: 0x0000000000000021  r29: 0x00000000b954b000    r30: 0x00000000b954afdb  r31: 0x00000000000ecfc8

    Binary Images Description:
        0x1000 -   0x191fff perl /Volumes/Tottie/Other/src/Perlsmoke/32-bit_perl-current/perl
    0x8fe00000 - 0x8fe51fff dyld 44.4 /usr/lib/dyld
    0x90000000 - 0x901bbfff libSystem.B.dylib /usr/lib/libSystem.B.dylib
    0x90213000 - 0x90218fff libmathCommon.A.dylib /usr/lib/system/libmathCommon.A.dylib
    0x9141a000 - 0x91425fff libgcc_s.1.dylib /usr/lib/libgcc_s.1.dylib
    0x9605e000 - 0x9607efff libmx.A.dylib /usr/lib/libmx.A.dylib
    0x9a564000 - 0x9a566fff libgmalloc.dylib /usr/lib/libgmalloc.dylib

That's for bleadperl; others are similar.

(libgmalloc.dylib assigns each malloc() a new memory page with a guard
page beyond it and the last byte of the allocation on the last byte
of the page (modulo alignment considerations). The corresponding free() unmaps the page. Thus monkey business results in a crash.)

The guilty code line is

sv.c:9415 *p = '\0';

but I'm afraid I can't see why p (which seems to correspond to r29)
might be pointing to unallocated memory (not that I've had more than
a cursory look).

> LC_CTYPE=en_US

Putting LC_CTYPE=en_US in the environment (which contains no other
locale-related variables) makes no difference to the symptoms.

-- Dominic Dunlop

p5pRT commented 18 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 18 years ago

From willem@lunatech.com

Dominic,

> FWIW, I can't get either of these to crash _under normal circumstances_ on Mac OS X with any of these perls:

I understand. I had already imagined that it could be platform related, though I haven't tested it on a Mac or any other platform.

> However, if I use libgmalloc.dylib, "an aggressive debugging malloc library", I do get a crash for the first script. Here's the trace:

I'll try to look at it more closely as well, though it has become less of a priority because I've implemented a workaround in the project code: I now ensure that the format string is never in UTF-8, which seems to be the trigger of my problems.
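The workaround is only described in words here, so the following is a guess at one way to do it rather than the code used in the project. It is a minimal sketch: `sprintf_bytes_format` is a hypothetical helper name, and it assumes the format string contains no characters above 0xFF, so that `utf8::downgrade` can turn it back into a byte string before `sprintf` sees it.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the report): format with a byte-string
# copy of the format so that sprintf never sees a UTF-8-flagged format.
sub sprintf_bytes_format {
    my ($format, @args) = @_;
    my $copy = $format;
    # utf8::downgrade switches the internal representation back to one
    # byte per character; with a true second argument it returns false
    # instead of dying when a character above 0xFF is present.
    utf8::downgrade($copy, 1)
        or die "format contains characters above 0xFF";
    return sprintf($copy, @args);
}

my $format = "%5s%-10s%-35s%-35s";
utf8::upgrade($format);    # simulate a format that arrived UTF-8 flagged
my $line = sprintf_bytes_format($format, '', '', "\344\345", "\326");
print length($line), "\n";    # 5 + 10 + 35 + 35 = 85
```

This only sidesteps the mismatch between a UTF-8 format and byte-string arguments; it does nothing for formats that genuinely need characters outside Latin-1.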

> but I'm afraid I can't see why p (which seems to correspond to r29) might be pointing to unallocated memory (not that I've had more than a cursory look).

Thanks anyway. I'll try to find some time to debug it a bit further as well. Even if it's not immediately needed for myself, I still believe that there is something fishy going on.

Kindest regards,

Willem Jan Veen

-- Willem Jan Krijn Veen Lunatech Research b.v.

Heemraadssingel 70 3021 DD Rotterdam The Netherlands Office tel.: (+31) (0)10 7502600 Fax.: (+31) (0)10 2439902

p5pRT commented 18 years ago

From BQW10602@nifty.com

On Thu, 11 May 2006 02:58:48 -0700, "willem@lunatech.com (via RT)" <perlbug-followup@perl.org> wrote

> # New Ticket Created by willem@lunatech.com
> # Please include the string: [perl #39126]
> # in the subject line of all future correspondence about this issue.
> # <URL: https://rt-archive.perl.org/perl5/Ticket/Display.html?id=39126 >
>
> Dear maintainers,
>
> Some extra testing with valgrind of the crashing test script shows the following: Invalid write of size 1 at 0x80D33E2: Perl_sv_vcatpvfn (in /usr/bin/perl) by 0x8107EB2: Perl_do_sprintf (in /usr/bin/perl) Address 0x651FBB4 is 0 bytes after a block of size 52 alloc'd
>
> Which in my eyes looks like a buffer overrun.
>
> I hope this provides you with enough information to identify the actual bug and write a fix for it. I know that the code may seem funny by having the format string in utf-8, but I don't think that it should result in a crash.

I think the following reproduces the problem more reliably in perl-current:

    for my $i (100) {
        my $format = "%-". ($i * 7). "s";
        utf8::upgrade($format);
        my @records = ("\344\345" x $i);
        my $line = sprintf($format, @records);
    }

The problem occurs near the end of Perl_sv_vcatpvfn:

    STRLEN width = 7*$i
    STRLEN have  = 2*$i
    STRLEN need  = 7*$i
    STRLEN gap   = 5*$i = need - have
    STRLEN elen  = 4*$i (after sv_utf8_upgrade)
    SvCUR(sv) at last = gap + elen = need + (elen - have) = 9*$i
    SvLEN(sv) at last = PERL_STRLEN_ROUNDUP(need + dotstrlen + 1) = PERL_STRLEN_ROUNDUP(7*$i+2)

Oops! elen is doubled by sv_utf8_upgrade(), but the need used in SvGROW(sv, SvCUR(sv) + need + dotstrlen + 1) does not take that into account!

    #sv.c 9341-9367
        /* calculate width before utf8_upgrade changes it */
        have = esignlen + zeros + elen;
        if (have < zeros)
            Perl_croak_nocontext(PL_memory_wrap);

        if (is_utf8 != has_utf8) {
            if (is_utf8) {
                if (SvCUR(sv))
                    sv_utf8_upgrade(sv);
            }
            else {
                SV * const nsv = sv_2mortal(newSVpvn(eptr, elen));
                sv_utf8_upgrade(nsv);
                eptr = SvPVX_const(nsv);
                elen = SvCUR(nsv);
            }
            SvGROW(sv, SvCUR(sv) + elen + 1);
            p = SvEND(sv);
            *p = '\0';
        }

        need = (have > width ? have : width);
        gap = need - have;

        if (need >= (((STRLEN)~0) - SvCUR(sv) - dotstrlen - 1))
            Perl_croak_nocontext(PL_memory_wrap);
        SvGROW(sv, SvCUR(sv) + need + dotstrlen + 1);

The earlier SvGROW(sv, SvCUR(sv) + elen + 1) does not help, since the later SvGROW(sv, SvCUR(sv) + need + dotstrlen + 1) requests too short a buffer. To fix this, perhaps which quantities are in bytes and which are in characters should be rethought.

Regards,
SADAHIRO Tomoyuki
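The arithmetic above can be sketched as a small model in Perl. This is an illustration only, not the code in sv.c: `roundup` and `model_conversion` are hypothetical names, the rounding quantum of 4 is an assumption, `dotstrlen` is taken as 1 because that is what the `7*$i+2` figure implies, and any slack left over from earlier allocations is ignored.

```perl
use strict;
use warnings;

# Round $n up to a multiple of $quantum (a power of two). This stands
# in for PERL_STRLEN_ROUNDUP; the quantum value is an assumption.
sub roundup {
    my ($n, $quantum) = @_;
    return ($n + $quantum - 1) & ~($quantum - 1);
}

# Model one %s conversion in the pre-patch code path, where a byte-string
# argument is appended to a UTF-8 buffer.
#   cur   - SvCUR(sv) before the conversion
#   width - field width from the format
#   bytes - length of the argument before upgrading
#   high  - how many of those bytes are >= 0x80 (each becomes two bytes)
sub model_conversion {
    my (%arg) = @_;
    my $quantum   = defined $arg{quantum}   ? $arg{quantum}   : 4;
    my $dotstrlen = defined $arg{dotstrlen} ? $arg{dotstrlen} : 1;

    my $have = $arg{bytes};                  # measured before the upgrade
    my $elen = $arg{bytes} + $arg{high};     # length after sv_utf8_upgrade
    my $need = $have > $arg{width} ? $have : $arg{width};
    my $gap  = $need - $have;

    my $alloc   = roundup($arg{cur} + $need + $dotstrlen + 1, $quantum);
    my $end     = $arg{cur} + $gap + $elen;  # SvCUR after the conversion
    my $written = $end + 1;                  # plus the trailing NUL

    return { alloc => $alloc, end => $end, overrun => $written - $alloc };
}

for my $case (
    [ 'sprintf-bug.pl, 3rd field',    cur => 15, width => 35,  bytes => 2,   high => 2   ],
    [ 'sprintf-nonbug.pl, 2nd field', cur => 5,  width => 35,  bytes => 2,   high => 2   ],
    [ 'loop above with $i = 100',     cur => 0,  width => 700, bytes => 200, high => 200 ],
) {
    my ($label, %arg) = @$case;
    my $r = model_conversion(%arg);
    printf "%-30s alloc=%d end=%d overrun=%d\n",
        $label, @$r{qw(alloc end overrun)};
}
```

Under these assumptions the third field of sprintf-bug.pl gets a 52-byte buffer and the terminating NUL lands one byte past it, which agrees with the 52-byte block in the valgrind report. The non-crashing script is rounded up to 44 bytes and just fits, and the `$i = 100` case overruns by a wide margin, which may be why it reproduces more readily.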

p5pRT commented 18 years ago

From BQW10602@nifty.com

On Fri, 19 May 2006 01:29:36 +0900, SADAHIRO Tomoyuki <bqw10602@nifty.com> wrote

> On Thu, 11 May 2006 02:58:48 -0700, "willem@lunatech.com (via RT)" <perlbug-followup@perl.org> wrote
>
> > Some extra testing with valgrind of the crashing test script shows the following: Invalid write of size 1 at 0x80D33E2: Perl_sv_vcatpvfn (in /usr/bin/perl) by 0x8107EB2: Perl_do_sprintf (in /usr/bin/perl) Address 0x651FBB4 is 0 bytes after a block of size 52 alloc'd
> >
> > Which in my eyes looks like a buffer overrun.
>
> The problem occurs near the end of Perl_sv_vcatpvfn:

Here is a patch. The test in t/op/sprintf2.t used \xe4, but in EBCDIC this byte is "U", so it is not changed by a utf8 upgrade. \xb4 is upgraded into two octets even under EBCDIC, hence it is appropriate for the test.

Regards,
SADAHIRO Tomoyuki

Inline Patch

```diff
diff -urN perl-current@28232/sv.c perl/sv.c
--- perl-current@28232/sv.c	Thu May 18 05:54:33 2006
+++ perl/sv.c	Sun May 21 18:24:45 2006
@@ -9338,26 +9338,28 @@
 	    continue;	/* not "break" */
 	}
 
-	/* calculate width before utf8_upgrade changes it */
+	if (is_utf8 != has_utf8) {
+	    if (is_utf8) {
+		if (SvCUR(sv))
+		    sv_utf8_upgrade(sv);
+	    }
+	    else {
+		const STRLEN old_elen = elen;
+		SV * const nsv = sv_2mortal(newSVpvn(eptr, elen));
+		sv_utf8_upgrade(nsv);
+		eptr = SvPVX_const(nsv);
+		elen = SvCUR(nsv);
+
+		if (width) { /* fudge width (can't fudge elen) */
+		    width += elen - old_elen;
+		}
+		is_utf8 = TRUE;
+	    }
+	}
+
 	have = esignlen + zeros + elen;
 	if (have < zeros)
 	    Perl_croak_nocontext(PL_memory_wrap);
-
-	if (is_utf8 != has_utf8) {
-	    if (is_utf8) {
-		if (SvCUR(sv))
-		    sv_utf8_upgrade(sv);
-	    }
-	    else {
-		SV * const nsv = sv_2mortal(newSVpvn(eptr, elen));
-		sv_utf8_upgrade(nsv);
-		eptr = SvPVX_const(nsv);
-		elen = SvCUR(nsv);
-	    }
-	    SvGROW(sv, SvCUR(sv) + elen + 1);
-	    p = SvEND(sv);
-	    *p = '\0';
-	}
 
 	need = (have > width ? have : width);
 	gap = need - have;
diff -urN perl-current@28232/t/op/sprintf2.t perl/t/op/sprintf2.t
--- perl-current@28232/t/op/sprintf2.t	Tue Dec 13 22:58:10 2005
+++ perl/t/op/sprintf2.t	Sun May 21 18:34:34 2006
@@ -6,7 +6,7 @@
     require './test.pl';
 }
 
-plan tests => 275;
+plan tests => 280;
 
 is(
     sprintf("%.40g ",0.01),
@@ -18,13 +18,14 @@
     sprintf("%.40f", 0.01)." ",
     q(the sprintf "%.f" optimization)
 );
-{
-    chop(my $utf8_format = "%-3s\x{100}");
-    is(
-        sprintf($utf8_format, "\xe4"),
-        "\xe4  ",
-        q(width calculation under utf8 upgrade)
-    );
+
+# cases of $i > 1 are against [perl #39126]
+for my $i (1, 5, 10, 20, 50, 100) {
+    chop(my $utf8_format = "%-". 3*$i ."s\x{100}");
+    my $string = "\xB4"x$i;        # latin1 ACUTE or ebcdic COPYRIGHT
+    my $expect = $string."  "x$i;  # followed by 2*$i spaces
+    is(sprintf($utf8_format, $string), $expect,
+       "width calculation under utf8 upgrade, length=$i");
 }
 
 # Used to mangle PL_sv_undef
```

End of patch

p5pRT commented 18 years ago

From BQW10602@nifty.com

Here is a tweak of the test suite. The width can be replaced with an asterisk.

>     diff -urN perl-current@28232/t/op/sprintf2.t perl/t/op/sprintf2.t
>     --- perl-current@28232/t/op/sprintf2.t Tue Dec 13 22:58:10 2005
>     +++ perl/t/op/sprintf2.t Sun May 21 18:34:34 2006

>     +# cases of $i > 1 are against [perl #39126]
>     +for my $i (1, 5, 10, 20, 50, 100) {
>     +    chop(my $utf8_format = "%-". 3*$i ."s\x{100}");

    +    chop(my $utf8_format = "%-*s\x{100}");

>     +    my $string = "\xB4"x$i;        # latin1 ACUTE or ebcdic COPYRIGHT
>     +    my $expect = $string."  "x$i;  # followed by 2*$i spaces
>     +    is(sprintf($utf8_format, $string), $expect,

    +    is(sprintf($utf8_format, 3*$i, $string), $expect,

>     +       "width calculation under utf8 upgrade, length=$i");

Regards,
SADAHIRO Tomoyuki
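For trying the same check outside the perl source tree, a standalone sketch follows. It is not part of the patch: `width_ok` is a hypothetical helper, and it uses `utf8::upgrade` rather than the `chop` trick to get a UTF-8-flagged format. It exercises both the literal width and the `*` width.

```perl
use strict;
use warnings;

# Returns true if sprintf pads a Latin-1 argument correctly under a
# UTF-8-flagged format, for both a literal width and a '*' width.
sub width_ok {
    my ($i) = @_;
    my $string = "\xB4" x $i;
    my $expect = $string . ("  " x $i);    # 2*$i trailing spaces

    my $fixed = "%-" . 3 * $i . "s";       # e.g. "%-3s" for $i == 1
    my $star  = "%-*s";
    utf8::upgrade($_) for $fixed, $star;   # force the UTF-8 flag on both

    return sprintf($fixed, $string) eq $expect
        && sprintf($star, 3 * $i, $string) eq $expect;
}

for my $i (1, 5, 10, 20, 50, 100) {
    print "length=$i: ", (width_ok($i) ? "ok" : "not ok"), "\n";
}
```

On a perl that includes the fix every line should read `ok`; on an affected build the larger values of `$i` may instead corrupt memory or abort, as described earlier in the ticket.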

p5pRT commented 18 years ago

From BQW10602@nifty.com


> > Some extra testing with valgrind of the crashing test script shows the following: Invalid write of size 1 at 0x80D33E2: Perl_sv_vcatpvfn (in /usr/bin/perl) by 0x8107EB2: Perl_do_sprintf (in /usr/bin/perl) Address 0x651FBB4 is 0 bytes after a block of size 52 alloc'd
>
> The problem occurs near the end of Perl_sv_vcatpvfn:
>
> To fix this, perhaps what are in bytes and what are in characters should be rethought.

Currently the width and the precision for %s are in characters; for example:

    sprintf("%03s", "\x{100}") returns "00\x{100}".
    sprintf("%.3s", "\x{abcd}12345") returns "\x{abcd}12".

But the width and the precision for %c are in bytes; for example:

    sprintf("%03c", 0x100) returns "0\x{100}".
    sprintf("%.2c", 0xabcd) returns "\352\257" (malformed UTF8).

And %n stores the length of the output so far in bytes; for example:

    after sprintf("%s%n", "\x{beef}", $a), $a is set to 3.

Regards,
SADAHIRO Tomoyuki
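The %s behaviour described above can be checked directly. This is a minimal sketch; `pad_s` and `cut_s` are hypothetical wrapper names. Only %s is exercised, because the %c and %n results quoted are those reported for the perl of the time and are not assumed to hold elsewhere.

```perl
use strict;
use warnings;

# Width for %s counts characters: one wide character plus two pad
# characters gives a three-character result.
sub pad_s { return sprintf("%03s", $_[0]) }

# Precision for %s counts characters too: keep the first three.
sub cut_s { return sprintf("%.3s", $_[0]) }

my $padded = pad_s("\x{100}");
my $cut    = cut_s("\x{abcd}12345");

print length($padded), "\n";    # 3
print length($cut), "\n";       # 3

my $same = $padded eq "00\x{100}" ? "yes" : "no";
print "padded with zeros on the left: $same\n";
```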

p5pRT commented 18 years ago

From @rgs

SADAHIRO Tomoyuki wrote:

> Here is a patch. The test in t/op/sprintf2.t used \xe4, but in EBCDIC this byte is "U", so it is not changed by a utf8 upgrade. \xb4 is upgraded into two octets even under EBCDIC, hence it is appropriate for the test.

Thanks, applied as change #28328 (with further test suite tweaks).

p5pRT commented 18 years ago

@rgs - Status changed from 'open' to 'resolved'

p5pRT commented 18 years ago

From BQW10602@nifty.com

On Sun, 28 May 2006 20:42:25 +0900, SADAHIRO Tomoyuki <bqw10602@nifty.com> wrote

> Currently the width and the precision for %s are in characters; for example: sprintf("%03s", "\x{100}") returns "00\x{100}". sprintf("%.3s", "\x{abcd}12345") returns "\x{abcd}12".
>
> But the width and the precision for %c are in bytes; for example: sprintf("%03c", 0x100) returns "0\x{100}". sprintf("%.2c", 0xabcd) returns "\352\257" (malformed UTF8).
>
> And %n stores the length of the output so far in bytes; for example: after sprintf("%s%n", "\x{beef}", $a), $a is set to 3.

In my opinion these numbers for Unicode should be regarded as those in characters.

First, just as printf() can write through IO layers, sprintf() can set the SVf_UTF8 flag on its return value. Hence the format string for both should follow character semantics.

Second, a character in the upper half of Latin-1, say U+00DF, is represented by a two-byte sequence in UTF-8. Currently printf("%s%n", pack('U', 0xDF), $a) sets $a to 2 though the output is actually a single byte, since Perl_do_print in doio.c converts it from utf8 to bytes (from UTF-8 to Latin-1, except on EBCDIC). To me, this result is inconsistent.

Third, the output encoding can be freely changed through IO layers. The number of bytes mapped to a Unicode character varies depending on the encoding, while the number of characters mapped to a Unicode character is almost always 1 in most encodings. If the numbers are in characters, the results coincide better.

Regards,
SADAHIRO Tomoyuki
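The third point can be illustrated with the Encode module. This is a small sketch in which `lengths_for` is a hypothetical helper; it shows how the byte count for U+00DF depends on the output encoding while the character count stays at 1.

```perl
use strict;
use warnings;
use Encode qw(encode);

# For a one-character string, report its length in characters and the
# number of bytes it occupies in a few output encodings.
sub lengths_for {
    my ($char) = @_;
    return {
        characters => length($char),
        latin1     => length(encode('iso-8859-1', $char)),
        utf8       => length(encode('UTF-8',      $char)),
        utf16be    => length(encode('UTF-16BE',   $char)),
    };
}

my $len = lengths_for("\x{DF}");
printf "%-10s %d\n", $_, $len->{$_}
    for qw(characters latin1 utf8 utf16be);
```

For U+00DF this reports 1 character, 1 byte in ISO-8859-1, and 2 bytes in both UTF-8 and UTF-16BE, so a byte-based count cannot be right for every output layer at once.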
