Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

PerlIO layer utf-16-le overwrites byte in filehandle backed by scalar #16453

Open p5pRT opened 6 years ago

p5pRT commented 6 years ago

Migrated from rt.perl.org#132949 (status was 'open')

Searchable as RT132949$

p5pRT commented 6 years ago

From @nrdvana

This is a bug report for perl from mike@nrdvana.net, generated with the help of perlbug 1.40 running under perl 5.26.1.


I've run into a bug, seemingly in every perl version (at least 5.12 through 5.26), triggered by these steps:

  1) create a file handle on top of a scalar, for reading
  2) set a wide-char unicode layer (16-bit, 32-bit) on the file handle
  3) read from the file handle

The read causes the byte at the current position in the scalar to be set to zero. The scalar should never be modified on a read-only file handle.

Example​: https://gist.github.com/nrdvana/fe01eeda2e325d825ca811267bd349ff

    use strict;
    use warnings;
    use autodie;
    sub hexdump { join ' ', map sprintf("%02X", ord $_), split //, shift }
    my ($in_fh, $input, $buf1);

    print "utf-16-le\n\n";

    $input= "\xFF\xFE\x11\x22\x33\x44\x55\x66";
    open($in_fh, "<", \$input);
    print "after open input=".hexdump($input)."\n";
    binmode($in_fh, ":encoding(utf-16-le)");
    print "after binmode input=".hexdump($input)."\n";
    read($in_fh, $buf1, 1);
    print "after read input=".hexdump($input)
      ." buf1=".hexdump($buf1)."\n";

    print "\nutf-16-be\n\n";

    $input= "\xFE\xFF\x11\x22\x33\x44\x55\x66";
    open($in_fh, "<", \$input);
    print "after open input=".hexdump($input)."\n";
    binmode($in_fh, ":encoding(utf-16-be)");
    print "after binmode input=".hexdump($input)."\n";
    read($in_fh, $buf1, 1);
    print "after read input=".hexdump($input)
      ." buf1= ".hexdump($buf1)."\n";

    print "\nutf-32-le\n\n";

    $input= "\xFF\xFE\x00\x00\x33\x44\x00\x00";
    open($in_fh, "<", \$input);
    print "after open input=".hexdump($input)."\n";
    binmode($in_fh, ":encoding(utf-32-le)");
    print "after binmode input=".hexdump($input)."\n";
    read($in_fh, $buf1, 1);
    print "after read input=".hexdump($input)
      ." buf1= ".hexdump($buf1)."\n";

Output --------------------------

    utf-16-le

    after open input=FF FE 11 22 33 44 55 66
    after binmode input=FF FE 11 22 33 44 55 66
    after read input=00 FE 11 22 33 44 55 66 buf1= FEFF

    utf-16-be

    after open input=FE FF 11 22 33 44 55 66
    after binmode input=FE FF 11 22 33 44 55 66
    after read input=00 FF 11 22 33 44 55 66 buf1= FEFF

    utf-32-le

    after open input=FF FE 00 00 33 44 00 00
    after binmode input=FF FE 00 00 33 44 00 00
    after read input=00 FE 00 00 33 44 00 00 buf1= FEFF
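
The same reproduction can be condensed into a self-checking sketch that snapshots the scalar before the read and compares afterwards. The helper name `scalar_survives_read` is hypothetical and not part of the original gist; the inputs are the ones from the example above. On an affected perl the lines should report `MODIFIED`; on a perl carrying the fix they should report `ok`.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original report): returns true if
# reading one character through the given encoding layer left the
# backing scalar byte-for-byte unchanged.
sub scalar_survives_read {
    my ($layer, $bytes) = @_;
    my $before = $bytes;                       # snapshot for comparison
    open(my $fh, '<', \$bytes) or die "open: $!";
    binmode($fh, ":encoding($layer)") or die "binmode: $!";
    read($fh, my $buf, 1);                     # a single character is enough to trigger the bug
    close($fh);
    return $bytes eq $before;
}

for my $case (
    [ 'utf-16-le', "\xFF\xFE\x11\x22\x33\x44\x55\x66" ],
    [ 'utf-16-be', "\xFE\xFF\x11\x22\x33\x44\x55\x66" ],
    [ 'utf-32-le', "\xFF\xFE\x00\x00\x33\x44\x00\x00" ],
) {
    my ($layer, $bytes) = @$case;
    printf "%-10s %s\n", $layer,
        scalar_survives_read($layer, $bytes) ? 'ok' : 'MODIFIED';
}
```

Because the helper works on its own copy of the bytes, it can be dropped into a test file without disturbing the caller's data.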



Flags:
    category=core
    severity=medium


Site configuration information for perl 5.26.1:

Configured by builduser at Fri Jan 5 02:49:35 UTC 2018.

Summary of my perl5 (revision 5 version 26 subversion 1) configuration:
  Platform:
    osname=linux
    osvers=4.14.11-1-arch
    archname=x86_64-linux-thread-multi
    uname='linux felix 4.14.11-1-arch #1 smp preempt wed jan 3 07:02:42 utc 2018 x86_64 gnulinux '
    config_args='-des -Dusethreads -Duseshrplib -Doptimize=-march=x86-64 -mtune=generic -O2 -pipe -fstack-protector-strong -fno-plt -Dprefix=/usr -Dvendorprefix=/usr -Dprivlib=/usr/share/perl5/core_perl -Darchlib=/usr/lib/perl5/5.26/core_perl -Dsitelib=/usr/share/perl5/site_perl -Dsitearch=/usr/lib/perl5/5.26/site_perl -Dvendorlib=/usr/share/perl5/vendor_perl -Dvendorarch=/usr/lib/perl5/5.26/vendor_perl -Dscriptdir=/usr/bin/core_perl -Dsitescript=/usr/bin/site_perl -Dvendorscript=/usr/bin/vendor_perl -Dinc_version_list=none -Dman1ext=1perl -Dman3ext=3perl -Dcccdlflags='-fPIC' -Dlddlflags=-shared -Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now -Dldflags=-Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now'
    hint=recommended
    useposix=true
    d_sigaction=define
    useithreads=define
    usemultiplicity=define
    use64bitint=define
    use64bitall=define
    uselongdouble=undef
    usemymalloc=n
    default_inc_excludes_dot=define
    bincompat5005=undef
  Compiler:
    cc='cc'
    ccflags ='-D_REENTRANT -D_GNU_SOURCE -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2'
    optimize='-march=x86-64 -mtune=generic -O2 -pipe -fstack-protector-strong -fno-plt'
    cppflags='-D_REENTRANT -D_GNU_SOURCE -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include'
    ccversion=''
    gccversion='7.2.1 20171224'
    gccosandvers=''
    intsize=4
    longsize=8
    ptrsize=8
    doublesize=8
    byteorder=12345678
    doublekind=3
    d_longlong=define
    longlongsize=8
    d_longdbl=define
    longdblsize=16
    longdblkind=3
    ivtype='long'
    ivsize=8
    nvtype='double'
    nvsize=8
    Off_t='off_t'
    lseeksize=8
    alignbytes=8
    prototype=define
  Linker and Libraries:
    ld='cc'
    ldflags ='-Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now -fstack-protector-strong -L/usr/local/lib'
    libpth=/usr/local/lib /usr/lib/gcc/x86_64-pc-linux-gnu/7.2.1/include-fixed /usr/lib /lib/../lib /usr/lib/../lib /lib /lib64 /usr/lib64
    libs=-lpthread -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc -lgdbm_compat
    perllibs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc
    libc=libc-2.26.so
    so=so
    useshrplib=true
    libperl=libperl.so
    gnulibc_version='2.26'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs
    dlext=so
    d_dlsymun=undef
    ccdlflags='-Wl,-E -Wl,-rpath,/usr/lib/perl5/5.26/core_perl/CORE'
    cccdlflags='-fPIC'
    lddlflags='-shared -Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now -L/usr/local/lib -fstack-protector-strong'


@INC for perl 5.26.1:
    /usr/lib/perl5/5.26/site_perl
    /usr/share/perl5/site_perl
    /usr/lib/perl5/5.26/vendor_perl
    /usr/share/perl5/vendor_perl
    /usr/lib/perl5/5.26/core_perl
    /usr/share/perl5/core_perl

(pardon for trimming the environment, but it had things I didn't want to publish)

p5pRT commented 6 years ago

From @tonycoz

On Wed, 07 Mar 2018 23:16:28 -0800, mike@nrdvana.net wrote:

> I've run into a bug seemingly in every perl version (at least 5.12 through 5.26) where 1) create a file handle on top of a scalar, for reading 2) set a wide char unicode layer (16-bit, 32-bit) on the file handle 3) read from the file handle causes the byte at the current position to be set to zero. The scalar should never be modified on a read-only file handle.

This is a duplicate of #132833, which was fixed in fed9fe5b48ccdffef9065a03c12c237cc7418de6.

I don't see this commit in the 5.26 votes file.

Tony
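
For perls that do not carry that fix, one possible way to sidestep the problem is to back the read handle with a private copy of the data, so that any stray write lands in the copy rather than in the caller's scalar. This is only a sketch under that assumption, not something proposed in the thread; the helper name `read_chars_from_copy` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption, not from the thread): read $count
# characters through an encoding layer without ever handing the caller's
# scalar to the file handle.
sub read_chars_from_copy {
    my ($copy, $layer, $count) = @_;           # list assignment copies the caller's bytes
    open(my $fh, '<', \$copy) or die "open: $!";
    binmode($fh, ":encoding($layer)") or die "binmode: $!";
    my $got = read($fh, my $buf, $count);
    defined $got or die "read: $!";
    close($fh);
    return $buf;
}

my $input = "\xFF\xFE\x11\x22\x33\x44\x55\x66";
my $first = read_chars_from_copy($input, 'utf-16-le', 1);
printf "first char U+%04X\n", ord $first;
printf "input intact: %s\n", $input eq "\xFF\xFE\x11\x22\x33\x44\x55\x66" ? 'yes' : 'no';
```

The cost is one extra copy of the buffer per handle, which is usually acceptable for small in-memory inputs but may matter for large ones.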

p5pRT commented 6 years ago

The RT System itself - Status changed from 'new' to 'open'