Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

Encode::encode and destruction of the argument #7868

Closed p5pRT closed 12 years ago

p5pRT commented 19 years ago

Migrated from rt.perl.org#34905 (status was 'rejected')

Searchable as RT34905$

p5pRT commented 19 years ago

From perl-5.8.0@ton.iguana.be

Created by perl-5.8.0@ton.iguana.be

When using Encode I was unpleasantly surprised by this:

    perl -MEncode -wle '$a="abcd"; encode("utf8", $a, Encode::FB_CROAK); print "a=<$a>\n"'
    a=<>

So encode() destroys its argument. As far as I can see, this possibility is not explicitly documented anywhere in the Encode docs, and certainly not in the section about the encode() function.

Later on, in the section "Handling Malformed Data", there is this table though:

                         FB_DEFAULT FB_CROAK FB_QUIET FB_WARN FB_PERLQQ
    DIE_ON_ERR    0x0001               X
    WARN_ON_ERR   0x0002                                 X
    RETURN_ON_ERR 0x0004                        X        X
    LEAVE_SRC     0x0008
    PERLQQ        0x0100                                          X
    HTMLCREF      0x0200
    XMLCREF       0x0400

While the meaning of LEAVE_SRC is not documented anywhere, one could guess that it controls this behaviour, and indeed:

    perl -MEncode -wle '$a="abcd"; encode("utf8", $a, Encode::FB_CROAK | Encode::LEAVE_SRC); print "a=<$a>\n"'
    a=<abcd>

From that table I'd also conclude that, for example, FB_DEFAULT should destroy its argument, since it does not set LEAVE_SRC either. It does not:

    perl -MEncode -wle '$a="abcd"; encode("utf8", $a, Encode::FB_DEFAULT); print "a=<$a>\n"'
    a=<abcd>

I think this rather important behaviour needs documentation, and the flags table should be corrected.
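One way to check the table against an installed Encode is to print each fallback constant together with the base bits it contains. This is a minimal sketch and not part of the original report. It assumes the flag names from the table are callable as subs in the Encode package (as Encode::FB_CROAK and Encode::LEAVE_SRC are above), and the values it prints may differ between Encode versions.

```perl
use strict;
use warnings;
use Encode ();

# Base bits and composite fallback constants, named as in the docs table.
my @bits      = qw(DIE_ON_ERR WARN_ON_ERR RETURN_ON_ERR LEAVE_SRC PERLQQ HTMLCREF XMLCREF);
my @fallbacks = qw(FB_DEFAULT FB_CROAK FB_QUIET FB_WARN FB_PERLQQ);

for my $fb (@fallbacks) {
    # Look the constant up by name and call it to get its numeric value.
    my $value = Encode->can($fb)->();

    # Collect every base bit that is set in this fallback constant.
    my @set = grep { $value & Encode->can($_)->() } @bits;

    printf "%-10s 0x%04x  %s\n", $fb, $value, @set ? join('|', @set) : '(none)';
}
```

If LEAVE_SRC does not show up next to a fallback constant, that constant alone does not ask encode() to preserve its source argument.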

Perl Info

```
Flags:
    category=core
    severity=medium

This perlbug was built using Perl v5.8.6 - Fri Dec 24 19:25:13 CET 2004
It is being executed now by Perl v5.8.4 - Thu Jun 3 13:28:19 CEST 2004.

Site configuration information for perl v5.8.4:

Configured by ton at Thu Jun 3 13:28:19 CEST 2004.

Summary of my perl5 (revision 5 version 8 subversion 4) configuration:
  Platform:
    osname=linux, osvers=2.6.5, archname=i686-linux-64int-ld
    uname='linux quasar 2.6.5 #8 mon apr 5 05:41:20 cest 2004 i686 gnulinux '
    config_args=''
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=define use64bitall=undef uselongdouble=define
    usemymalloc=y, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2 -fomit-frame-pointer',
    cppflags='-fno-strict-aliasing -I/usr/local/include'
    ccversion='', gccversion='3.4.0 20031231 (experimental)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long long', ivsize=8, nvtype='long double', nvsize=12, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -ldb -ldl -lm -lcrypt -lutil -lc
    perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc
    libc=/lib/libc-2.3.2.so, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.3.2'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:

@INC for perl v5.8.4:
    /usr/lib/perl5/5.8.4/i686-linux-64int-ld
    /usr/lib/perl5/5.8.4
    /usr/lib/perl5/site_perl/5.8.4/i686-linux-64int-ld
    /usr/lib/perl5/site_perl/5.8.4
    /usr/lib/perl5/site_perl
    .

Environment for perl v5.8.4:
    HOME=/home/ton
    LANG (unset)
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/ton/bin.Linux:/home/ton/bin:/home/ton/bin.SampleSetup:/usr/local/bin:/usr/local/sbin:/home/oracle/product/9.2/bin:/usr/local/ar/bin:/usr/games/bin:/usr/X11R6/bin:/usr/share/bin:/usr/bin:/usr/sbin:/bin:/sbin:.
    PERL_BADLANG (unset)
    SHELL=/bin/bash
```

p5pRT commented 12 years ago

From @jkeenan

On Sun Apr 10 06:59:44 2005, perl-5.8.0@ton.iguana.be wrote:

> When using Encode I was unpleasantly surprised by this:
>
>     perl -MEncode -wle '$a="abcd"; encode("utf8", $a, Encode::FB_CROAK); print "a=<$a>\n"'
>     a=<>
>
> So encode() destroys its argument.

More precisely, it destroys its argument when Encode::FB_CROAK is provided as the third argument to encode().

    $ perl -MEncode -wle '$a="abcd"; encode("utf8", $a); print "a=<$a>\n"'
    a=<abcd>

    $ perl -MEncode -wle '$a="abcd"; encode("utf8", $a, Encode::FB_DEFAULT); print "a=<$a>\n"'
    a=<abcd>

    $ perl -MEncode -wle '$a="abcd"; encode("utf8", $a, Encode::FB_CROAK); print "a=<$a>\n"'
    a=<>

p5pRT commented 12 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 12 years ago

From @Hugmeir

On Sat, Nov 19, 2011 at 12:51 PM, James E Keenan via RT <perlbug-followup@perl.org> wrote:

> On Sun Apr 10 06:59:44 2005, perl-5.8.0@ton.iguana.be wrote:
>
> > When using Encode I was unpleasantly surprised by this:
> >
> >     perl -MEncode -wle '$a="abcd"; encode("utf8", $a, Encode::FB_CROAK); print "a=<$a>\n"'
> >     a=<>
> >
> > So encode() destroys its argument.
>
> More precisely, it destroys its argument when Encode::FB_CROAK is provided as the third argument to encode().
>
>     $ perl -MEncode -wle '$a="abcd"; encode("utf8", $a); print "a=<$a>\n"'
>     a=<abcd>
>
>     $ perl -MEncode -wle '$a="abcd"; encode("utf8", $a, Encode::FB_DEFAULT); print "a=<$a>\n"'
>     a=<abcd>
>
>     $ perl -MEncode -wle '$a="abcd"; encode("utf8", $a, Encode::FB_CROAK); print "a=<$a>\n"'
>     a=<>

This is actually in the docs, though; see the part about Encode::LEAVE_SRC: "If the Encode::LEAVE_SRC bit is not set, but CHECK is, then the second argument to encode() or decode() may be assigned to by the functions. If you're not interested in this, then bitwise-or the bitmask with it."

So this isn't a bug.
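For readers who want to keep their source string, the documented behaviour quoted above suggests two options: bitwise-or the CHECK value with Encode::LEAVE_SRC, or hand encode() a scratch copy. The following is a minimal sketch of both, not code from the thread; the variable names are illustrative, and what ends up in `$consumed` after the first call may depend on the Encode version.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Case 1: CHECK set, LEAVE_SRC not set. Per the quoted docs, encode() may
# assign to its second argument, so $consumed should not be relied on afterwards.
my $consumed       = "abcd";
my $bytes_consumed = encode("utf8", $consumed, Encode::FB_CROAK());

# Case 2: CHECK bitwise-or'ed with LEAVE_SRC asks encode() to leave the source alone.
my $kept       = "abcd";
my $bytes_kept = encode("utf8", $kept, Encode::FB_CROAK() | Encode::LEAVE_SRC());

# Case 3: pass a scratch copy, so only the copy can be overwritten.
my $original   = "abcd";
my $scratch    = $original;
my $bytes_copy = encode("utf8", $scratch, Encode::FB_CROAK());

printf "consumed=<%s>\n", defined $consumed ? $consumed : '';
print  "kept=<$kept>\n";
print  "original=<$original>\n";
```

The LEAVE_SRC form states the intent at the call site, while the scratch copy also works with code that builds its CHECK value elsewhere.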

p5pRT commented 12 years ago

@cpansprout - Status changed from 'open' to 'rejected'