Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

perluniintro inaccurate answer to testing encoding validity #8942

Closed p5pRT closed 17 years ago

p5pRT commented 17 years ago

Migrated from rt.perl.org#43287 (status was 'resolved')

Searchable as RT43287$

p5pRT commented 17 years ago

From dannyr@wirespring.com

perldoc perluniintro currently says:

  · How Do I Detect Data That's Not Valid In a Particular Encoding?

    Use the "Encode" package to try converting it. For example,

        use Encode 'decode_utf8';
        if (decode_utf8($string_of_bytes_that_I_think_is_utf8)) {
            # valid
        } else {
            # invalid
        }

This does not match my tests or the Encode documentation, which states that malformed characters are replaced with a substitution character; decode_utf8 does not return true or false to signal validity.

% perl -e '$n="\x{c5}";use Encode;print decode_utf8($n)?"valid":"invalid";'
valid
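To make the substitution behaviour visible, here is a minimal sketch, assuming a reasonably recent Encode. It decodes the same lone byte with the default CHECK and prints the code point that comes back. The variable names are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# A lone 0xC5 byte starts a two-byte UTF-8 sequence that is never completed.
my $bytes = "\xC5";

# With the default CHECK, decode_utf8 does not fail; it substitutes
# U+FFFD (REPLACEMENT CHARACTER) for the malformed input.
my $text = decode_utf8($bytes);

# The result is a non-empty string, so it is true in boolean context.
printf "length=%d code point=U+%04X\n", length($text), ord($text);
```

Because the returned string is non-empty, the truth test from perluniintro reports "valid" for this input.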

So you need to use a CHECK function other than the default.

% perl -e '$n="\x{c5}";use Encode;eval {decode_utf8($n, Encode::FB_CROAK)}; print $@?"invalid":"valid";'
invalid
% perl -e '$n="\x{c3}\x{85}";use Encode;eval {decode_utf8($n, Encode::FB_CROAK)}; print $@?"invalid":"valid";'
valid
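The one-liners above can be wrapped into a small reusable check. This is only a sketch: `is_valid_utf8` is a hypothetical helper name, not part of Encode, and it uses the strict "UTF-8" encoding name with `decode` rather than `decode_utf8`. The trailing `1` inside the `eval` matters because a valid empty string decodes to `""`, which is false.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Hypothetical helper (not provided by Encode): returns 1 if $bytes is
# well-formed UTF-8, 0 otherwise.
sub is_valid_utf8 {
    my ($bytes) = @_;    # work on a copy; decode with CHECK may modify its source

    # FB_CROAK makes decode() die on malformed input instead of
    # substituting U+FFFD. The eval returns 1 only if decode() survived.
    my $ok = eval { decode('UTF-8', $bytes, FB_CROAK); 1 };
    return $ok ? 1 : 0;
}

print is_valid_utf8("\xC5")     ? "valid" : "invalid", "\n";   # lone lead byte
print is_valid_utf8("\xC3\x85") ? "valid" : "invalid", "\n";   # U+00C5 encoded as UTF-8
```

Testing the success of the `eval` rather than `$@` or the decoded value avoids both the empty-string case and any clobbering of `$@`.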

I don't think it is relevant for a documentation bug. :) But just in case, here is my perlbug -d output:

Site configuration information for perl v5.8.8:

Configured by Debian Project at Wed Dec 6 23:17:41 UTC 2006.

Summary of my perl5 (revision 5 version 8 subversion 8) configuration:
  Platform:
    osname=linux, osvers=2.6.18.3, archname=i486-linux-gnu-thread-multi
    uname='linux saens 2.6.18.3 #1 smp sat nov 25 13:39:52 est 2006 i686 gnulinux '
    config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -Dcccdlflags=-fPIC -Darchname=i486-linux-gnu -Dprefix=/usr -Dprivlib=/usr/share/perl/5.8 -Darchlib=/usr/lib/perl/5.8 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.8.8 -Dsitearch=/usr/local/lib/perl/5.8.8 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Uusesfio -Uusenm -Duseshrplib -Dlibperl=libperl.so.5.8.8 -Dd_dosuid -des'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBIAN -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBIAN -fno-strict-aliasing -pipe -I/usr/local/include'
    ccversion='', gccversion='4.1.2 20061115 (prerelease) (Debian 4.1.1-20)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
    perllibs=-ldl -lm -lpthread -lc -lcrypt
    libc=/lib/libc-2.3.6.so, so=so, useshrplib=true, libperl=libperl.so.5.8.8
    gnulibc_version='2.3.6'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:

@INC for perl v5.8.8:
    /etc/perl
    /usr/local/lib/perl/5.8.8
    /usr/local/share/perl/5.8.8
    /usr/lib/perl5
    /usr/share/perl5
    /usr/lib/perl/5.8
    /usr/share/perl/5.8
    /usr/local/lib/site_perl
    .

Environment for perl v5.8.8:
    HOME=/home/dkr
    LANG=en_US.UTF-8
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/usr/sbin:/sbin:/usr/bin:/bin:/home/dkr/bin:/usr/local/bin:/usr/local/sbin:/usr/X11R6/bin:/usr/games
    PERL_BADLANG (unset)
    SHELL=/bin/tcsh

--
_.,-*~`^'~*-,._ Danny Rathjens _.,-*~`^'~*-,._
FireCast: Rock solid kiosk software: http://www.wirespring.com/

p5pRT commented 17 years ago

From @rgs

Thanks, I've reworked the docs according to your suggestion as change #31462 to bleadperl.

p5pRT commented 17 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 17 years ago

@rgs - Status changed from 'open' to 'resolved'