Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

UTF-8 tr still hurts #3040

Closed p5pRT closed 20 years ago

p5pRT commented 23 years ago

Migrated from rt.perl.org#5023 (status was 'resolved')

Searchable as RT5023$

p5pRT commented 23 years ago

From @jhi

Created by jhi@kosh.hut.fi

Just perlbugging the proposed new tr tests: though Inaba's patch (#8267) makes the situation much better, some tr bugs still remain.

==== //depot/perl/t/op/tr.t#10 - /u/vieraat/vieraat/jhi/pp4/perl/t/op/tr.t ====
Index: perl/t/op/tr.t

Inline Patch

```diff
--- perl/t/op/tr.t.~1~	Sat Dec 30 20:23:18 2000
+++ perl/t/op/tr.t	Sat Dec 30 20:23:18 2000
@@ -5,7 +5,7 @@
     @INC = '../lib';
 }
 
-print "1..29\n";
+print "1..46\n";
 
 $_ = "abcdefghijklmnopqrstuvwxyz";
 
@@ -181,3 +181,95 @@
 print (($@ =~ m|^Can't modify constant item in transliteration \(tr///\)|)
        ? '' : 'not ', "ok 29\n");
 
+# v300 (0x12c) is UTF-8-encoded as 196 172 (0xc4 0xac)
+# v400 (0x190) is UTF-8-encoded as 198 144 (0xc6 0x90)
+
+# Transliterate a byte to a byte, all four ways.
+
+($a = v300.196.172.300.196.172) =~ tr/\xc4/\xc5/;
+print "not " unless $a eq v300.197.172.300.197.172;
+print "ok 30\n";
+
+($a = v300.196.172.300.196.172) =~ tr/\xc4/\x{c5}/;
+print "not " unless $a eq v300.197.172.300.197.172;
+print "ok 31\n";
+
+($a = v300.196.172.300.196.172) =~ tr/\x{c4}/\xc5/;
+print "not " unless $a eq v300.197.172.300.197.172;
+print "ok 32\n";
+
+($a = v300.196.172.300.196.172) =~ tr/\x{c4}/\x{c5}/;
+print "not " unless $a eq v300.197.172.300.197.172;
+print "ok 33\n";
+
+# Transliterate a byte to a wide character.
+
+($a = v300.196.172.300.196.172) =~ tr/\xc4/\x{12d}/;
+print "not " unless $a eq v300.301.172.300.301.172;
+print "ok 34\n";
+
+# Transliterate a wide character to a byte.
+
+($a = v300.196.172.300.196.172) =~ tr/\x{12c}/\xc3/;
+print "not " unless $a eq v195.196.172.195.196.172;
+print "ok 35\n";
+
+# Transliterate a wide character to a wide character.
+
+($a = v300.196.172.300.196.172) =~ tr/\x{12c}/\x{12d}/;
+print "not " unless $a eq v301.196.172.301.196.172;
+print "ok 36\n";
+
+# Transliterate both ways.
+
+($a = v300.196.172.300.196.172) =~ tr/\xc4\x{12c}/\x{12d}\xc3/;
+print "not " unless $a eq v195.301.172.195.301.172;
+print "ok 37\n";
+
+# Transliterate all (four) ways.
+
+($a = v300.196.172.300.196.172.400.198.144) =~
+    tr/\xac\xc4\x{12c}\x{190}/\xad\x{12d}\xc5\x{191}/;
+print "not " unless $a eq v197.301.173.197.301.173.401.198.144;
+print "ok 38\n";
+
+# Transliterate and count.
+
+print "not "
+    unless (($a = v300.196.172.300.196.172) =~ tr/\xc4/\xc5/) == 2;
+print "ok 39\n";
+
+print "not "
+    unless (($a = v300.196.172.300.196.172) =~ tr/\x{12c}/\x{12d}/) == 2;
+print "ok 40\n";
+
+# Transliterate with complement.
+
+($a = v300.196.172.300.196.172) =~ tr/\xc4/\x{12d}/c;
+print "not " unless $a eq v301.196.301.301.196.301;
+print "ok 41\n";
+
+($a = v300.196.172.300.196.172) =~ tr/\x{12c}/\xc5/c;
+print "not " unless $a eq v300.197.197.300.197.197;
+print "ok 42\n";
+
+# Transliterate with deletion.
+
+($a = v300.196.172.300.196.172) =~ tr/\xc4//d;
+print "not " unless $a eq v300.172.300.172;
+print "ok 43\n";
+
+($a = v300.196.172.300.196.172) =~ tr/\x{12c}//d;
+print "not " unless $a eq v196.172.196.172;
+print "ok 44\n";
+
+# Transliterate with squeeze.
+
+($a = v196.196.172.300.300.196.172) =~ tr/\xc4/\xc5/s;
+print "not " unless $a eq v197.172.300.300.197.172;
+print "ok 45\n";
+
+($a = v196.172.300.300.196.172.172) =~ tr/\x{12c}/\x{12d}/s;
+print "not " unless $a eq v196.172.301.196.172.172;
+print "ok 46\n";
+
End of Patch.
```
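For anyone who wants to check the byte values quoted in the test comments, here is a minimal sketch. The helpers `utf8_bytes` and `dotted` are hypothetical (they are not part of t/op/tr.t), the input is built with `chr` instead of a v-string literal, and the last check assumes a perl in which tr/// works on characters rather than on the underlying UTF-8 bytes, which is what tests 30 and 39 ask for.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of t/op/tr.t: return the UTF-8 octets
# of one code point as a list of integers.
sub utf8_bytes {
    my ($code_point) = @_;
    my $s = chr($code_point);
    utf8::encode($s);            # in place: characters -> UTF-8 octets
    return unpack 'C*', $s;
}

# Hypothetical helper: render a string as dotted code points, v-string style.
sub dotted {
    my ($s) = @_;
    return join '.', map { ord($_) } split //, $s;
}

# The two code points named in the patch comments.
printf "U+%04X -> %s\n", $_, join(' ', utf8_bytes($_)) for 0x12c, 0x190;

# Same input as test 30: characters 300, 196, 172, twice over.
my $str = join '', map { chr($_) } 300, 196, 172, 300, 196, 172;
(my $copy = $str) =~ tr/\xc4/\xc5/;
print dotted($copy), "\n";
```

With character semantics in place, the first two lines should read `U+012C -> 196 172` and `U+0190 -> 198 144`, matching the comments in the patch, and the last line should read `300.197.172.300.197.172`, the value test 30 expects.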
Perl Info

```
Flags:
    category=core
    severity=medium

Site configuration information for perl v5.7.0:

Configured by jhi at Sat Dec 30 22:33:44 EET 2000.

Summary of my perl5 (revision 5.0 version 7 subversion 0) configuration:
  Platform:
    osname=dec_osf, osvers=4.0f, archname=alpha-dec_osf
    uname='osf1 kosh.hut.fi v4.0 1229 alpha '
    config_args='-des -Dusedevel -Doptimize=-g -Dccflags=-DDEBUGGING'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=undef d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=define use64bitall=define uselongdouble=undef
  Compiler:
    cc='cc', ccflags ='-DDEBUGGING -std -DDEBUGGING -DLANGUAGE_C',
    optimize='-g',
    cppflags='-DDEBUGGING -std -DDEBUGGING -DLANGUAGE_C'
    ccversion='V5.9-010', gccversion='', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=8
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, usemymalloc=y, prototype=define
  Linker and Libraries:
    ld='ld', ldflags =''
    libpth=/usr/shlib /usr/ccs/lib /usr/lib/cmplrs/cc /usr/lib /var/shlib
    libs=-lgdbm -ldbm -ldb -lm -liconv -lutil
    perllibs=-lm -liconv -lutil
    libc=/usr/shlib/libc.so, so=so, useshrplib=true, libperl=libperl.so
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags=' -Wl,-rpath,/usr/local/lib/perl5/5.7.0/alpha-dec_osf/CORE'
    cccdlflags=' ', lddlflags='-shared -expect_unresolved "*" -g -msym -std'

Locally applied patches:
    DEVEL8268

@INC for perl v5.7.0:
    lib
    /u/vieraat/vieraat/jhi/Perl/lib
    /usr/local/lib/perl5/5.7.0/alpha-dec_osf
    /usr/local/lib/perl5/5.7.0
    /usr/local/lib/perl5/site_perl/5.7.0/alpha-dec_osf
    /usr/local/lib/perl5/site_perl/5.7.0
    /usr/local/lib/perl5/site_perl
    .

Environment for perl v5.7.0:
    HOME=/u/vieraat/vieraat/jhi
    LANG=C
    LANGUAGE (unset)
    LC_ALL=fi_FI.ISO8859-1
    LC_CTYPE=fi_FI.ISO8859-1
    LD_LIBRARY_PATH=/u/vieraat/vieraat/jhi/pp4/perl
    LOGDIR (unset)
    PATH=/u/vieraat/vieraat/jhi/Perl/bin:/u/vieraat/vieraat/jhi/.s:/u/vieraat/vieraat/jhi/.b/OSF1:/c/bin:/p/bin:/p/adm/bin:/usr/bin:/usr/sbin:/sbin:/bin:/usr/ccs/bin:/usr/lib:/etc:/lib:/p/X6/bin:/p/X5/bin:/usr/bin/X11:/usr/lbin:/usr/sbin/acct:/usr/tcb/bin:/tcb/bin:/usr/field:/u/vieraat/vieraat/jhi
    PERLLIB=/u/vieraat/vieraat/jhi/Perl/lib
    PERL_BADLANG (unset)
    SHELL=/bin/zsh
```