Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

'use locale' effectless? #1217

Closed p5pRT closed 18 years ago

p5pRT commented 24 years ago

Migrated from rt.perl.org#2200 (status was 'resolved')

Searchable as RT2200

p5pRT commented 24 years ago

From gomar@md.media-web.de

Created by gomar@mindless.com

The following snippet of code (ISO8859-1 charset):

  use locale;
  print int('Ü' =~ /ü/i);

together with appropriate environment settings for LC_ALL (=de_DE) and LANG (=de) prints 1 (the expected result) with Perl 5.00503 and 0 with Perl 5.5.650. I've tried other locale settings too, such as de_DE.ISO8859-1, but without any improvement. Since I'm not aware of any error on my part or any misconfiguration of my system, I suppose this is a bug. Again, I haven't experienced any problems with locale settings before, either with Perl 5.00503 and lower or with other programs. (Note: Both Perl 5.00503 and 5.5.650 are linked against the same versions of libc and libnsl, in case that matters.)

Perl Info

```
Site configuration information for perl v5.5.650:

Configured by root at Tue Feb 15 19:19:50 CET 2000.

Summary of my perl5 (revision 5.0 version 5 subversion 650) configuration:
  Platform:
    osname=linux, osvers=2.3.40, archname=i586-linux
    uname='linux c241-1 2.3.40 #1 fre jan 21 09:14:00 cet 2000 i586 unknown '
    config_args=''
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usesocks=undef
    useperlio=undef d_sfio=undef use64bits=define uselargefiles=define
    usemultiplicity=undef
  Compiler:
    cc='cc', optimize='-O2', gccversion=2.95.2 19991024 (release)
    cppflags='-Dbool=char -DHAS_BOOL -fno-strict-aliasing -I/usr/local/include'
    ccflags ='-Dbool=char -DHAS_BOOL -fno-strict-aliasing -I/usr/local/include -DUSE_LONG_LONG'
    stdchar='char', d_stdstdio=define, usevfork=false
    intsize=4, longsize=4, ptrsize=4, doublesize=8
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    alignbytes=4, usemymalloc=n, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -lndbm -lgdbm -ldb -ldl -lm -lc -lposix -lcrypt
    libc=/lib/libc-2.1.2.so, so=so, useshrplib=false, libperl=libperl.a
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
    cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:

@INC for perl v5.5.650:
    /home/gomar/perl
    /usr/lib/perl5/5.5.650/i586-linux
    /usr/lib/perl5/5.5.650
    /usr/lib/perl5/site_perl/5.5.650/i586-linux
    /usr/lib/perl5/site_perl/5.5.650
    /usr/lib/perl5/site_perl/5.005/i586-linux
    /usr/lib/perl5/site_perl/5.005
    /usr/lib/perl5/site_perl
    .

Environment for perl v5.5.650:
    HOME=/home/gomar
    LANG=de
    LANGUAGE (unset)
    LC_ALL=de_DE
    LD_LIBRARY_PATH=/lib:/usr/lib:/usr/local/lib:/usr/X11R6/lib:/usr/openwin/lib:/usr/local/kde/lib:/usr/lib/qt/lib:/usr/local/lib/gtk/themes/engines
    LOGDIR (unset)
    PATH=/usr/local/bin:/usr/bin:/usr/X11R6/bin:/bin:/usr/local/kde/bin:/usr/lib/java/bin/Linux/green_threads:/usr/X11R6/pbmplus:/usr/games:/usr/local/games:/usr/openwin/bin:/usr/games:/home/gomar/bin:.:/usr/bin/TeX
    PERL5LIB=/home/gomar/perl
    PERL_BADLANG (unset)
    SHELL=/bin/bash
```
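Raw ISO-8859-1 bytes like the ones in the snippet above are easily mangled in transit (later replies in this thread render them as `?`). A byte-exact way to write the same test is with escapes: in ISO-8859-1, `\xDC` is Ü and `\xFC` is ü. The following is a minimal sketch, not code from the thread, assuming a modern perl with the core POSIX module; the helper name `locale_fold_match` is hypothetical, and the locale names passed to it are assumptions whose spellings and availability vary by system. It switches `LC_ALL` with `POSIX::setlocale`, runs the match under `use locale`, and restores the previous locale.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_ALL);

# Hypothetical helper (not from the thread): switch to $locale, run the
# reporter's case-insensitive match under 'use locale', then restore the
# previous locale. Returns 1 or 0, or undef if $locale is not installed.
sub locale_fold_match {
    my ($locale) = @_;
    my $saved = setlocale(LC_ALL);
    return undef unless defined setlocale(LC_ALL, $locale);

    my $matched;
    {
        use locale;
        # "\xDC" is U+00DC (U with diaeresis, capital) and \xFC is U+00FC
        # (u with diaeresis, small): the single ISO-8859-1 bytes the
        # reporter used, written as escapes so they survive copy/paste.
        $matched = ("\xDC" =~ /\xFC/i) ? 1 : 0;
    }

    setlocale(LC_ALL, $saved);
    return $matched;
}

# Locale names below are assumptions; spellings differ between systems.
for my $loc ('de_DE.ISO-8859-1', 'de_DE', 'C') {
    my $r = locale_fold_match($loc);
    printf "%-18s %s\n", $loc, defined $r ? $r : 'not installed';
}
```

In a German ISO-8859-1 locale the reporter's expectation is a result of 1; in the plain `C` locale, bytes above 127 have no case, so 0 is the expected result there.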
p5pRT commented 24 years ago

From @jhi

gomar@md.media-web.de writes:

> The following snippet of code (ISO8859-1 charset):
>
>   use locale; print int('Ü' =~ /ü/i);
>
> together with appropriate environment settings for LC_ALL (=de_DE) and LANG (=de) prints 1 (expected result) with Perl 5.00503 and 0 with Perl 5.5.650. I've tried other locale settings too, such as de_DE.ISO8859-1

I confirm that the bug exists in 5.5.660 on Digital UNIX with various European locales, and that it did not exist in 5.005_03.

--
$jhi++; # http://www.iki.fi/jhi/
        # There is this special biologist word we use for 'stable'.
        # It is 'dead'. -- Jack Cohen

p5pRT commented 18 years ago

From @smpeters

[RT_System - Wed Feb 23 00:21:03 2000]:

> I confirm that the bug exists in 5.5.660 on Digital UNIX with various European locales, and that it did not exist in 5.005_03.

I have tried various Perl 5.8 releases on several operating systems and have not been able to reproduce the problem:

  steve@kirk:~/smoke/smoke_cfg$ LC_ALL=en_ZW.utf8 perl -Mlocale -wle'print int("?" =~ /\?/i)'
  1
  steve@kirk:~/smoke/smoke_cfg$ LC_ALL=en_GB.utf8 perl -Mlocale -wle'print int("?" =~ /\?/i)'
  1
  steve@kirk:~/smoke/smoke_cfg$ LC_ALL=en_DK.utf8 perl -Mlocale -wle'print int("?" =~ /\?/i)'
  1

I'm assuming that this has been fixed (and should be in Changes) somewhere between 5.5.650 and 5.8.

p5pRT commented 18 years ago

@smpeters - Status changed from 'open' to 'resolved'

p5pRT commented 18 years ago

From @demerphq

On 2/6/06, Steve Peters via RT <perlbug-followup@perl.org> wrote:

> I'm assuming that this has been fixed (and should be in Changes) somewhere between 5.5.650 and 5.8.

I wonder... Your sample code appears to differ from the OP's.

You have

  int("?"=~/\?/i)

where the OP had

  print int('Ü' =~ /ü/i);

The latter could be said verbosely as

"the integer value of the return of case-insensitively matching capital U umlaut with lowercase u umlaut"

As far as I recall, perl doesn't do locale-based matching on non-Unicode strings (but I may recall incorrectly). I think that if the OP converts the expression to a Unicode string, the match should be OK.

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"
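The effect of converting to a Unicode string can be tried in a current perl. This is an illustrative sketch, not code from the thread: the sub names are hypothetical, and the `/u` modifier assumes Perl 5.14 or later, which none of the participants had in 2000 or 2006. In a script that enables neither `use locale` nor a `use v5.12`-or-later feature bundle, a pattern uses the default `/d` rules, under which characters 128 to 255 in a byte (non-UTF-8) string have no case. Upgrading the target with `utf8::upgrade`, or asking for Unicode rules with `/u`, lets Ü and ü fold together.

```perl
use strict;
use warnings;

# Default /d rules: "\xDC" is stored as a single native byte, so its case
# is ignored and the /i match against \xFC fails.
sub byte_match {
    my $s = "\xDC";                     # U+00DC, capital U with diaeresis
    return ($s =~ /\xFC/i) ? 1 : 0;     # U+00FC, small u with diaeresis
}

# Same characters, but the target is upgraded to perl's internal UTF-8
# form, so the /d pattern switches to Unicode case-folding rules.
sub upgraded_match {
    my $s = "\xDC";
    utf8::upgrade($s);
    return ($s =~ /\xFC/i) ? 1 : 0;
}

# Unicode rules requested explicitly with /u (Perl 5.14+); no upgrade needed.
sub unicode_rules_match {
    my $s = "\xDC";
    return ($s =~ /\xFC/iu) ? 1 : 0;
}

print "bytes:    ", byte_match(), "\n";
print "upgraded: ", upgraded_match(), "\n";
print "/u:       ", unicode_rules_match(), "\n";
```

Run as a plain script, this is expected to print 0 for the byte string and 1 for the upgraded and `/u` cases. Note that it sidesteps `use locale` entirely, so it illustrates the Unicode route rather than the locale behaviour the bug report is about.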

p5pRT commented 18 years ago

From @smpeters

On Mon, Feb 06, 2006 at 07:10:36PM +0100, demerphq wrote:

> You have
>
>   int("?"=~/\?/i)
>
> where the OP had
>
>   print int('Ü' =~ /ü/i);

Hmmm... the prints are there, just on the previous lines. I think you missed something because of my annoying line wrapping. Sorry about that.

Steve Peters
steve@fisharerojo.org

p5pRT commented 18 years ago

From @demerphq

On 2/6/06, Steve Peters <steve@fisharerojo.org> wrote:

> Hmmm... the prints are there, just on the previous lines. I think you missed something because of my annoying line wrapping. Sorry about that.

No, the issue isn't the missing prints. It's that you are matching ? against ?, whereas the OP is matching CAPITAL-U-WITH-UMLAUT against LOWER-U-WITH-UMLAUT (not sure if those are the 'real' names for these letters). U with umlaut looks like a U with two dots above it.

It could be my reader, of course, but the OP's code and your code do not render the same, so I suspect it's your reader making them look the same and not mine.

IOW, do the following two lines look the same or different?

  int('Ü' =~ /\ü/i)
  int("?"=~/\?/i)

If they look different, then ignore me; if not, then I suspect your email client is lying to you.

cheers,
Yves

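One way to settle the "do these look the same?" question without trusting any mail reader or terminal is to print the code points of the strings involved. The sketch below is illustrative only; `codepoints` is a hypothetical helper, not something from the thread. For reference, U+00DC is LATIN CAPITAL LETTER U WITH DIAERESIS, U+00FC is LATIN SMALL LETTER U WITH DIAERESIS, and a literal question mark is U+003F.

```perl
use strict;
use warnings;

# Hypothetical helper: list each character of $str as U+XXXX so that two
# strings can be compared regardless of how a mail client renders them.
sub codepoints {
    my ($str) = @_;
    return join ' ', map { sprintf 'U+%04X', ord } split //, $str;
}

print codepoints("\xDC\xFC"), "\n";   # the reporter's two characters
print codepoints('?'), "\n";          # a literal '?', as the 5.8 test appears
```

If the strings in a one-liner dump as U+003F rather than U+00DC and U+00FC, the test is matching a question mark against itself and says nothing about case folding.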

p5pRT commented 18 years ago

From @demerphq

On 2/6/06, demerphq <demerphq@gmail.com> wrote:

> IOW, do the following two lines look the same or different?
>
>   int('Ü' =~ /\ü/i)
>   int("?"=~/\?/i)
>
> If they look different, then ignore me; if not, then I suspect your email client is lying to you.

Ignore the different quote marks on the left-hand side of the =~ when you are checking whether they differ.

Or just look at this instead:

  int('Ü' =~ /\ü/i)
  int('?'=~/\?/i)

(Sorry about the quotes confusing things :-)

yves
