Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/
Other
1.93k stars 551 forks source link

Enable perl core tests to pass when the vendor has no locale support #12764

Closed p5pRT closed 11 years ago

p5pRT commented 11 years ago

Migrated from rt.perl.org#116693 (status was 'resolved')

Searchable as RT116693$

p5pRT commented 11 years ago

From castaway@desert-island.me.uk

Created by castaway@desert-island.me.uk

Please find attached a patch which allows Perl's core tests to pass/skip
on systems which do not implement the "locale" system\, as expected by "use
locale" and "use POSIX '​:local_h'". The patch also makes "use locale" die
if $Config{d_setlocale} is not true.

This is the case with Android\, which I am targetting in my cross-compiler
grant. (It uses ICU instead\, adding support for that is left as a later
exercise!)

Perl Info ``` Flags: category=core severity=low Site configuration information for perl 5.17.8: Configured by castaway at Fri Feb 8 12:40:58 GMT 2013. Summary of my perl5 (revision 5 version 17 subversion 8) configuration: Derived from: 1999188643dea06da9c3a6a24085bd91bb0aa7f9 Platform: osname=linux, osvers=3.0.4insel, archname=i686-linux uname='linux insel 3.0.4insel #1 smp sat oct 29 23:51:24 bst 2011 i686 amd athlon(tm) ii x2 4400e processor authenticamd gnulinux ' config_args='-des -Dusedevel -Dd_setlocale=undef -Dd_locconv=undef -Dd_setlocale_r=undef -Di_locale=undef' hint=recommended, useposix=true, d_sigaction=define useithreads=undef, usemultiplicity=undef useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef use64bitint=undef, use64bitall=undef, uselongdouble=undef usemymalloc=n, bincompat5005=undef Compiler: cc='cc', ccflags ='-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64', optimize='-O2', cppflags='-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include' ccversion='', gccversion='4.5.4', gccosandvers='' intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234 d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12 ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8 alignbytes=4, prototype=define Linker and Libraries: ld='cc', ldflags =' -fstack-protector -L/usr/local/lib' libpth=/usr/local/lib /lib/../lib /usr/lib/../lib /lib /usr/lib libs=-lnsl -lndbm -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc -lgdbm_compat perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc libc=/lib/libc-2.11.2.so, so=so, useshrplib=false, libperl=libperl.a gnulibc_version='2.11.2' Dynamic Linking: dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E' cccdlflags='-fPIC', lddlflags='-shared -O2 -L/usr/local/lib -fstack-protector' Locally applied patches: @INC for perl 5.17.8: lib /home/castaway/perl/lib /home/theorb/perl/lib /usr/local/lib/perl5/site_perl/5.17.8/i686-linux /usr/local/lib/perl5/site_perl/5.17.8 /usr/local/lib/perl5/5.17.8/i686-linux /usr/local/lib/perl5/5.17.8 /usr/local/lib/perl5/site_perl . Environment for perl 5.17.8: HOME=/home/castaway LANG=en_GB.UTF-8 LANGUAGE (unset) LD_LIBRARY_PATH=/home/missys/sqllib/lib LOGDIR (unset) PATH=/usr/local/bin:/usr/bin:/bin:/opt/bin:/usr/i686-pc-linux-gnu/gcc-bin/4.5.4:/usr/i686-pc-linux-gnu/armv6j-hardfloat-linux-gnueabi/gcc-bin/4.5.4:/usr/i686-pc-linux-gnu/avr/gcc-bin/4.3.4:/opt/sun-jdk-1.4.2.18/bin:/opt/sun-jdk-1.4.2.18/jre/bin:/opt/sun-jdk-1.4.2.18/jre/javaws:/usr/qt/3/bin:/usr/kde/3.4/bin:/usr/games/bin:/home/missys/sqllib/bin:/home/missys/sqllib/adm:/home/missys/sqllib/misc PERL5LIB=/home/castaway/perl/lib:/home/theorb/perl/lib PERL_AUTOINSTALL_PREFER_CPAN=1 PERL_BADLANG (unset) SHELL=/bin/bash ```
p5pRT commented 11 years ago

From castaway@desert-island.me.uk

0001-Enable-perl-core-tests-to-pass-when-locale-support-i.patch ```diff From 1999188643dea06da9c3a6a24085bd91bb0aa7f9 Mon Sep 17 00:00:00 2001 From: Jess Robinson Date: Fri, 8 Feb 2013 12:30:05 +0000 Subject: [PATCH] Enable perl core tests to pass when locale support is not available. use locale - this will now die if $Config{d_setlocale} is not true. All tests that use locale will skip if $Config{d_setlocale} is not true. This enables us to pass tests on Android which uses ICU instead of locales. --- ext/Pod-Html/lib/Pod/Html.pm | 9 +- ext/XS-APItest/t/handy.t | 36 +++-- ext/re/t/re_funcs_u.t | 2 + ext/re/t/reflags.t | 6 +- lib/locale.pm | 17 ++- lib/version/t/07locale.t | 2 + t/op/quotemeta.t | 14 ++- t/op/taint.t | 410 +++++++++++++++++++++++++++--------------- t/re/charset.t | 5 +- t/re/fold_grind.t | 25 ++-- t/re/pat.t | 47 ++++- t/run/locale.t | 19 ++- t/uni/fold.t | 10 +- t/uni/overload.t | 13 +- 14 files changed, 421 insertions(+), 194 deletions(-) diff --git a/ext/Pod-Html/lib/Pod/Html.pm b/ext/Pod-Html/lib/Pod/Html.pm index 5b4c51c..72b37c2 100644 --- a/ext/Pod-Html/lib/Pod/Html.pm +++ b/ext/Pod-Html/lib/Pod/Html.pm @@ -3,7 +3,7 @@ use strict; require Exporter; use vars qw($VERSION @ISA @EXPORT @EXPORT_OK); -$VERSION = 1.17; +$VERSION = 1.18; @ISA = qw(Exporter); @EXPORT = qw(pod2html htmlify); @EXPORT_OK = qw(anchorify); @@ -16,8 +16,11 @@ use File::Spec; use File::Spec::Unix; use Getopt::Long; use Pod::Simple::Search; - -use locale; # make \w work right in non-ASCII lands +BEGIN { + if($Config{d_setlocale}) { + require locale; import locale; # make \w work right in non-ASCII lands + } +} =head1 NAME diff --git a/ext/XS-APItest/t/handy.t b/ext/XS-APItest/t/handy.t index 5ecbaa7..eb620ec 100644 --- a/ext/XS-APItest/t/handy.t +++ b/ext/XS-APItest/t/handy.t @@ -2,6 +2,7 @@ use strict; use Test::More; +use Config; use XS::APItest; @@ -11,17 +12,24 @@ sub truth($) { # Converts values so is() works return (shift) ? 1 : 0; } -require POSIX; -my $locale = POSIX::setlocale( &POSIX::LC_ALL, "C"); -if (defined $locale && $locale eq 'C') { - use locale; - - # Some locale implementations don't have the 128-255 characters all - # mean nothing. Skip the locale tests in that situation - for my $i (128 .. 255) { - if (chr($i) =~ /[[:print:]]/) { - undef $locale; - last; +my $locale; +if($Config{d_setlocale}) { + require POSIX; + $locale = POSIX::setlocale( &POSIX::LC_ALL, "C"); + if (defined $locale && $locale eq 'C') { + BEGIN { + if($Config{d_setlocale}) { + require locale; import locale; # make \w work right in non-ASCII lands + } + } + + # Some locale implementations don't have the 128-255 characters all + # mean nothing. Skip the locale tests in that situation + for my $i (128 .. 255) { + if (chr($i) =~ /[[:print:]]/) { + undef $locale; + last; + } } } } @@ -138,7 +146,7 @@ foreach my $name (sort keys %properties) { } if (defined $locale) { - use locale; + require locale; import locale; $ret = truth eval "test_is${function}_LC($i)"; if ($@) { @@ -160,7 +168,7 @@ foreach my $name (sort keys %properties) { } if (defined $locale && $name ne 'vertws') { - use locale; + require locale; import locale; $ret = truth eval "test_is${function}_LC_uvchr('$i')"; if ($@) { @@ -184,7 +192,7 @@ foreach my $name (sort keys %properties) { } if ($name ne 'vertws' && defined $locale) { - use locale; + require locale; import locale; $ret = truth eval "test_is${function}_LC_utf8('$char')"; if ($@) { diff --git a/ext/re/t/re_funcs_u.t b/ext/re/t/re_funcs_u.t index 3eec289..706437e 100644 --- a/ext/re/t/re_funcs_u.t +++ b/ext/re/t/re_funcs_u.t @@ -92,6 +92,7 @@ if ('1234'=~/(?:(?\d)|(?!))(?\d)(?\d)(?\d)/){ SKIP: { skip_if_miniperl("no dynamic loading on miniperl, no POSIX", 3); + skip 'No locale testing without d_setlocale', 3 if(!$Config::Config{d_setlocale}); require POSIX; my $current_locale = POSIX::setlocale( &POSIX::LC_CTYPE, 'de_DE.ISO-8859-1' ); if ( !$current_locale || $current_locale ne 'de_DE.ISO-8859-1' ) { @@ -108,6 +109,7 @@ if ('1234'=~/(?:(?\d)|(?!))(?\d)(?\d)(?\d)/){ SKIP: { skip_if_miniperl("no dynamic loading on miniperl, no POSIX", 3); + skip 'No locale testing without d_setlocale', 3 if(!$Config::Config{d_setlocale}); require POSIX; my $current_locale = POSIX::setlocale( &POSIX::LC_CTYPE, 'C' ); if ( !$current_locale || $current_locale ne 'C' ) { diff --git a/ext/re/t/reflags.t b/ext/re/t/reflags.t index b2cbf80..482b2c8 100644 --- a/ext/re/t/reflags.t +++ b/ext/re/t/reflags.t @@ -60,7 +60,11 @@ SKIP: { ) { skip "no locale support", 7 } - use locale; + BEGIN { + if($Config::Config{d_setlocale}) { + require locale; import locale; + } + } use re '/u'; is qr//, '(?^u:)', 'use re "/u" with active locale'; no re '/u'; diff --git a/lib/locale.pm b/lib/locale.pm index e57a5fd..ed254cc 100644 --- a/lib/locale.pm +++ b/lib/locale.pm @@ -1,6 +1,7 @@ package locale; -our $VERSION = '1.01'; +our $VERSION = '1.02'; +use Config; $Carp::Internal{ (__PACKAGE__) } = 1; @@ -46,6 +47,14 @@ Unicode and locales, including UTF-8 locales. See L for more detailed information on how Perl supports locales. +=head1 NOTE + +If your system does not support locales, then loading this module will +cause the program to die with a message: + + "Your vendor does not support locales, you cannot use the locale + module." + =cut # A separate bit is used for each of the two forms of the pragma, as they are @@ -60,6 +69,12 @@ $locale::not_chars_hint_bits = 0x10; sub import { shift; # should be 'locale'; not checked + + if(!$Config{d_setlocale}) { + ## No locale support found on this Perl, giving up: + die('Your vendor does not support locales, you cannot use the locale module.'); + } + my $found_not_chars = 0; while (defined (my $arg = shift)) { if ($arg eq ":not_characters") { diff --git a/lib/version/t/07locale.t b/lib/version/t/07locale.t index 01b51a7..a2005f8 100644 --- a/lib/version/t/07locale.t +++ b/lib/version/t/07locale.t @@ -8,6 +8,7 @@ use File::Basename; use File::Temp qw/tempfile/; use POSIX qw/locale_h/; use Test::More tests => 7; +use Config; BEGIN { use_ok('version', 0.9901); @@ -15,6 +16,7 @@ BEGIN { SKIP: { skip 'No locale testing for Perl < 5.6.0', 6 if $] < 5.006; + skip 'No locale testing without d_setlocale', 6 if(!$Config{d_setlocale}); # test locale handling my $warning; local $SIG{__WARN__} = sub { $warning = $_[0] }; diff --git a/t/op/quotemeta.t b/t/op/quotemeta.t index d62563c..1b8613a 100644 --- a/t/op/quotemeta.t +++ b/t/op/quotemeta.t @@ -76,7 +76,9 @@ is(length(quotemeta($char)), 1, "quotemeta '\\N{U+D8}' in UTF-8 length"); is(quotemeta("\x{d8}"), "\\\x{d8}", "quotemeta Latin1 no unicode_strings quoted"); is(length(quotemeta("\x{d8}")), 2, "quotemeta Latin1 no unicode_strings quoted length"); - use locale; + SKIP: { + skip 'No locale testing without d_setlocale', 8 if(!$Config{d_setlocale}); + require locale; import locale; my $char = ":"; is(quotemeta($char), "\\$char", "quotemeta '$char' locale"); @@ -94,6 +96,7 @@ is(length(quotemeta($char)), 1, "quotemeta '\\N{U+D8}' in UTF-8 length"); $char = "\x{D8}"; # Every non-ASCII Latin1 is quoted in locale. is(quotemeta($char), "\\$char", "quotemeta '\\x{D8}' locale"); is(length(quotemeta($char)), 2, "quotemeta '\\x{D8}' locale length"); + } } { use feature 'unicode_strings'; @@ -102,7 +105,13 @@ is(length(quotemeta($char)), 1, "quotemeta '\\N{U+D8}' in UTF-8 length"); is(quotemeta("\x{d8}"), "\x{d8}", "quotemeta Latin1 unicode_strings nonquoted"); is(length(quotemeta("\x{d8}")), 1, "quotemeta Latin1 unicode_strings nonquoted length"); - use locale; + SKIP: { + skip 'No locale testing without d_setlocale', 12 if(!$Config{d_setlocale}); + BEGIN { + if($Config{d_setlocale}) { + require locale; import locale; + } + } my $char = ":"; utf8::upgrade($char); @@ -128,6 +137,7 @@ is(length(quotemeta($char)), 1, "quotemeta '\\N{U+D8}' in UTF-8 length"); is(length(quotemeta("\x{263a}")), 2, "quotemeta locale Unicode quoted length"); is(quotemeta("\x{100}"), "\x{100}", "quotemeta locale Unicode nonquoted"); is(length(quotemeta("\x{100}")), 1, "quotemeta locale Unicode nonquoted length"); + } } $a = "foo|bar"; diff --git a/t/op/taint.t b/t/op/taint.t index d621de6..268157f 100644 --- a/t/op/taint.t +++ b/t/op/taint.t @@ -296,25 +296,43 @@ my $TEST = 'TEST'; is($res, 1, "$desc: res value"); is($one, 'a', "$desc: \$1 value"); - $desc = "match with pattern tainted via locale"; - - $s = 'abcd'; - { use locale; $res = $s =~ /(\w+)/; $one = $1; } - isnt_tainted($s, "$desc: s not tainted"); - isnt_tainted($res, "$desc: res not tainted"); - is_tainted($one, "$desc: \$1 tainted"); - is($res, 1, "$desc: res value"); - is($one, 'abcd', "$desc: \$1 value"); - - $desc = "match /g with pattern tainted via locale"; - - $s = 'abcd'; - { use locale; $res = $s =~ /(\w)/g; $one = $1; } - isnt_tainted($s, "$desc: s not tainted"); - isnt_tainted($res, "$desc: res not tainted"); - is_tainted($one, "$desc: \$1 tainted"); - is($res, 1, "$desc: res value"); - is($one, 'a', "$desc: \$1 value"); + SKIP: { + skip 'No locale testing without d_setlocale', 10 if(!$Config{d_setlocale}); + + $desc = "match with pattern tainted via locale"; + + $s = 'abcd'; + { + BEGIN { + if($Config{d_setlocale}) { + require locale; import locale; + } + } + $res = $s =~ /(\w+)/; $one = $1; + } + isnt_tainted($s, "$desc: s not tainted"); + isnt_tainted($res, "$desc: res not tainted"); + is_tainted($one, "$desc: \$1 tainted"); + is($res, 1, "$desc: res value"); + is($one, 'abcd', "$desc: \$1 value"); + + $desc = "match /g with pattern tainted via locale"; + + $s = 'abcd'; + { + BEGIN { + if($Config{d_setlocale}) { + require locale; import locale; + } + } + $res = $s =~ /(\w)/g; $one = $1; + } + isnt_tainted($s, "$desc: s not tainted"); + isnt_tainted($res, "$desc: res not tainted"); + is_tainted($one, "$desc: \$1 tainted"); + is($res, 1, "$desc: res value"); + is($one, 'a', "$desc: \$1 value"); + } $desc = "match with pattern tainted, list cxt"; @@ -339,27 +357,45 @@ my $TEST = 'TEST'; is($res2,'b', "$desc: res2 value"); is($one, 'd', "$desc: \$1 value"); - $desc = "match with pattern tainted via locale, list cxt"; - - $s = 'abcd'; - { use locale; ($res) = $s =~ /(\w+)/; $one = $1; } - isnt_tainted($s, "$desc: s not tainted"); - is_tainted($res, "$desc: res tainted"); - is_tainted($one, "$desc: \$1 tainted"); - is($res, 'abcd', "$desc: res value"); - is($one, 'abcd', "$desc: \$1 value"); - - $desc = "match /g with pattern tainted via locale, list cxt"; + SKIP: { + skip 'No locale testing without d_setlocale', 12 if(!$Config{d_setlocale}); - $s = 'abcd'; - { use locale; ($res, $res2) = $s =~ /(\w)/g; $one = $1; } - isnt_tainted($s, "$desc: s not tainted"); - is_tainted($res, "$desc: res tainted"); - is_tainted($res2, "$desc: res2 tainted"); - is_tainted($one, "$desc: \$1 tainted"); - is($res, 'a', "$desc: res value"); - is($res2,'b', "$desc: res2 value"); - is($one, 'd', "$desc: \$1 value"); + $desc = "match with pattern tainted via locale, list cxt"; + + $s = 'abcd'; + { + BEGIN { + if($Config{d_setlocale}) { + require locale; import locale; + } + } + ($res) = $s =~ /(\w+)/; $one = $1; + } + isnt_tainted($s, "$desc: s not tainted"); + is_tainted($res, "$desc: res tainted"); + is_tainted($one, "$desc: \$1 tainted"); + is($res, 'abcd', "$desc: res value"); + is($one, 'abcd', "$desc: \$1 value"); + + $desc = "match /g with pattern tainted via locale, list cxt"; + + $s = 'abcd'; + { + BEGIN { + if($Config{d_setlocale}) { + require locale; import locale; + } + } + ($res, $res2) = $s =~ /(\w)/g; $one = $1; + } + isnt_tainted($s, "$desc: s not tainted"); + is_tainted($res, "$desc: res tainted"); + is_tainted($res2, "$desc: res2 tainted"); + is_tainted($one, "$desc: \$1 tainted"); + is($res, 'a', "$desc: res value"); + is($res2,'b', "$desc: res2 value"); + is($one, 'd', "$desc: \$1 value"); + } $desc = "substitution with string tainted"; @@ -481,38 +517,63 @@ my $TEST = 'TEST'; is($res, 'xyz', "$desc: res value"); is($one, 'abcd', "$desc: \$1 value"); - $desc = "substitution with pattern tainted via locale"; - - $s = 'abcd'; - { use locale; $res = $s =~ s/(\w+)/xyz/; $one = $1; } - is_tainted($s, "$desc: s tainted"); - isnt_tainted($res, "$desc: res not tainted"); - is_tainted($one, "$desc: \$1 tainted"); - is($s, 'xyz', "$desc: s value"); - is($res, 1, "$desc: res value"); - is($one, 'abcd', "$desc: \$1 value"); - - $desc = "substitution /g with pattern tainted via locale"; - - $s = 'abcd'; - { use locale; $res = $s =~ s/(\w)/x/g; $one = $1; } - is_tainted($s, "$desc: s tainted"); - is_tainted($res, "$desc: res tainted"); - is_tainted($one, "$desc: \$1 tainted"); - is($s, 'xxxx', "$desc: s value"); - is($res, 4, "$desc: res value"); - is($one, 'd', "$desc: \$1 value"); - - $desc = "substitution /r with pattern tainted via locale"; + SKIP: { + skip 'No locale testing without d_setlocale', 18 if(!$Config{d_setlocale}); - $s = 'abcd'; - { use locale; $res = $s =~ s/(\w+)/xyz/r; $one = $1; } - isnt_tainted($s, "$desc: s not tainted"); - is_tainted($res, "$desc: res tainted"); - is_tainted($one, "$desc: \$1 tainted"); - is($s, 'abcd', "$desc: s value"); - is($res, 'xyz', "$desc: res value"); - is($one, 'abcd', "$desc: \$1 value"); + $desc = "substitution with pattern tainted via locale"; + + $s = 'abcd'; + { + BEGIN { + if($Config{d_setlocale}) { + require locale; import locale; + } + } + $res = $s =~ s/(\w+)/xyz/; $one = $1; + } + is_tainted($s, "$desc: s tainted"); + isnt_tainted($res, "$desc: res not tainted"); + is_tainted($one, "$desc: \$1 tainted"); + is($s, 'xyz', "$desc: s value"); + is($res, 1, "$desc: res value"); + is($one, 'abcd', "$desc: \$1 value"); + + $desc = "substitution /g with pattern tainted via locale"; + + $s = 'abcd'; + { + BEGIN { + if($Config{d_setlocale}) { + require locale; import locale; + } + } + $res = $s =~ s/(\w)/x/g; $one = $1; + } + is_tainted($s, "$desc: s tainted"); + is_tainted($res, "$desc: res tainted"); + is_tainted($one, "$desc: \$1 tainted"); + is($s, 'xxxx', "$desc: s value"); + is($res, 4, "$desc: res value"); + is($one, 'd', "$desc: \$1 value"); + + $desc = "substitution /r with pattern tainted via locale"; + + $s = 'abcd'; + { + BEGIN { + if($Config{d_setlocale}) { + require locale; import locale; + } + } + $res = $s =~ s/(\w+)/xyz/r; $one = $1; + } + isnt_tainted($s, "$desc: s not tainted"); + is_tainted($res, "$desc: res tainted"); + is_tainted($one, "$desc: \$1 tainted"); + is($s, 'abcd', "$desc: s value"); + is($res, 'xyz', "$desc: res value"); + is($one, 'abcd', "$desc: \$1 value"); + } $desc = "substitution with replacement tainted"; @@ -652,25 +713,43 @@ my $TEST = 'TEST'; is($res, 1, "$desc: res value"); is($one, 'a', "$desc: \$1 value"); - $desc = "use re 'taint': match with pattern tainted via locale"; - - $s = 'abcd'; - { use locale; $res = $s =~ /(\w+)/; $one = $1; } - isnt_tainted($s, "$desc: s not tainted"); - isnt_tainted($res, "$desc: res not tainted"); - is_tainted($one, "$desc: \$1 tainted"); - is($res, 1, "$desc: res value"); - is($one, 'abcd', "$desc: \$1 value"); - - $desc = "use re 'taint': match /g with pattern tainted via locale"; + SKIP: { + skip 'No locale testing without d_setlocale', 10 if(!$Config{d_setlocale}); - $s = 'abcd'; - { use locale; $res = $s =~ /(\w)/g; $one = $1; } - isnt_tainted($s, "$desc: s not tainted"); - isnt_tainted($res, "$desc: res not tainted"); - is_tainted($one, "$desc: \$1 tainted"); - is($res, 1, "$desc: res value"); - is($one, 'a', "$desc: \$1 value"); + $desc = "use re 'taint': match with pattern tainted via locale"; + + $s = 'abcd'; + { + BEGIN { + if($Config{d_setlocale}) { + require locale; import locale; + } + } + $res = $s =~ /(\w+)/; $one = $1; + } + isnt_tainted($s, "$desc: s not tainted"); + isnt_tainted($res, "$desc: res not tainted"); + is_tainted($one, "$desc: \$1 tainted"); + is($res, 1, "$desc: res value"); + is($one, 'abcd', "$desc: \$1 value"); + + $desc = "use re 'taint': match /g with pattern tainted via locale"; + + $s = 'abcd'; + { + BEGIN { + if($Config{d_setlocale}) { + require locale; import locale; + } + } + $res = $s =~ /(\w)/g; $one = $1; + } + isnt_tainted($s, "$desc: s not tainted"); + isnt_tainted($res, "$desc: res not tainted"); + is_tainted($one, "$desc: \$1 tainted"); + is($res, 1, "$desc: res value"); + is($one, 'a', "$desc: \$1 value"); + } $desc = "use re 'taint': match with pattern tainted, list cxt"; @@ -695,27 +774,45 @@ my $TEST = 'TEST'; is($res2,'b', "$desc: res2 value"); is($one, 'd', "$desc: \$1 value"); - $desc = "use re 'taint': match with pattern tainted via locale, list cxt"; + SKIP: { + skip 'No locale testing without d_setlocale', 12 if(!$Config{d_setlocale}); - $s = 'abcd'; - { use locale; ($res) = $s =~ /(\w+)/; $one = $1; } - isnt_tainted($s, "$desc: s not tainted"); - is_tainted($res, "$desc: res tainted"); - is_tainted($one, "$desc: \$1 tainted"); - is($res, 'abcd', "$desc: res value"); - is($one, 'abcd', "$desc: \$1 value"); - - $desc = "use re 'taint': match /g with pattern tainted via locale, list cxt"; - - $s = 'abcd'; - { use locale; ($res, $res2) = $s =~ /(\w)/g; $one = $1; } - isnt_tainted($s, "$desc: s not tainted"); - is_tainted($res, "$desc: res tainted"); - is_tainted($res2, "$desc: res2 tainted"); - is_tainted($one, "$desc: \$1 tainted"); - is($res, 'a', "$desc: res value"); - is($res2,'b', "$desc: res2 value"); - is($one, 'd', "$desc: \$1 value"); + $desc = "use re 'taint': match with pattern tainted via locale, list cxt"; + + $s = 'abcd'; + { + BEGIN { + if($Config{d_setlocale}) { + require locale; import locale; + } + } + ($res) = $s =~ /(\w+)/; $one = $1; + } + isnt_tainted($s, "$desc: s not tainted"); + is_tainted($res, "$desc: res tainted"); + is_tainted($one, "$desc: \$1 tainted"); + is($res, 'abcd', "$desc: res value"); + is($one, 'abcd', "$desc: \$1 value"); + + $desc = "use re 'taint': match /g with pattern tainted via locale, list cxt"; + + $s = 'abcd'; + { + BEGIN { + if($Config{d_setlocale}) { + require locale; import locale; + } + } + ($res, $res2) = $s =~ /(\w)/g; $one = $1; + } + isnt_tainted($s, "$desc: s not tainted"); + is_tainted($res, "$desc: res tainted"); + is_tainted($res2, "$desc: res2 tainted"); + is_tainted($one, "$desc: \$1 tainted"); + is($res, 'a', "$desc: res value"); + is($res2,'b', "$desc: res2 value"); + is($one, 'd', "$desc: \$1 value"); + } $desc = "use re 'taint': substitution with string tainted"; @@ -838,38 +935,63 @@ my $TEST = 'TEST'; is($res, 'xyz', "$desc: res value"); is($one, 'abcd', "$desc: \$1 value"); - $desc = "use re 'taint': substitution with pattern tainted via locale"; + SKIP: { + skip 'No locale testing without d_setlocale', 18 if(!$Config{d_setlocale}); - $s = 'abcd'; - { use locale; $res = $s =~ s/(\w+)/xyz/; $one = $1; } - is_tainted($s, "$desc: s tainted"); - isnt_tainted($res, "$desc: res not tainted"); - is_tainted($one, "$desc: \$1 tainted"); - is($s, 'xyz', "$desc: s value"); - is($res, 1, "$desc: res value"); - is($one, 'abcd', "$desc: \$1 value"); - - $desc = "use re 'taint': substitution /g with pattern tainted via locale"; - - $s = 'abcd'; - { use locale; $res = $s =~ s/(\w)/x/g; $one = $1; } - is_tainted($s, "$desc: s tainted"); - is_tainted($res, "$desc: res tainted"); - is_tainted($one, "$desc: \$1 tainted"); - is($s, 'xxxx', "$desc: s value"); - is($res, 4, "$desc: res value"); - is($one, 'd', "$desc: \$1 value"); - - $desc = "use re 'taint': substitution /r with pattern tainted via locale"; - - $s = 'abcd'; - { use locale; $res = $s =~ s/(\w+)/xyz/r; $one = $1; } - isnt_tainted($s, "$desc: s not tainted"); - is_tainted($res, "$desc: res tainted"); - is_tainted($one, "$desc: \$1 tainted"); - is($s, 'abcd', "$desc: s value"); - is($res, 'xyz', "$desc: res value"); - is($one, 'abcd', "$desc: \$1 value"); + $desc = "use re 'taint': substitution with pattern tainted via locale"; + + $s = 'abcd'; + { + BEGIN { + if($Config{d_setlocale}) { + require locale; import locale; + } + } + $res = $s =~ s/(\w+)/xyz/; $one = $1; + } + is_tainted($s, "$desc: s tainted"); + isnt_tainted($res, "$desc: res not tainted"); + is_tainted($one, "$desc: \$1 tainted"); + is($s, 'xyz', "$desc: s value"); + is($res, 1, "$desc: res value"); + is($one, 'abcd', "$desc: \$1 value"); + + $desc = "use re 'taint': substitution /g with pattern tainted via locale"; + + $s = 'abcd'; + { + BEGIN { + if($Config{d_setlocale}) { + require locale; import locale; + } + } + $res = $s =~ s/(\w)/x/g; $one = $1; + } + is_tainted($s, "$desc: s tainted"); + is_tainted($res, "$desc: res tainted"); + is_tainted($one, "$desc: \$1 tainted"); + is($s, 'xxxx', "$desc: s value"); + is($res, 4, "$desc: res value"); + is($one, 'd', "$desc: \$1 value"); + + $desc = "use re 'taint': substitution /r with pattern tainted via locale"; + + $s = 'abcd'; + { + BEGIN { + if($Config{d_setlocale}) { + require locale; import locale; + } + } + $res = $s =~ s/(\w+)/xyz/r; $one = $1; + } + isnt_tainted($s, "$desc: s not tainted"); + is_tainted($res, "$desc: res tainted"); + is_tainted($one, "$desc: \$1 tainted"); + is($s, 'abcd', "$desc: s value"); + is($res, 'xyz', "$desc: res value"); + is($one, 'abcd', "$desc: \$1 value"); + } $desc = "use re 'taint': substitution with replacement tainted"; @@ -2188,9 +2310,15 @@ pass("no death when TARG of ref is tainted"); isnt_tainted $$, "PID not tainted when read in tainted expression"; } -{ +SKIP: { + skip 'No locale testing without d_setlocale', 4 if(!$Config{d_setlocale}); + use feature 'fc'; - use locale; + BEGIN { + if($Config{d_setlocale}) { + require locale; import locale; + } + } my ($latin1, $utf8) = ("\xDF") x 2; utf8::downgrade($latin1); utf8::upgrade($utf8); diff --git a/t/re/charset.t b/t/re/charset.t index 8d98125..ee3625a 100644 --- a/t/re/charset.t +++ b/t/re/charset.t @@ -3,6 +3,7 @@ BEGIN { chdir 't' if -d 't'; @INC = '../lib'; + require Config; import Config; require './test.pl'; } @@ -35,11 +36,11 @@ $testcases{'[:space:]'} = $testcases{'\s'}; $testcases{'[:word:]'} = $testcases{'\w'}; my @charsets = qw(a d u aa); -if (! is_miniperl()) { +if (! is_miniperl() && $Config{d_setlocale}) { require POSIX; my $current_locale = POSIX::setlocale( &POSIX::LC_ALL, "C") // ""; if ($current_locale eq 'C') { - use locale; + require locale; import locale; # Some locale implementations don't have the 128-255 characters all # mean nothing. Skip the locale tests in that situation diff --git a/t/re/fold_grind.t b/t/re/fold_grind.t index 4e13110..d073498 100644 --- a/t/re/fold_grind.t +++ b/t/re/fold_grind.t @@ -6,6 +6,7 @@ BEGIN { chdir 't' if -d 't'; @INC = '../lib'; require './test.pl'; + require Config; import Config; skip_all_if_miniperl("no dynamic loading on miniperl, no Encode nor POSIX"); } @@ -405,18 +406,20 @@ sub pairs (@) { } my @charsets = qw(d u a aa); -my $current_locale = POSIX::setlocale( &POSIX::LC_ALL, "C") // ""; -if ($current_locale eq 'C') { - use locale; - - # Some locale implementations don't have the range 128-255 characters all - # mean nothing. Skip the locale tests in that situation. - for my $i (128 .. 255) { - my $char = chr($i); - goto bad_locale if uc($char) ne $char || lc($char) ne $char; +if($Config{d_setlocale}) { + my $current_locale = POSIX::setlocale( &POSIX::LC_ALL, "C") // ""; + if ($current_locale eq 'C') { + require locale; import locale; + + # Some locale implementations don't have the range 128-255 characters all + # mean nothing. Skip the locale tests in that situation. + for my $i (128 .. 255) { + my $char = chr($i); + goto bad_locale if uc($char) ne $char || lc($char) ne $char; + } + push @charsets, 'l'; + bad_locale: } - push @charsets, 'l'; -bad_locale: } # Finally ready to do the tests diff --git a/t/re/pat.t b/t/re/pat.t index 768119f..748705d 100644 --- a/t/re/pat.t +++ b/t/re/pat.t @@ -16,6 +16,7 @@ $| = 1; BEGIN { chdir 't' if -d 't'; @INC = ('../lib','.'); + require Config; import Config; require './test.pl'; } @@ -516,24 +517,52 @@ sub run_tests { is(qr/(?u)\b\v$/, '(?^:(?u)\b\v$)', 'Verify (?u) compiles'); my $dual = qr/\b\v$/; - use locale; - my $locale = qr/\b\v$/; - is($locale, '(?^l:\b\v$)', 'Verify has l modifier when compiled under use locale'); - no locale; + my $locale; + + SKIP: { + skip 'No locale testing without d_setlocale', 1 if(!$Config{d_setlocale}); + + BEGIN { + if($Config{d_setlocale}) { + require locale; import locale; + } + } + $locale = qr/\b\v$/; + is($locale, '(?^l:\b\v$)', 'Verify has l modifier when compiled under use locale'); + no locale; + } use feature 'unicode_strings'; my $unicode = qr/\b\v$/; is($unicode, '(?^u:\b\v$)', 'Verify has u modifier when compiled under unicode_strings'); is(qr/abc$dual/, '(?^u:abc(?^:\b\v$))', 'Verify retains d meaning when interpolated under locale'); - is(qr/abc$locale/, '(?^u:abc(?^l:\b\v$))', 'Verify retains l when interpolated under unicode_strings'); + + SKIP: { + skip 'No locale testing without d_setlocale', 1 if(!$Config{d_setlocale}); + + is(qr/abc$locale/, '(?^u:abc(?^l:\b\v$))', 'Verify retains l when interpolated under unicode_strings'); + } no feature 'unicode_strings'; - is(qr/abc$locale/, '(?^:abc(?^l:\b\v$))', 'Verify retains l when interpolated outside locale and unicode strings'); + SKIP: { + skip 'No locale testing without d_setlocale', 1 if(!$Config{d_setlocale}); + + is(qr/abc$locale/, '(?^:abc(?^l:\b\v$))', 'Verify retains l when interpolated outside locale and unicode strings'); + } + is(qr/def$unicode/, '(?^:def(?^u:\b\v$))', 'Verify retains u when interpolated outside locale and unicode strings'); - use locale; - is(qr/abc$dual/, '(?^l:abc(?^:\b\v$))', 'Verify retains d meaning when interpolated under locale'); - is(qr/abc$unicode/, '(?^l:abc(?^u:\b\v$))', 'Verify retains u when interpolated under locale'); + SKIP: { + skip 'No locale testing without d_setlocale', 2 if(!$Config{d_setlocale}); + + BEGIN { + if($Config{d_setlocale}) { + require locale; import locale; + } + } + is(qr/abc$dual/, '(?^l:abc(?^:\b\v$))', 'Verify retains d meaning when interpolated under locale'); + is(qr/abc$unicode/, '(?^l:abc(?^u:\b\v$))', 'Verify retains u when interpolated under locale'); + } } { diff --git a/t/run/locale.t b/t/run/locale.t index 7bbb0a9..d01e3bc 100644 --- a/t/run/locale.t +++ b/t/run/locale.t @@ -64,7 +64,11 @@ my $original_locale = setlocale(LC_NUMERIC); my ($base, $different, $difference); for ("C", @locales) { # prefer C for the base if available - use locale; + BEGIN { + if($Config{d_setlocale}) { + require locale; import locale; + } + } setlocale(LC_NUMERIC, $_) or next; my $in = 4.2; # avoid any constant folding bugs if ((my $s = sprintf("%g", $in)) eq "4.2") { @@ -113,14 +117,15 @@ format STDOUT = @.# 4.179 . -{ use locale; write; } +{ require locale; import locale; write; } EOF "too late to look at the locale at write() time"); } { fresh_perl_is(<<'EOF', $difference, {}, -use locale; format STDOUT = +use locale; +format STDOUT = @.# 4.179 . @@ -134,7 +139,11 @@ EOF # do not let "use 5.000" affect the locale! # this test is to prevent regression of [rt.perl.org #105784] fresh_perl_is(<<"EOF", - use locale; + BEGIN { + if($Config{d_setlocale}) { + require locale; import locale; + } + } use POSIX; my \$i = 0.123; POSIX::setlocale(POSIX::LC_NUMERIC(),"$different"); @@ -163,7 +172,7 @@ EOF local $ENV{LC_NUMERIC} = $_; local $ENV{LC_ALL}; # so it never overrides LC_NUMERIC fresh_perl_is(<<'EOF', "$difference "x4, {}, - use locale; + use locale; use POSIX qw(locale_h); setlocale(LC_NUMERIC, ""); my $in = 4.2; diff --git a/t/uni/fold.t b/t/uni/fold.t index 4c0ef7d..91356bb 100644 --- a/t/uni/fold.t +++ b/t/uni/fold.t @@ -7,6 +7,7 @@ use warnings; BEGIN { chdir 't' if -d 't'; @INC = '../lib'; + require Config; import Config; require './test.pl'; } @@ -426,8 +427,13 @@ foreach my $test_ref (@CF) { utf8::downgrade($latin1); #No-op, but doesn't hurt utf8::upgrade($utf8); is(fc($latin1), fc($utf8), "fc() gives the same results for \\x{$_} in Latin-1 and UTF-8 under unicode_strings"); - { - use locale; + SKIP: { + skip 'No locale testing without d_setlocale', 2 if(!$Config{d_setlocale}); + BEGIN { + if($Config{d_setlocale}) { + require locale; import locale; + } + } is(fc($latin1), lc($latin1), "use locale; fc(qq{\\x{$_}}), lc(qq{\\x{$_}}) when qq{\\x{$_}} is in latin-1"); is(fc($utf8), lc($utf8), "use locale; fc(qq{\\x{$_}}), lc(qq{\\x{$_}}) when qq{\\x{$_}} is in latin-1"); } diff --git a/t/uni/overload.t b/t/uni/overload.t index 7bf4841..bd87b66 100644 --- a/t/uni/overload.t +++ b/t/uni/overload.t @@ -3,6 +3,7 @@ BEGIN { chdir 't'; @INC = '../lib'; + require Config; import Config; require './test.pl'; } @@ -95,8 +96,10 @@ is ($uc, "\351", "e acute -> E acute"); my $have_setlocale = 0; eval { require POSIX; - import POSIX ':locale_h'; - $have_setlocale++; + if($Config{d_setlocale}) { + import POSIX ':locale_h'; + $have_setlocale++; + } }; SKIP: { @@ -107,7 +110,11 @@ SKIP: { } elsif ($^O eq 'dec_osf' || $^O eq 'VMS') { skip "$^O has broken en_GB.ISO8859-1 locale", 24; } else { - use locale; + BEGIN { + if($Config{d_setlocale}) { + require locale; import locale; + } + } my $u = UTF8Toggle->new("\311"); my $lc = lc $u; is (length $lc, 1); -- 1.7.8.6 ```
p5pRT commented 11 years ago

From @khwilliamson

Thanks\, applied as 569f7fc5d4ec06501b46a72075ff434fe1bf4332 -- Karl Williamson

p5pRT commented 11 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 11 years ago

@khwilliamson - Status changed from 'open' to 'resolved'

p5pRT commented 11 years ago

From @jkeenan

On Fri Feb 08 05​:17​:44 2013\, JROBINSON wrote​:

This is a bug report for perl from castaway@​desert-island.me.uk\, generated with the help of perlbug 1.39 running under perl 5.17.8.

----------------------------------------------------------------- [Please describe your issue here]

Please find attached a patch which allows Perl's core tests to pass/skip on systems which do not implement the "locale" system\, as expected by "use locale" and "use POSIX '​:local_h'". The patch also makes "use locale" die if $Config{d_setlocale} is not true.

FWIW\, I applied this patch in a branch on my laptop and all tests passed. So\, at the very least\, it does no harm.

This is the case with Android\, which I am targetting in my cross- compiler grant. (It uses ICU instead\, adding support for that is left as a later exercise!)

Feel free to pick up anything useful in the ICU-detection configuration step from Parrot.

Thank you very much. Jim Keenan

p5pRT commented 11 years ago

@jkeenan - Status changed from 'resolved' to 'open'

p5pRT commented 11 years ago

@jkeenan - Status changed from 'open' to 'resolved'

p5pRT commented 11 years ago

From @craigberry

On Sat\, Feb 9\, 2013 at 9​:57 PM\, Karl Williamson via RT \perlbug\-followup@&#8203;perl\.org wrote​:

Thanks\, applied as 569f7fc5d4ec06501b46a72075ff434fe1bf4332

This caused t/re/charset.t to start failing on VMS. What it came down to is that we were skipping the locale tests but now are not\, and the check to see whether we should skip them broke because C\<use locale;> was replaced with C\<require locale; import locale;>. Without getting the locale module loaded at compile time\, we're too late to have it influence regex matching.

I'm not sure if that's a bug in or feature of the locale module (or of the regex engine)\, but here's an illustration of what happens​:

$ perl -e "use locale; print chr(161) =~ /[[​:print​:]]/ ? qq/Y\n/ : qq/N\n/;" Y $ perl -e "require locale; import locale; print chr(161) =~ /[[​:print​:]]/ ? qq/Y\n/ : qq/N\n/;" N $ perl -e "BEGIN {require locale; import locale;} print chr(161) =~ /[[​:print​:]]/ ? qq/Y\n/ : qq/N\n/;" Y

The test can be patched up by adding a BEGIN block as in the third example above and in some\, but not all\, of the other tests affected by this patch. But I would like to understand why this works the way it does. Is it something about when the regex engine looks at locale settings?

p5pRT commented 11 years ago

From castaway@desert-island.me.uk

On Sun\, 17 Feb 2013\, Craig Berry via RT wrote​:

On Sat\, Feb 9\, 2013 at 9​:57 PM\, Karl Williamson via RT \perlbug\-followup@&#8203;perl\.org wrote​:

Thanks\, applied as 569f7fc5d4ec06501b46a72075ff434fe1bf4332

This caused t/re/charset.t to start failing on VMS. What it came down to is that we were skipping the locale tests but now are not\, and the check to see whether we should skip them broke because C\<use locale;> was replaced with C\<require locale; import locale;>. Without getting the locale module loaded at compile time\, we're too late to have it influence regex matching.

Darn\, sorry. I thought I'd wrapped them all in BEGIN blocks\, did I miss one?

I'm not sure if that's a bug in or feature of the locale module (or of the regex engine)\, but here's an illustration of what happens​:

It's because locale adds bits to $^H\, and this can only happen at compile time (figured this out the hard way).

Jess

p5pRT commented 11 years ago

From @khwilliamson

On 02/17/2013 08​:20 AM\, Craig A. Berry wrote​:

On Sat\, Feb 9\, 2013 at 9​:57 PM\, Karl Williamson via RT \perlbug\-followup@&#8203;perl\.org wrote​:

Thanks\, applied as 569f7fc5d4ec06501b46a72075ff434fe1bf4332

This caused t/re/charset.t to start failing on VMS. What it came down to is that we were skipping the locale tests but now are not\, and the check to see whether we should skip them broke because C\<use locale;> was replaced with C\<require locale; import locale;>. Without getting the locale module loaded at compile time\, we're too late to have it influence regex matching.

I'm not sure if that's a bug in or feature of the locale module (or of the regex engine)\, but here's an illustration of what happens​:

$ perl -e "use locale; print chr(161) =~ /[[​:print​:]]/ ? qq/Y\n/ : qq/N\n/;" Y $ perl -e "require locale; import locale; print chr(161) =~ /[[​:print​:]]/ ? qq/Y\n/ : qq/N\n/;" N $ perl -e "BEGIN {require locale; import locale;} print chr(161) =~ /[[​:print​:]]/ ? qq/Y\n/ : qq/N\n/;" Y

The test can be patched up by adding a BEGIN block as in the third example above and in some\, but not all\, of the other tests affected by this patch. But I would like to understand why this works the way it does. Is it something about when the regex engine looks at locale settings?

The patch should have done the require and import at compile time\, and I should have caught that before applying.

It's the way things work\, so I guess you can call it a feature. The regular expression is compiled at compile time\, so if it is to use locale\, that fact must be known at compile time. Normally\, one does 'use locale'\, and so there is no problem.

You can try the attached patch and let me know if that works. I used the 'if' module in it. I asked the patch's author\, Jess\, why she hadn't used 'if'\, and she said in essence that doing so would be adding another potential point of failure\, and that seemed reasonable to me. But I'm thinking that we should discuss that on this list. It makes the code simpler\, and this kind of error would not have happened (if the attached patch works)\, and perhaps we can assume that by the time we get to testing regular expressions\, that we know that 'if' works.

Before looking at the other things you found\, I think we should resolve this.

p5pRT commented 11 years ago

From @khwilliamson

0005-re-charset.t-Make-sure-locale-is-loaded-at-compile-t.patch ```diff From 80d27ac45a2e21a3f03e1d297ab7400232072a02 Mon Sep 17 00:00:00 2001 From: Karl Williamson Date: Sun, 17 Feb 2013 12:27:03 -0700 Subject: [PATCH 5/5] re/charset.t: Make sure locale is loaded at compile time TO BE FURNISHED --- t/re/charset.t | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/t/re/charset.t b/t/re/charset.t index ee3625a..ebbbeb0 100644 --- a/t/re/charset.t +++ b/t/re/charset.t @@ -3,12 +3,12 @@ BEGIN { chdir 't' if -d 't'; @INC = '../lib'; - require Config; import Config; require './test.pl'; } use strict; use warnings; +use Config; plan('no_plan'); @@ -40,7 +40,10 @@ if (! is_miniperl() && $Config{d_setlocale}) { require POSIX; my $current_locale = POSIX::setlocale( &POSIX::LC_ALL, "C") // ""; if ($current_locale eq 'C') { - require locale; import locale; + + # test for d_setlocale is repeated here because this one is compile + # time, and the one above is run time + use if $Config{d_setlocale}, 'locale'; # Some locale implementations don't have the 128-255 characters all # mean nothing. Skip the locale tests in that situation -- 1.8.1.3 ```
p5pRT commented 11 years ago

From @craigberry

On Sun\, Feb 17\, 2013 at 1​:28 PM\, Jess Robinson \castaway@&#8203;desert\-island\.me\.uk wrote​:

On Sun\, 17 Feb 2013\, Craig Berry via RT wrote​:

On Sat\, Feb 9\, 2013 at 9​:57 PM\, Karl Williamson via RT \perlbug\-followup@&#8203;perl\.org wrote​:

Thanks\, applied as 569f7fc5d4ec06501b46a72075ff434fe1bf4332

This caused t/re/charset.t to start failing on VMS. What it came down to is that we were skipping the locale tests but now are not\, and the check to see whether we should skip them broke because C\<use locale;> was replaced with C\<require locale; import locale;>. Without getting the locale module loaded at compile time\, we're too late to have it influence regex matching.

Darn\, sorry. I thought I'd wrapped them all in BEGIN blocks\, did I miss one?

Skimming through here​:

\<http​://perl5.git.perl.org/perl.git/commitdiff/569f7fc5d4ec06501b46a72075ff434fe1bf4332>

I see a few in handy.t\, fold_grind.t\, charset.t and even one in locale.t that are not in BEGIN blocks. It may not matter in all cases; it really depends on what operations are being done that depend on locales and when the implementations of those operations choose to look at the hints.

I'm not sure if that's a bug in or feature of the locale module (or of the regex engine)\, but here's an illustration of what happens​:

It's because locale adds bits to $^H\, and this can only happen at compile time (figured this out the hard way).

That makes sense. What I find harder to wrap my head around is how to know when the regex engine will check $^H and behave differently based on what it finds. Probably safest just to make sure the hints get set at compile time.

p5pRT commented 11 years ago

From @craigberry

On Sun\, Feb 17\, 2013 at 1​:39 PM\, Karl Williamson \public@&#8203;khwilliamson\.com wrote​:

The patch should have done the require and import at compile time\, and I should have caught that before applying.

It's the way things work\, so I guess you can call it a feature. The regular expression is compiled at compile time\, so if it is to use locale\, that fact must be known at compile time. Normally\, one does 'use locale'\, and so there is no problem.

You can try the attached patch and let me know if that works. I used the 'if' module in it. I asked the patch's author\, Jess\, why she hadn't used 'if'\, and she said in essence that doing so would be adding another potential point of failure\, and that seemed reasonable to me. But I'm thinking that we should discuss that on this list. It makes the code simpler\, and this kind of error would not have happened (if the attached patch works)\, and perhaps we can assume that by the time we get to testing regular expressions\, that we know that 'if' works.

Thanks\, Karl. The patch you attached does the trick (though note it has CRLF line endings). Interestingly\, this also works​:

Inline Patch ```diff --- t/re/charset.t;-0 2013-02-09 21:56:22 -0600 +++ t/re/charset.t 2013-02-17 14:36:23 -0600 @@ -45,7 +45,7 @@ if (! is_miniperl() && $Config{d_setloca # Some locale implementations don't have the 128-255 characters all # mean nothing. Skip the locale tests in that situation for my $i (128 .. 255) { - goto bad_locale if chr($i) =~ /[[:print:]]/; + goto bad_locale if chr($i) =~ /[[:print:]]/l; } push @charsets, 'l'; bad_locale: ```

[end]

I guess adding /l to the regex causes something to get initialized (or re-initialized) at run-time. But it's still probably safer to make sure locale.pm does its thing at compile time.

Before looking at the other things you found\, I think we should resolve this.

If by other things you mean why some characters in the range 161-255 are considered printable\, that's a good question. I assume the locale database is rather broken. Here are the details (nothing below 161 matches)​:

$ perl -"Mlocale" -e "for (161..255) {print chr($_) =~ /[[​:print​:]]/ ? qq/$_ Y\n/ : qq/$_ N\n/;}" 161 Y 162 Y 163 Y 164 N 165 Y 166 N 167 Y 168 Y 169 Y 170 Y 171 Y 172 N 173 N 174 N 175 N 176 Y 177 Y 178 Y 179 Y 180 N 181 Y 182 Y 183 Y 184 N 185 Y 186 Y 187 Y 188 Y 189 Y 190 N 191 Y 192 Y 193 Y 194 Y 195 Y 196 Y 197 Y 198 Y 199 Y 200 Y 201 Y 202 Y 203 Y 204 Y 205 Y 206 Y 207 Y 208 N 209 Y 210 Y 211 Y 212 Y 213 Y 214 Y 215 Y 216 Y 217 Y 218 Y 219 Y 220 Y 221 Y 222 N 223 Y 224 Y 225 Y 226 Y 227 Y 228 Y 229 Y 230 Y 231 Y 232 Y 233 Y 234 Y 235 Y 236 Y 237 Y 238 Y 239 Y 240 N 241 Y 242 Y 243 Y 244 Y 245 Y 246 Y 247 Y 248 Y 249 Y 250 Y 251 Y 252 Y 253 Y 254 N 255 N

p5pRT commented 11 years ago

From @craigberry

On Sun\, Feb 17\, 2013 at 2​:51 PM\, Craig A. Berry \craig\.a\.berry@&#8203;gmail\.com wrote​:

On Sun\, Feb 17\, 2013 at 1​:39 PM\, Karl Williamson \public@&#8203;khwilliamson\.com wrote​:

Before looking at the other things you found\, I think we should resolve this.

If by other things you mean why some characters in the range 161-255 are considered printable\, that's a good question. I assume the locale database is rather broken.

I read and thought about this a bit more and I now think the characters showing up as printable really are printable in the DEC Multinational Character Set (DEC-MCS) which I think is what's going to be in the default locale on VMS. The standard for locale at​:

http​://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap07.html#tag_07_02

just says that the C or POSIX locale governs "data consisting entirely of characters from the portable character set and the control character set. For other characters\, the behavior is unspecified."

So I guess everyone gets to specify what's left differently. Here are the characters above 127 that are considered printable on VMS​:

$ perl -"Mlocale" -e "for (128..255) {print qq/$_\n/ if chr($_) =~ /[[​:print​:]]/};" 161 162 163 165 167 168 169 170 171 176 177 178 179 181 182 183 185 186 187 188 189 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 209 210 211 212 213 214 215 216 217 218 219 220 221 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 241 242 243 244 245 246 247 248 249 250 251 252 253

Eyeballing the chart at​:

http​://en.wikipedia.org/wiki/Multinational_Character_Set

or

http​://www.columbia.edu/kermit/dec-mcs.html

it looks to me like those characters really are printable in DEC-MCS.

So t/re/charset.t is not really correct in flagging anything that has printables above 127 as a "bad locale". It might be a locale that we don't know how to test or for which we would have to maintain a separate set of test data\, but I don't see that it's out of line with the standard.

p5pRT commented 11 years ago

From @khwilliamson

On 02/17/2013 01​:51 PM\, Craig A. Berry wrote​:

On Sun\, Feb 17\, 2013 at 1​:39 PM\, Karl Williamson \public@&#8203;khwilliamson\.com wrote​:

The patch should have done the require and import at compile time\, and I should have caught that before applying.

It's the way things work\, so I guess you can call it a feature. The regular expression is compiled at compile time\, so if it is to use locale\, that fact must be known at compile time. Normally\, one does 'use locale'\, and so there is no problem.

You can try the attached patch and let me know if that works. I used the 'if' module in it. I asked the patch's author\, Jess\, why she hadn't used 'if'\, and she said in essence that doing so would be adding another potential point of failure\, and that seemed reasonable to me. But I'm thinking that we should discuss that on this list. It makes the code simpler\, and this kind of error would not have happened (if the attached patch works)\, and perhaps we can assume that by the time we get to testing regular expressions\, that we know that 'if' works.

Thanks\, Karl. The patch you attached does the trick (though note it has CRLF line endings). I wonder how that happened.

Interestingly\, this also works​:

--- t/re/charset.t;-0 2013-02-09 21​:56​:22 -0600 +++ t/re/charset.t 2013-02-17 14​:36​:23 -0600 @​@​ -45\,7 +45\,7 @​@​ if (! is_miniperl() && $Config{d_setloca # Some locale implementations don't have the 128-255 characters all # mean nothing. Skip the locale tests in that situation for my $i (128 .. 255) { - goto bad_locale if chr($i) =~ /[[​:print​:]]/; + goto bad_locale if chr($i) =~ /[[​:print​:]]/l; } push @​charsets\, 'l'; bad_locale​: [end]

I guess adding /l to the regex causes something to get initialized (or re-initialized) at run-time. But it's still probably safer to make sure locale.pm does its thing at compile time.

The /l will cause it to compile for locale regardless of whether 'use locale' is in effect or not\, but perlre cautions against using it like this\, for several reasons. The only reason we document it is because Perl can't keep a secret; it should be internal only.

Before looking at the other things you found\, I think we should resolve this.

If by other things you mean why some characters in the range 161-255 are considered printable\, that's a good question. I assume the locale database is rather broken. Here are the details (nothing below 161 matches)​:

Actually\, I meant the other problems you said were in the patch\, and to which you responded to Jess already.

p5pRT commented 11 years ago

From @khwilliamson

On 02/17/2013 02​:37 PM\, Craig A. Berry wrote​:

On Sun\, Feb 17\, 2013 at 2​:51 PM\, Craig A. Berry \craig\.a\.berry@&#8203;gmail\.com wrote​:

On Sun\, Feb 17\, 2013 at 1​:39 PM\, Karl Williamson \public@&#8203;khwilliamson\.com wrote​:

Before looking at the other things you found\, I think we should resolve this.

If by other things you mean why some characters in the range 161-255 are considered printable\, that's a good question. I assume the locale database is rather broken.

I read and thought about this a bit more and I now think the characters showing up as printable really are printable in the DEC Multinational Character Set (DEC-MCS) which I think is what's going to be in the default locale on VMS. The standard for locale at​:

http​://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap07.html#tag_07_02

just says that the C or POSIX locale governs "data consisting entirely of characters from the portable character set and the control character set. For other characters\, the behavior is unspecified."

So I guess everyone gets to specify what's left differently.

If I ever knew that\, I had forgotten\, and had been under the misapprehension that the C locale meant that only the ascii characters should be defined. Thanks for setting me straight.

e those characters really are printable in DEC-MCS.

So t/re/charset.t is not really correct in flagging anything that has printables above 127 as a "bad locale". It might be a locale that we don't know how to test or for which we would have to maintain a separate set of test data\, but I don't see that it's out of line with the standard.

You're right. It does mean that it's untestable\, though. I'll change the comments and label to indicate that.

p5pRT commented 11 years ago

From @craigberry

On Sun\, Feb 17\, 2013 at 4​:59 PM\, Karl Williamson \public@&#8203;khwilliamson\.com wrote​:

On 02/17/2013 02​:37 PM\, Craig A. Berry wrote​:

So t/re/charset.t is not really correct in flagging anything that has printables above 127 as a "bad locale". It might be a locale that we don't know how to test or for which we would have to maintain a separate set of test data\, but I don't see that it's out of line with the standard.

You're right. It does mean that it's untestable\, though. I'll change the comments and label to indicate that.

Thanks. While you're there\, it looks like fold_grind.t has the exact same code to check for a bad/untestable locale.