Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

Solaris Failing Some Locale Tests #16537

Closed p5pRT closed 6 years ago

p5pRT commented 6 years ago

Migrated from rt.perl.org#133157 (status was 'resolved')

Searchable as RT133157$

p5pRT commented 6 years ago

From carlos@carlosguevara.com

This is a bug report for perl from "Carlos Guevara" <carlos@carlosguevara.com>, generated with the help of perlbug 1.41 running under perl 5.28.0.


Solaris is failing some locale tests: http://perl5.test-smoke.org/report/65421



Flags:
    category=core
    severity=low

Site configuration information for perl 5.28.0:

Configured by cpan at Thu Apr 26 22:26:15 CDT 2018.

Summary of my perl5 (revision 5 version 28 subversion 0) configuration:
  Snapshot of: 5dbe8f0a915c25666dd9c760775f619c34a51538
  Platform:
    osname=solaris
    osvers=2.11
    archname=i86pc-solaris-64
    uname='sunos cjg-hipster 5.11 illumos-094e47e980 i86pc i386 i86pc '
    config_args='-des -Dprefix=~/bin/perl-blead -Dscriptdir=~/bin/perl-blead/bin -Dusedevel -Duse64bitall -Dcc=gcc'
    hint=recommended
    useposix=true
    d_sigaction=define
    useithreads=undef
    usemultiplicity=undef
    use64bitint=define
    use64bitall=define
    uselongdouble=undef
    usemymalloc=n
    default_inc_excludes_dot=define
    bincompat5005=undef
  Compiler:
    cc='gcc'
    ccflags ='-m64 -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/gnu/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2 -DPERL_USE_SAFE_PUTENV'
    optimize='-O'
    cppflags='-m64 -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/gnu/include'
    ccversion=''
    gccversion='6.4.0'
    gccosandvers=''
    intsize=4
    longsize=8
    ptrsize=8
    doublesize=8
    byteorder=12345678
    doublekind=3
    d_longlong=define
    longlongsize=8
    d_longdbl=define
    longdblsize=16
    longdblkind=3
    ivtype='long'
    ivsize=8
    nvtype='double'
    nvsize=8
    Off_t='off_t'
    lseeksize=8
    alignbytes=8
    prototype=define
  Linker and Libraries:
    ld='gcc'
    ldflags =' -m64 -fstack-protector-strong -L/usr/gnu/lib '
    libpth=/usr/gcc/6/lib /usr/lib /usr/gnu/lib /usr/ccs/lib
    libs=-lpthread -lsocket -lnsl -lgdbm -ldb -ldl -lm -lc
    perllibs=-lpthread -lsocket -lnsl -ldl -lm -lc
    libc=/lib/libc.so
    so=so
    useshrplib=true
    libperl=libperl.so
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_dlopen.xs
    dlext=so
    d_dlsymun=undef
    ccdlflags=' -R /home/cpan/bin/perl-blead/lib/5.28.0/i86pc-solaris-64/CORE'
    cccdlflags='-fPIC'
    lddlflags=' -shared -m64 -L/usr/gnu/lib -fstack-protector-strong'

@INC for perl 5.28.0:
    /home/cpan/bin/perl-blead/lib/site_perl/5.28.0/i86pc-solaris-64
    /home/cpan/bin/perl-blead/lib/site_perl/5.28.0
    /home/cpan/bin/perl-blead/lib/5.28.0/i86pc-solaris-64
    /home/cpan/bin/perl-blead/lib/5.28.0

Environment for perl 5.28.0:
    HOME=/home/cpan
    LANG=en_US
    LANGUAGE (unset)
    LC_ALL=C
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/cpan/bin/perl-blead/bin:/home/cpan/bin:/usr/bin:/usr/sbin:/sbin:/usr/gnu/bin
    PERL_BADLANG (unset)
    SHELL=/usr/bin/bash

p5pRT commented 6 years ago

From @khwilliamson

On 04/26/2018 10​:07 PM\, Carlos Guevara (via RT) wrote​:


> Solaris is failing some locale tests: http://perl5.test-smoke.org/report/65421

The problem is that these locales have a UTF-8 decimal radix character, and it appears that the OS doesn't properly handle this case. A very similar issue was present in cygwin until we reported it to them, and they have since fixed it. Attached is a short C program to verify that it's an OS problem.

In the meantime, solaris smokes are failing. I've made this a 5.28 blocker. I have patches that skip or todo the failing tests. But I'll wait until Carlos runs the program.

Also, this is openindiana solaris. I have no idea if Oracle solaris has this issue. The bug tracker is not open to the public, which I find astonishing and disconcerting. I did not find this issue in the openindiana list.
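For anyone who wants to see which locales on their own system fall into this category, here is a minimal sketch. It is not the attached program: it assumes only the standard `setlocale()` and `localeconv()` calls, and the helper name `radix_length_in_locale` is hypothetical. It reports how many bytes the decimal radix occupies in a named locale, so a result greater than 1 would mark a locale with a multi-byte radix.

```c
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the ticket): return the byte length of the
 * decimal radix string in the locale `name`, or -1 if that locale cannot be
 * set. The previous LC_NUMERIC locale is restored before returning. */
int radix_length_in_locale(const char *name)
{
    char saved[256];
    const char *cur = setlocale(LC_NUMERIC, NULL);
    int len;

    /* Copy the current locale name, since later setlocale() calls may
     * overwrite the string the pointer refers to. */
    if (cur == NULL || strlen(cur) >= sizeof(saved))
        return -1;
    strcpy(saved, cur);

    if (setlocale(LC_NUMERIC, name) == NULL)
        return -1;

    len = (int) strlen(localeconv()->decimal_point);

    setlocale(LC_NUMERIC, saved);
    return len;
}
```

Called with `"C"` this returns 1. With a locale such as `"ar_AE.UTF-8"`, a correctly working system would presumably return 2, going by the expected bytes discussed in this ticket; what an affected system returns here has not been checked.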


p5pRT commented 6 years ago

From @khwilliamson

```c
#include <stdio.h>
#include <locale.h>

int
main(int argc, char ** argv)
{
    char buf[100];
    unsigned int i;

    printf("%s\n", setlocale(LC_ALL, "ar_AE.UTF8"));
    snprintf(buf, sizeof(buf), "%g", 3.2);

    for (i = 0; i < sizeof(buf); i++) {
        if (buf[i] == '\0') break;
        printf(" %x", buf[i]);
    }
    printf("\n");
}
```

p5pRT commented 6 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 6 years ago

From carlos@carlosguevara.com

Revised radix.c:

```c
#include <stdio.h>
#include <locale.h>

int main(int argc, char ** argv)
{
  unsigned char buf[100];
  unsigned int i;

  printf("%s\n", setlocale(LC_ALL, "ar_AE.UTF-8"));
  snprintf(buf, sizeof(buf), "%g", 3.2);

  for (i = 0; i < sizeof(buf); i++) {
    if (buf[i] == '\0') break;
    printf(" %x", buf[i]);
  }
  printf("\n");
}
```

Output:

```
ar_AE.UTF-8
 33 d9 32
```

p5pRT commented 6 years ago

From @khwilliamson

On 04/27/2018 09:56 PM, Carlos Guevara wrote:


> Output: ar_AE.UTF-8 33 d9 32

That should instead have been 33 d9 ab 32. And that indicates that the problem is indeed with the OS. My guess is that it doesn't consider the possibility of a multi-byte radix character, so it uses just the first byte, but \xd9 is a start byte of a two-byte sequence, so this is leading to malformed UTF-8.

I'll submit a trouble ticket for them.
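To make the "malformed UTF-8" point concrete, here is a rough sketch of a structural check. The function name `is_well_formed_utf8` is hypothetical and this is not how perl itself validates UTF-8; it only verifies that each lead byte is followed by the right number of continuation bytes, and it does not reject every overlong or surrogate encoding.

```c
#include <stddef.h>

/* Hypothetical helper: return 1 if s[0..len) has valid UTF-8 lead and
 * continuation byte structure, 0 otherwise. Simplified on purpose: some
 * overlong forms and surrogates are not rejected. */
int is_well_formed_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        size_t need;    /* continuation bytes expected after this byte */
        size_t k;

        if (c < 0x80)                    need = 0;  /* ASCII */
        else if (c >= 0xC2 && c <= 0xDF) need = 1;  /* two-byte lead */
        else if (c >= 0xE0 && c <= 0xEF) need = 2;  /* three-byte lead */
        else if (c >= 0xF0 && c <= 0xF4) need = 3;  /* four-byte lead */
        else return 0;      /* stray continuation or invalid lead byte */

        if (need > len - i - 1)
            return 0;       /* sequence is cut off at end of buffer */

        for (k = 1; k <= need; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;   /* expected a 10xxxxxx continuation byte */
        }
        i += need + 1;
    }
    return 1;
}
```

Applied to the bytes reported above, `33 d9 32` fails because `d9` is a two-byte lead followed by `32`, which is not a continuation byte, while the expected `33 d9 ab 32` passes.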

p5pRT commented 6 years ago

From @khwilliamson

On 04/27/2018 10:24 PM, Karl Williamson wrote:


> I'll submit a trouble ticket for them.

Now done as https://www.illumos.org/issues/9511

p5pRT commented 6 years ago

From @khwilliamson

On 04/28/2018 10:02 AM, Karl Williamson wrote:


> Now done as https://www.illumos.org/issues/9511

Attached are three patches that cause these tests to pass on solaris. A version specification should probably be added to the one for t/run/locale.t. But there are complications that I don't know how to deal with. I don't know the version spec to use for openindiana, which I understand has a different kind of release deal. And I don't know if this is a bug in the Oracle solaris, which has a very different version number.

I think these patches, after the versioning is ironed out, should go in 5.28, so that this platform passes the test suite. These affect only two .t files.

p5pRT commented 6 years ago

From @khwilliamson

0002-t-run-locale.t-Skip-some-Solaris-locales.patch

```diff
From bd0d1ba4062ea201cb26e4d5690e76f00a3f9287 Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Thu, 19 Apr 2018 14:43:43 -0600
Subject: [PATCH 2/4] t/run/locale.t: Skip some Solaris locales

Solaris is buggy in dealing with locales that have a multi-byte UTF-8
decimal radix character. Skip using these, like we do on cygwin, which
has a similar problem.
---
 t/run/locale.t | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/t/run/locale.t b/t/run/locale.t
index 13bc25d7a7..282fbb5f86 100644
--- a/t/run/locale.t
+++ b/t/run/locale.t
@@ -88,6 +88,13 @@ if ($non_C_locale) {
         @test_numeric_locales = grep { $_ !~ m/ps_AF/i } @test_numeric_locales;
     }
 
+    # Similarly the arabic locales on solaris don't work right on the
+    # multi-byte radix character, generating malformed UTF-8.
+    if ($^O eq 'solaris') {
+        @test_numeric_locales = grep { $_ !~ m/ ^ ( ar_ | pa_ ) /x }
+                                     @test_numeric_locales;
+    }
+
     fresh_perl_is("for (qw(@test_numeric_locales)) {\n" . <<'EOF',
         use POSIX qw(locale_h);
         use locale;
-- 
2.11.0
```

p5pRT commented 6 years ago

From @khwilliamson

0003-lib-locale.t-Mark-a-test-problematic.patch

```diff
From 57e2dd7f14b426e28eab0b11640ba1b921daf080 Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Sat, 28 Apr 2018 10:16:08 -0600
Subject: [PATCH 3/4] lib/locale.t: Mark a test problematic

We now have found a system that fails this test. Tests that are listed
as problematic automatically get marked as TODO when they fail with
specified platforms. The next commit will specify the platform that
this is fails on.
---
 lib/locale.t | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/locale.t b/lib/locale.t
index 85843acae7..638e21cff0 100644
--- a/lib/locale.t
+++ b/lib/locale.t
@@ -2237,6 +2237,7 @@ foreach my $Locale (@Locale) {
 
     report_result($Locale, ++$locales_test_number, $ok15);
     $test_names{$locales_test_number} = 'Verify that a number with a UTF-8 radix has a UTF-8 stringification';
+    $problematical_tests{$locales_test_number} = 1;
 
     report_result($Locale, ++$locales_test_number, $ok16);
     $test_names{$locales_test_number} = 'Verify that a sprintf of a number with a UTF-8 radix yields UTF-8';
-- 
2.11.0
```

p5pRT commented 6 years ago

From @khwilliamson

0004-lib-locale.t-TODO-some-locales-on-Solaris.patch

```diff
From 54749c361a30cfad35542ed0841956477ae3fa32 Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Sat, 28 Apr 2018 10:18:05 -0600
Subject: [PATCH 4/4] lib/locale.t: TODO some locales on Solaris

There is a bug in Solaris with locales which have a multi-byte decimal
radix character. Make these TODO, like we do cygwin, which has had a
similar problem.
---
 lib/locale.t | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/lib/locale.t b/lib/locale.t
index 638e21cff0..17931c894d 100644
--- a/lib/locale.t
+++ b/lib/locale.t
@@ -78,6 +78,11 @@ my %known_bad_locales = (
                           darwin => qr/ ^ lt_LT.ISO8859 /ix,
                           os390 => qr/ ^ italian /ix,
                           netbsd => qr/\bISO8859-2\b/i,
+
+                          # This may be the same bug as the cygwin below; it's
+                          # generating malformed UTF-8 on the radix being
+                          # mulit-byte
+                          solaris => qr/ ^ ( ar_ | pa_ ) /x,
                         );
 
 # cygwin isn't returning proper radix length in this locale, but supposedly to
-- 
2.11.0
```

p5pRT commented 6 years ago

From @xsawyerx

On 04/28/2018 07:24 PM, Karl Williamson wrote:


> I think these patches, after the versioning is ironed out, should go in 5.28, so that this platform passes the test suite. These affect only two .t files.

I'd like one of the committers to approve this before it is merged to blead. Dave, Tony, Yves, Zefram, etc.?

p5pRT commented 6 years ago

From @iabyn

On Mon, Apr 30, 2018 at 11:50:45PM +0300, Sawyer X wrote:


> I'd like one of the committers to approve this before it is merged to blead. Dave, Tony, Yves, Zefram, etc.?

I approve, and have just merged them, as

  v5.27.11-26-ge3e8c0d65c
  v5.27.11-27-ga6bc52d6f4
  v5.27.11-28-gb974d2c0b3

As regards the specifics of openindiana id and versions, that can always be added later if we obtain that info.

-- 
Modern art:
    "That's easy, I could have done that!"
    "Ah, but you didn't!"

p5pRT commented 6 years ago

@iabyn - Status changed from 'open' to 'resolved'