Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/
Other
1.85k stars 527 forks source link

broken floating-point source constants after setlocale #22176

Open ashutosh108 opened 2 weeks ago

ashutosh108 commented 2 weeks ago

This is a bug report for perl from a.voloshin@postgrespro.ru, generated with the help of perlbug 1.43 running under perl 5.38.0.

Module: POSIX


Description Perl 5.38.x (or, at least, it's POSIX module) seem to break floating-point constants, making 0 < 0.5 false in some locales after setlocale() in BEGIN block.

Verified to work ok: v5.36.3 Verified as buggy: v5.38.0, v5.38.2

Steps to Reproduce If I set LC_NUMERIC=ru_RU.UTF-8 in the environment, this script prints "not ok" under Perl 5.38.x, but "ok" under Perl 5.36.3:

BEGIN {
    use POSIX qw(locale_h);
    setlocale(LC_NUMERIC, '');
}

print "perl: $^V\n";

if (0 < 0.5) {
    print "ok\n";
} else {
    print "not ok\n";
}

Of course, reproduction requires corresponding locale to be present in the system. On my machine, uncommenting corresponding locale name (ru_RU.UTF-8) in /etc/locale.gen and running sudo locale-gen does the trick.

Expected behavior Floating-point constants should work with any locales. Yes, in ru_RU.UTF-8 locale the decimal separator is ",", not ".". But that should not affect parsing the source code itself. Besides, what are we supposed to write instead, 0 < 0,5 ? That would mean something else!

To sum up: I expected 0 < 0.5 to be true in any locale, even after setlocale() in the BEGIN block.

Originally this behaviour was found under Ubuntu 24.04 (which has Perl v5.38.2 shipped as package) with some other module doing "use POSIX / setlocale in the BEGIN block" trick for some other reasons, so it does affect real-world usage.


Flags

Configured by ashutosh at Fri Apr 26 19:37:42 MSK 2024.

Summary of my perl5 (revision 5 version 38 subversion 0) configuration: Commit id: 76298ae68aa7796f0ffc05095b127d23f4b2de8f Platform: osname=linux osvers=6.5.0-26-generic archname=x86_64-linux uname='linux z 6.5.0-26-generic #26~22.04.1-ubuntu smp preempt_dynamic tue mar 12 10:22:43 utc 2 x86_64 x86_64 x86_64 gnulinux ' config_args='-des -Dprefix=/home/ashutosh/bin/perl5-38-0' hint=recommended useposix=true d_sigaction=define useithreads=undef usemultiplicity=undef use64bitint=define use64bitall=define uselongdouble=undef usemymalloc=n default_inc_excludes_dot=define Compiler: cc='cc' ccflags ='-fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64' optimize='-O2' cppflags='-fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include' ccversion='' gccversion='11.4.0' gccosandvers='' intsize=4 longsize=8 ptrsize=8 doublesize=8 byteorder=12345678 doublekind=3 d_longlong=define longlongsize=8 d_longdbl=define longdblsize=16 longdblkind=3 ivtype='long' ivsize=8 nvtype='double' nvsize=8 Off_t='off_t' lseeksize=8 alignbytes=8 prototype=define Linker and Libraries: ld='cc' ldflags =' -fstack-protector-strong -L/usr/local/lib' libpth=/usr/local/lib /usr/lib/x86_64-linux-gnu /usr/lib /usr/lib64 libs=-lpthread -ldl -lm -lcrypt -lutil -lc perllibs=-lpthread -ldl -lm -lcrypt -lutil -lc libc=/lib/x86_64-linux-gnu/libc.so.6 so=so useshrplib=false libperl=libperl.a gnulibc_version='2.35' Dynamic Linking: dlsrc=dl_dlopen.xs dlext=so d_dlsymun=undef ccdlflags='-Wl,-E' cccdlflags='-fPIC' lddlflags='-shared -O2 -L/usr/local/lib -fstack-protector-strong'


@INC for perl 5.38.0: /home/ashutosh/perl5/lib/perl5/x86_64-linux /home/ashutosh/perl5/lib/perl5 /home/ashutosh/bin/perl5-38-0/lib/site_perl/5.38.0/x86_64-linux /home/ashutosh/bin/perl5-38-0/lib/site_perl/5.38.0 /home/ashutosh/bin/perl5-38-0/lib/5.38.0/x86_64-linux /home/ashutosh/bin/perl5-38-0/lib/5.38.0


Environment for perl 5.38.0: HOME=/home/ashutosh LANG=en_US.UTF-8 LANGUAGE= LC_ADDRESS=en_US.UTF-8 LC_IDENTIFICATION=en_US.UTF-8 LC_MEASUREMENT=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_NAME=en_US.UTF-8 LC_NUMERIC=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_TELEPHONE=en_US.UTF-8 LC_TIME=en_GB.UTF-8 LD_LIBRARY_PATH=/home/ashutosh/pg/master/ci_install/lib:/usr/local/lib:/opt/icu70.1/lib:/opt/icu69.1/lib:/opt/icu53.1/lib LOGDIR (unset) PATH=/home/ashutosh/pg/master/ci_install/bin:/opt/valgrind/bin:/home/ashutosh/perl5/bin:/home/ashutosh/.nix-profile/bin:/nix/var/nix/profiles/default/bin:/opt/gdb-13/bin:/home/ashutosh/bin:/usr/lib/ccache:/home/ashutosh/.local/bin:/home/ashutosh/.nix-profile/bin:/nix/var/nix/profiles/default/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin PERL5LIB=/home/ashutosh/perl5/lib/perl5 PERL_BADLANG (unset) PERL_LOCAL_LIB_ROOT=/home/ashutosh/perl5 PERL_MB_OPT=--install_base "/home/ashutosh/perl5" PERL_MM_OPT=INSTALL_BASE=/home/ashutosh/perl5 SHELL=/bin/bash

khwilliamson commented 2 weeks ago

This would have been a release blocker for 5.38 if we had known about it then. But it has since been fixed, so will be in 5.40. I suspect it may be too complicated for a 5.38 maintenance release. But since its original date is so early after the 5.38 freeze, I could be wrong about that.

Bisecting the fix led to:

commit 5ba25c116c8573b68a6103113d3b831e46f55bee Author: Karl Williamson khw@cpan.org AuthorDate: Tue Apr 11 16:06:11 2023 -0600 Commit: Karl Williamson khw@cpan.org CommitDate: Mon Sep 11 19:58:21 2023 -0600

 Create S_native_querylocale_i() and use it

 This new function differs from the already-existing plain
 querylocale_i() in that it returns in the platform's native format,
 instead of the internal=to-perl one.

 The internal one is used generally so that code doesn't have to cope
 with multiple possible formats.  However, the format of the new locale
 in Perl_setlocale() is going to be in native format.  We effectively
 translate it into our internal one at the input edge, and that is used
 thereafter.

 But until this commit, the translation back to native format at the
 output edge was incomplete.

 This mostly worked because native format differs from locale.c
 internal format in just two ways:

 One is the locale for LC_NUMERIC.  perl keeps it generally in the C
 locale, except for brief intervals which higher level code specifies,
 when the real locale is swapped in.  (Actually, this isn't quite true.
 If the real locale is indistinguishable from C as far as LC_NUMERIC
 goes, perl is happy to use it rather than C, so as to save swapping.)
 locale.c had the code in it to translate the internal format back to
 native, so it worked for this case.

 The other is LC_ALL when not all categories are set to the same locale.
 Windows and Linux use 'name=value;' pairs notation, while things derived
 from BSD (and others) use a positional notation in which only the values
 are given, and the system knows which category a given value is for from
 its position in the string.  Perl worked fine for the name=value pairs
 notation, because that is the same as its internal one, so no
 translation got done, but until this commit, there were issues on
 positional platforms.  This seldom got in the way since most people, if
 they set the locale at all, will just set LC_ALL to some single 'foo'.

 What this commit effectively does is change Perl_setlocale() to return
 the value in the native format which the libc functions are expecting.
 This differs from what it used to return only on platforms which use the
 positional notation and only for LC_ALL when not all categories are set
 to the same locale.

 The new function subsumes much of the work previously done in
 Perl_setlocale(), and it is able to simplify some of that work.
jkeenan commented 2 weeks ago

I confirmed the problem on FreeBSD-13, and also confirmed that the problem is fixed in blead.

[perlmonger: p5p] $ perl gh-22176-locale-20240426.pl
perl: v5.32.1
ok

[perlmonger: v5.38.0] $ ./bin/perl -Ilib $P5P_DIR/gh-22176-locale-20240426.pl
perl: v5.38.0
not ok

[perlmonger: p5p] $ bleadperl gh-22176-locale-20240426.pl
perl: v5.39.10
ok

I checked out tag v5.38.2, created a branch and attempted to cherry-pick the commit @khwilliamson identified. I don't know how to interpret the result.

$ git cherry-pick 76298ae68aa7796f0ffc05095b127d23f4b2de8f
On branch gh-22176-attempt-20240426
You are currently cherry-picking commit 76298ae68a.
  (all conflicts fixed: run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

nothing to commit, working tree clean
The previous cherry-pick is now empty, possibly due to conflict resolution.
If you wish to commit it anyway, use:

    git commit --allow-empty

Otherwise, please use 'git cherry-pick --skip'
mauke commented 2 weeks ago

@jkeenan For cherry-picking, I get:

$ git cherry-pick 5ba25c116c8573b68a6103113d3b831e46f55bee
Auto-merging embed.fnc
CONFLICT (content): Merge conflict in embed.fnc
Auto-merging embed.h
CONFLICT (content): Merge conflict in embed.h
Auto-merging locale.c
CONFLICT (content): Merge conflict in locale.c
Auto-merging perl.h
CONFLICT (content): Merge conflict in perl.h
Auto-merging proto.h
error: could not apply 5ba25c116c... Create S_native_querylocale_i() and use it
hint: After resolving the conflicts, mark them with
hint: "git add/rm <pathspec>", then run
hint: "git cherry-pick --continue".
hint: You can instead skip this commit with "git cherry-pick --skip".
hint: To abort and get back to the state before "git cherry-pick",
hint: run "git cherry-pick --abort".
Recorded preimage for 'embed.fnc'
Recorded preimage for 'embed.h'
Recorded preimage for 'locale.c'
Recorded preimage for 'perl.h'

Which tells me the commit can't be cherry-picked in isolation.

I see that you are using a different commit hash (76298ae68aa7796f0ffc05095b127d23f4b2de8f vs. 5ba25c116c8573b68a6103113d3b831e46f55bee). Not sure what's up with that.

jkeenan commented 2 weeks ago

I see that you are using a different commit hash (76298ae vs. 5ba25c1). Not sure what's up with that.

Some error on my part. 76298ae was v5.38.0.

ashutosh108 commented 2 weeks ago

My bisect led me to a different commit:

commit eacceb3846b76a5b595ac9da9d68cd7858396c51 Author: Karl Williamson khw@cpan.org Date: Sun Dec 31 08:00:53 2023 locale.c: Be sure to toggle into dot radix locale

Just in case, here is the bisect log: gh-22176-bisect-log.txt

Fortunately, this commit cherry-picks into v5.38.2 (==maint-5.38 currently) cleanly. It doesn't compile as it is, but it does after this obvious patch (needed because d65d6e9bb have changed signature of set_numeric_locale in blead):

diff --git a/locale.c b/locale.c
index eacffab38f..b7e502b134 100644
--- a/locale.c
+++ b/locale.c
@@ -1920,7 +1920,7 @@ S_new_numeric(pTHX_ const char *newnum, bool force)
      * locale is indistinguishable from the C locale. */
     if (! force && strEQ(PL_numeric_name, newnum)) {
         if (! PL_numeric_underlying_is_standard) {
-            set_numeric_standard(__FILE__, __LINE__);
+            set_numeric_standard();
         }

         return;

The fixed v5.38.2+eacceb384 version works for me, the script from the first message prints "ok".

Could someone who knows Perl internals please verify that this kind of fix is acceptable to be included into Perl 5.38.x?

Upd: created a pull request https://github.com/Perl/perl5/pull/22185 with the fix described above.

ashutosh108 commented 2 weeks ago

Is there anything I could do for this to be fixed sooner? I'm new to Perl community and I don't know how the things are generally done here. If I need to squash commits or to clarify the commit message in PR #22185, please let me know. Or is there a mailing list to discuss this issue and review the fix for it?

In the mean time, I'm waiting for this bug to be fixed in 5.38.x because that version is going to be quite well-spread due to e.g. being present in the (fresh) current Ubuntu LTS 24.04.

khwilliamson commented 2 weeks ago

@steve-m-hay It looks like we need to release a 5.38 maint