Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

Don't output msg for harmless use of unsupported locale #22160

Closed khwilliamson closed 2 weeks ago

khwilliamson commented 3 weeks ago

This fixes GH #21562

Perl doesn't support all possible locales. Locales that remap elements of the ASCII character set or change their case pairs won't work fully, for example. Hence, some Turkish locales aren't supported, because Turkish treats 'I' and 'i' differently from other locales that use the Latin alphabet.

The only multi-byte locales that perl supports are UTF-8 ones (and there actually is special handling here to support Turkish). Other multi-byte locales can be dangerous to use, possibly crashing or hanging the Perl interpreter. Locales with shift states are particularly prone to this.

Since perl is written in C, there is always an underlying locale. But most C functions don't look at locales at all, and the Perl interpreter takes care to call the ones that do only within the scope of 'use locale' or for certain function calls in the POSIX:: module that always use the program's current underlying locale.

Prior to this commit, if a dangerous locale underlay the program at startup, a warning to that effect was emitted, even if that locale was never accessed.

This commit changes things so that no warning is output unless and until an attempt is actually made to use the dangerous underlying locale.

Pre-existing code also deferred warnings about locales (like the Turkish ones mentioned above) that aren't fully compatible with perl. So it was a simple matter to just modify this code a bit, and add some extra checks for sane locales being in effect.

sisyphus commented 3 weeks ago

I've just tested this on an MSWin32-x64-multi-thread (UCRT) build from source that also included #22157. All tests passed. With an earlier, identically configured build of perl-5.39.9 source, I get the following:

perl -MPOSIX -Mwarnings -le "$loc = POSIX::setlocale( LC_ALL, 'Korean_Korea.949' ); print $loc;"
Locale 'Korean_Korea.949' is unsupported, and may crash the interpreter at -e line 1.

That's the same as I get using Strawberry Perl 5.38.2 - except that 5.38.2 does not include the "at -e line 1" information.

With this latest build, the output of that one-liner has changed:

perl -MPOSIX -Mwarnings -le "$loc = POSIX::setlocale( LC_ALL, 'Korean_Korea.949' ); print $loc;"
Korean_Korea.949

The warning has gone, though I wonder if that's correct, given that the one-liner still specifies Korean_Korea.949? I don't know much about locale usage, and if there's some more helpful code I could run, then I'm happy to test it.

I did notice that, with this latest build, inserting -Mlocale into that one-liner resulted in the reappearance of the warning, but insertion of -Mlocale=":not_characters" did not:

perl -MPOSIX -Mwarnings -Mlocale -le "$loc = POSIX::setlocale( LC_ALL, 'Korean_Korea.949' ); print $loc;"
Locale 'Korean_Korea.949' is unsupported, and may hang or crash the interpreter at -e line 1.
Korean_Korea.949

perl -MPOSIX -Mwarnings -Mlocale=":not_characters" -le "$loc = POSIX::setlocale( LC_ALL, 'Korean_Korea.949' ); print $loc;"
Korean_Korea.949

I don't know if there's anything helpful/meaningful in those last 2 one-liners. Some feedback from those who are actually wanting to use these problematic locales would, I'm sure, be most helpful.

tonycoz commented 3 weeks ago

I'm getting reasonable results on a system with the system language set to Japanese (in the admin panel) and the current (in settings Preferred Languages) also set to Japanese.

Before the change I'd see the warning with just perl -v.

After the change I only see the warning when doing something where locale matters: [image]

sisyphus commented 3 weeks ago

I'm keen to not mess with my system locale at all, but I think I'm seeing consistency with what @tonycoz has presented:

D:\>perl -Mwarnings -MPOSIX -le "$l=POSIX::setlocale( LC_ALL, 'Japanese_Japan.932' ); print $l;"
Japanese_Japan.932

D:\>perl -Mwarnings -MPOSIX -le "$l=POSIX::setlocale( LC_ALL, 'Japanese_Japan.932' ); print $l; print mblen('ABC');"
Japanese_Japan.932
Locale 'Japanese_Japan.932' is unsupported, and may hang or crash the interpreter at -e line 1.
1

(In my earlier post I failed to recognize that printing out the locale was not the same as actually using the locale.)

khwilliamson commented 3 weeks ago

Note that with this approach, XS code that calls libc functions directly may not get the warning.