Closed: p5pRT closed this issue 11 years ago.
Please find attached a patch which allows Perl's core tests to pass/skip on systems which do not implement the "locale" system, as expected by "use locale" and "use POSIX ':locale_h'". The patch also makes "use locale" die if $Config{d_setlocale} is not true.
This is the case with Android, which I am targeting in my cross-compiler grant. (It uses ICU instead; adding support for that is left as a later exercise!)
Thanks, applied as 569f7fc5d4ec06501b46a72075ff434fe1bf4332 -- Karl Williamson
The RT System itself - Status changed from 'new' to 'open'
@khwilliamson - Status changed from 'open' to 'resolved'
On Fri Feb 08 05:17:44 2013, JROBINSON wrote:
> This is a bug report for perl from castaway@desert-island.me.uk, generated with the help of perlbug 1.39 running under perl 5.17.8.
> Please find attached a patch which allows Perl's core tests to pass/skip on systems which do not implement the "locale" system, as expected by "use locale" and "use POSIX ':locale_h'". The patch also makes "use locale" die if $Config{d_setlocale} is not true.
FWIW, I applied this patch in a branch on my laptop and all tests passed. So, at the very least, it does no harm.
> This is the case with Android, which I am targeting in my cross-compiler grant. (It uses ICU instead; adding support for that is left as a later exercise!)
Feel free to pick up anything useful in the ICU-detection configuration step from Parrot.
Thank you very much.
Jim Keenan
@jkeenan - Status changed from 'resolved' to 'open'
@jkeenan - Status changed from 'open' to 'resolved'
On Sat, Feb 9, 2013 at 9:57 PM, Karl Williamson via RT <perlbug-followup@perl.org> wrote:
> Thanks, applied as 569f7fc5d4ec06501b46a72075ff434fe1bf4332
This caused t/re/charset.t to start failing on VMS. What it came down to is that we were skipping the locale tests but now are not, and the check to see whether we should skip them broke because C<use locale;> was replaced with C<require locale; import locale;>. Without getting the locale module loaded at compile time, we're too late to have it influence regex matching.
I'm not sure if that's a bug in or feature of the locale module (or of the regex engine), but here's an illustration of what happens:
$ perl -e "use locale; print chr(161) =~ /[[:print:]]/ ? qq/Y\n/ : qq/N\n/;"
Y
$ perl -e "require locale; import locale; print chr(161) =~ /[[:print:]]/ ? qq/Y\n/ : qq/N\n/;"
N
$ perl -e "BEGIN {require locale; import locale;} print chr(161) =~ /[[:print:]]/ ? qq/Y\n/ : qq/N\n/;"
Y
The test can be patched up by adding a BEGIN block as in the third example above, and so can some, but not all, of the other tests affected by this patch. But I would like to understand why this works the way it does. Is it something about when the regex engine looks at locale settings?
On Sun, 17 Feb 2013, Craig Berry via RT wrote:
> On Sat, Feb 9, 2013 at 9:57 PM, Karl Williamson via RT <perlbug-followup@perl.org> wrote:
>> Thanks, applied as 569f7fc5d4ec06501b46a72075ff434fe1bf4332
> This caused t/re/charset.t to start failing on VMS. What it came down to is that we were skipping the locale tests but now are not, and the check to see whether we should skip them broke because C<use locale;> was replaced with C<require locale; import locale;>. Without getting the locale module loaded at compile time, we're too late to have it influence regex matching.
Darn, sorry. I thought I'd wrapped them all in BEGIN blocks, did I miss one?
> I'm not sure if that's a bug in or feature of the locale module (or of the regex engine), but here's an illustration of what happens:
It's because locale adds bits to $^H, and this can only happen at compile time (figured this out the hard way).
Jess
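To make Jess's explanation concrete, here is a minimal Perl sketch (not from the thread) that records $^H while the file is being compiled. It assumes that locale.pm sets the HINT_LOCALE bit, which is 0x4 in perl.h, and that perl was built with locale support; the variable names are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: HINT_LOCALE is 0x4 (from perl.h); locale.pm ORs this bit into $^H.
use constant HINT_LOCALE => 0x4;

# Illustrative variables, filled in by BEGIN blocks while this file compiles.
my ($hints_plain, $hints_use, $hints_begin);

# No locale pragma in effect yet.
BEGIN { $hints_plain = $^H }

{
    use locale;                                # the usual form, runs at compile time
    BEGIN { $hints_use = $^H }
}

{
    BEGIN { require locale; locale->import }   # what "use locale" expands to
    BEGIN { $hints_begin = $^H }
}

# A bare "require locale; locale->import;" placed here would only run after
# compilation has finished, so it could not change anything recorded above.
for my $case (['no pragma' => $hints_plain],
              ['use locale' => $hints_use],
              ['BEGIN + require/import' => $hints_begin]) {
    my ($label, $hints) = @$case;
    printf "%-24s locale hint %s\n", $label, ($hints & HINT_LOCALE) ? 'on' : 'off';
}
```

On a perl with locale support, the first line should report the hint off and the other two on, which is why the BEGIN wrapper restores the behaviour of C<use locale;>.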
On 02/17/2013 08:20 AM, Craig A. Berry wrote:
> On Sat, Feb 9, 2013 at 9:57 PM, Karl Williamson via RT <perlbug-followup@perl.org> wrote:
>> Thanks, applied as 569f7fc5d4ec06501b46a72075ff434fe1bf4332
> This caused t/re/charset.t to start failing on VMS. What it came down to is that we were skipping the locale tests but now are not, and the check to see whether we should skip them broke because C<use locale;> was replaced with C<require locale; import locale;>. Without getting the locale module loaded at compile time, we're too late to have it influence regex matching.
> I'm not sure if that's a bug in or feature of the locale module (or of the regex engine), but here's an illustration of what happens:
> $ perl -e "use locale; print chr(161) =~ /[[:print:]]/ ? qq/Y\n/ : qq/N\n/;"
> Y
> $ perl -e "require locale; import locale; print chr(161) =~ /[[:print:]]/ ? qq/Y\n/ : qq/N\n/;"
> N
> $ perl -e "BEGIN {require locale; import locale;} print chr(161) =~ /[[:print:]]/ ? qq/Y\n/ : qq/N\n/;"
> Y
> The test can be patched up by adding a BEGIN block as in the third example above, and so can some, but not all, of the other tests affected by this patch. But I would like to understand why this works the way it does. Is it something about when the regex engine looks at locale settings?
The patch should have done the require and import at compile time, and I should have caught that before applying.
It's the way things work, so I guess you can call it a feature. The regular expression is compiled at compile time, so if it is to use locale, that fact must be known at compile time. Normally, one does 'use locale', and so there is no problem.
You can try the attached patch and let me know if that works. I used the 'if' module in it. I asked the patch's author, Jess, why she hadn't used 'if', and she said in essence that doing so would be adding another potential point of failure, and that seemed reasonable to me. But I'm thinking that we should discuss that on this list. It makes the code simpler, and this kind of error would not have happened (if the attached patch works), and perhaps we can assume that by the time we get to testing regular expressions, that we know that 'if' works.
Before looking at the other things you found\, I think we should resolve this.
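Karl's attached patch is not reproduced in this thread. As a rough sketch of the 'if' approach, assuming only the core if and Config modules, a test file could load the pragma conditionally but still at compile time. The check on the qr// stringification (an `l` charset flag, as in `(?^l:...)`, on perls 5.14 and later) is used here only to show that the pragma took effect; it is not taken from the patch.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Config;

# "use if" evaluates its condition and loads the module at compile time,
# so the pragma covers the rest of this file exactly as "use locale" would.
# (Sketch only: the real t/re/charset.t logic is not shown in this thread.)
use if $Config{d_setlocale}, 'locale';

# Compiled after the pragma, so it gets the /l charset when locales exist.
my $print_re = qr/[[:print:]]/;

print "d_setlocale: ", ($Config{d_setlocale} ? 'yes' : 'no'), "\n";
print "regex compiled as: $print_re\n";
```

On a build without setlocale() the condition is false, locale.pm is never loaded, and the pattern simply compiles without /l.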
On Sun, Feb 17, 2013 at 1:28 PM, Jess Robinson <castaway@desert-island.me.uk> wrote:
> On Sun, 17 Feb 2013, Craig Berry via RT wrote:
>> On Sat, Feb 9, 2013 at 9:57 PM, Karl Williamson via RT <perlbug-followup@perl.org> wrote:
>>> Thanks, applied as 569f7fc5d4ec06501b46a72075ff434fe1bf4332
>> This caused t/re/charset.t to start failing on VMS. What it came down to is that we were skipping the locale tests but now are not, and the check to see whether we should skip them broke because C<use locale;> was replaced with C<require locale; import locale;>. Without getting the locale module loaded at compile time, we're too late to have it influence regex matching.
> Darn, sorry. I thought I'd wrapped them all in BEGIN blocks, did I miss one?
Skimming through here:
<http://perl5.git.perl.org/perl.git/commitdiff/569f7fc5d4ec06501b46a72075ff434fe1bf4332>
I see a few in handy.t, fold_grind.t, charset.t, and even one in locale.t that are not in BEGIN blocks. It may not matter in all cases; it really depends on what operations are being done that depend on locales and when the implementations of those operations choose to look at the hints.
>> I'm not sure if that's a bug in or feature of the locale module (or of the regex engine), but here's an illustration of what happens:
> It's because locale adds bits to $^H, and this can only happen at compile time (figured this out the hard way).
That makes sense. What I find harder to wrap my head around is how to know when the regex engine will check $^H and behave differently based on what it finds. Probably safest just to make sure the hints get set at compile time.
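One way to see when the regex engine consults the hints is to stringify qr// objects: since Perl 5.14, a pattern compiled under locale rules stringifies with an `l` charset flag, as in `(?^l:...)`. The sketch below is an illustration, not code from the thread; it assumes a perl built with locale support and mirrors Craig's three one-liners without depending on what the current locale considers printable.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Compiled with no locale pragma in scope: default charset rules.
my $plain = qr/[[:print:]]/;

# Runtime import, like the second one-liner. By the time this executes,
# the qr// literal on the following line has already been compiled without /l.
require locale;
locale->import;
my $after_runtime_import = qr/[[:print:]]/;

# Compile-time import, like the first and third one-liners.
my $under_use_locale;
{
    use locale;
    $under_use_locale = qr/[[:print:]]/;
}

print "no pragma:        $plain\n";
print "runtime import:   $after_runtime_import\n";
print "use locale block: $under_use_locale\n";
```

Only the pattern compiled inside the `use locale` block should carry the `l` flag; the runtime import changes $^H too late to matter for a constant pattern.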
On Sun, Feb 17, 2013 at 1:39 PM, Karl Williamson <public@khwilliamson.com> wrote:
> The patch should have done the require and import at compile time, and I should have caught that before applying.
> It's the way things work, so I guess you can call it a feature. The regular expression is compiled at compile time, so if it is to use locale, that fact must be known at compile time. Normally, one does 'use locale', and so there is no problem.
> You can try the attached patch and let me know if that works. I used the 'if' module in it. I asked the patch's author, Jess, why she hadn't used 'if', and she said in essence that doing so would be adding another potential point of failure, and that seemed reasonable to me. But I'm thinking that we should discuss that on this list. It makes the code simpler, and this kind of error would not have happened (if the attached patch works), and perhaps we can assume that by the time we get to testing regular expressions, that we know that 'if' works.
Thanks, Karl. The patch you attached does the trick (though note it has CRLF line endings). Interestingly, this also works:
I guess adding /l to the regex causes something to get initialized (or re-initialized) at run-time. But it's still probably safer to make sure locale.pm does its thing at compile time.
> Before looking at the other things you found, I think we should resolve this.
If by other things you mean why some characters in the range 161-255 are considered printable, that's a good question. I assume the locale database is rather broken. Here are the details (nothing below 161 matches):
$ perl -"Mlocale" -e "for (161..255) {print chr($_) =~ /[[:print:]]/ ? qq/$_ Y\n/ : qq/$_ N\n/;}"
161 Y  162 Y  163 Y  164 N  165 Y  166 N  167 Y  168 Y  169 Y  170 Y
171 Y  172 N  173 N  174 N  175 N  176 Y  177 Y  178 Y  179 Y  180 N
181 Y  182 Y  183 Y  184 N  185 Y  186 Y  187 Y  188 Y  189 Y  190 N
191 Y  192 Y  193 Y  194 Y  195 Y  196 Y  197 Y  198 Y  199 Y  200 Y
201 Y  202 Y  203 Y  204 Y  205 Y  206 Y  207 Y  208 N  209 Y  210 Y
211 Y  212 Y  213 Y  214 Y  215 Y  216 Y  217 Y  218 Y  219 Y  220 Y
221 Y  222 N  223 Y  224 Y  225 Y  226 Y  227 Y  228 Y  229 Y  230 Y
231 Y  232 Y  233 Y  234 Y  235 Y  236 Y  237 Y  238 Y  239 Y  240 N
241 Y  242 Y  243 Y  244 Y  245 Y  246 Y  247 Y  248 Y  249 Y  250 Y
251 Y  252 Y  253 Y  254 N  255 N
On Sun, Feb 17, 2013 at 2:51 PM, Craig A. Berry <craig.a.berry@gmail.com> wrote:
> On Sun, Feb 17, 2013 at 1:39 PM, Karl Williamson <public@khwilliamson.com> wrote:
>> Before looking at the other things you found, I think we should resolve this.
> If by other things you mean why some characters in the range 161-255 are considered printable, that's a good question. I assume the locale database is rather broken.
I read and thought about this a bit more, and I now think the characters showing up as printable really are printable in the DEC Multinational Character Set (DEC-MCS), which I think is what's going to be in the default locale on VMS. The standard for locale at:
http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap07.html#tag_07_02
just says that the C or POSIX locale governs "data consisting entirely of characters from the portable character set and the control character set. For other characters, the behavior is unspecified."
So I guess everyone gets to specify what's left differently. Here are the characters above 127 that are considered printable on VMS:
$ perl -"Mlocale" -e "for (128..255) {print qq/$_\n/ if chr($_) =~ /[[:print:]]/};"
161 162 163 165 167 168 169 170 171 176 177 178 179 181 182 183 185 186 187 188
189 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 209 210
211 212 213 214 215 216 217 218 219 220 221 223 224 225 226 227 228 229 230 231
232 233 234 235 236 237 238 239 241 242 243 244 245 246 247 248 249 250 251 252
253
Eyeballing the chart at:
http://en.wikipedia.org/wiki/Multinational_Character_Set
or
http://www.columbia.edu/kermit/dec-mcs.html
it looks to me like those characters really are printable in DEC-MCS.
So t/re/charset.t is not really correct in flagging anything that has printables above 127 as a "bad locale". It might be a locale that we don't know how to test or for which we would have to maintain a separate set of test data, but I don't see that it's out of line with the standard.
On 02/17/2013 01:51 PM, Craig A. Berry wrote:
> On Sun, Feb 17, 2013 at 1:39 PM, Karl Williamson <public@khwilliamson.com> wrote:
>> The patch should have done the require and import at compile time, and I should have caught that before applying.
>> It's the way things work, so I guess you can call it a feature. The regular expression is compiled at compile time, so if it is to use locale, that fact must be known at compile time. Normally, one does 'use locale', and so there is no problem.
>> You can try the attached patch and let me know if that works. I used the 'if' module in it. I asked the patch's author, Jess, why she hadn't used 'if', and she said in essence that doing so would be adding another potential point of failure, and that seemed reasonable to me. But I'm thinking that we should discuss that on this list. It makes the code simpler, and this kind of error would not have happened (if the attached patch works), and perhaps we can assume that by the time we get to testing regular expressions, that we know that 'if' works.
> Thanks, Karl. The patch you attached does the trick (though note it has CRLF line endings).
I wonder how that happened.
> Interestingly, this also works:
> --- t/re/charset.t;-0 2013-02-09 21:56:22 -0600
> +++ t/re/charset.t 2013-02-17 14:36:23 -0600
> @@ -45,7 +45,7 @@ if (! is_miniperl() && $Config{d_setloca
>      # Some locale implementations don't have the 128-255 characters all
>      # mean nothing. Skip the locale tests in that situation
>      for my $i (128 .. 255) {
> -        goto bad_locale if chr($i) =~ /[[:print:]]/;
> +        goto bad_locale if chr($i) =~ /[[:print:]]/l;
>      }
>      push @charsets, 'l';
>  bad_locale:
> I guess adding /l to the regex causes something to get initialized (or re-initialized) at run-time. But it's still probably safer to make sure locale.pm does its thing at compile time.
The /l will cause it to compile for locale regardless of whether 'use locale' is in effect or not, but perlre cautions against using it like this, for several reasons. The only reason we document it is because Perl can't keep a secret; it should be internal only.
>> Before looking at the other things you found, I think we should resolve this.
> If by other things you mean why some characters in the range 161-255 are considered printable, that's a good question. I assume the locale database is rather broken. Here are the details (nothing below 161 matches):
Actually, I meant the other problems you said were in the patch, and to which you responded to Jess already.
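As a small illustration of what Karl describes (a sketch, not taken from either patch), the /l modifier forces locale rules on a single pattern even with no `use locale` in scope. The stringified form shows the difference; per Karl's note, perlre discourages this usage, so loading the pragma at compile time remains the preferred fix.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# No "use locale" anywhere in this file.
my $default_rules = qr/[[:print:]]/;    # charset taken from the (absent) pragma
my $forced_locale = qr/[[:print:]]/l;   # /l: locale rules for this pattern only

print "without /l: $default_rules\n";
print "with /l:    $forced_locale\n";
```

This is why Craig's /l variant of the charset.t check works even though the surrounding code is not compiled under the pragma.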
On 02/17/2013 02:37 PM, Craig A. Berry wrote:
> On Sun, Feb 17, 2013 at 2:51 PM, Craig A. Berry <craig.a.berry@gmail.com> wrote:
>> On Sun, Feb 17, 2013 at 1:39 PM, Karl Williamson <public@khwilliamson.com> wrote:
>>> Before looking at the other things you found, I think we should resolve this.
>> If by other things you mean why some characters in the range 161-255 are considered printable, that's a good question. I assume the locale database is rather broken.
> I read and thought about this a bit more, and I now think the characters showing up as printable really are printable in the DEC Multinational Character Set (DEC-MCS), which I think is what's going to be in the default locale on VMS. The standard for locale at:
> http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap07.html#tag_07_02
> just says that the C or POSIX locale governs "data consisting entirely of characters from the portable character set and the control character set. For other characters, the behavior is unspecified."
> So I guess everyone gets to specify what's left differently.
If I ever knew that, I had forgotten, and had been under the misapprehension that the C locale meant that only the ASCII characters should be defined. Thanks for setting me straight.
> it looks to me like those characters really are printable in DEC-MCS.
> So t/re/charset.t is not really correct in flagging anything that has printables above 127 as a "bad locale". It might be a locale that we don't know how to test or for which we would have to maintain a separate set of test data, but I don't see that it's out of line with the standard.
You're right. It does mean that it's untestable, though. I'll change the comments and label to indicate that.
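A minimal sketch of the check under discussion, relabelled along the lines Karl suggests: a locale whose [[:print:]] matches anything in 128..255 is treated as untestable rather than bad. The variable names are hypothetical, the pragma is loaded with `use if` so it takes effect at compile time, and this is not the committed test code.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Config;
use if $Config{d_setlocale}, 'locale';   # compile time, so the regex below gets /l

# Code points above 127 that the current locale says are printable.
# Per the thread, on VMS (DEC-MCS) this includes 161, 162, 163, 165 and more.
my @printable_high = grep { chr($_) =~ /[[:print:]]/ } 128 .. 255;

# Not a broken locale, just one whose upper half the test data doesn't cover.
my $locale_testable = @printable_high ? 0 : 1;

if ($locale_testable) {
    print "no printable characters above 127: locale tests can run\n";
}
else {
    print "untestable locale, printable above 127: @printable_high\n";
}
```

The result depends on the locale the process starts in, so the same script may report either outcome on different systems.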
On Sun, Feb 17, 2013 at 4:59 PM, Karl Williamson <public@khwilliamson.com> wrote:
> On 02/17/2013 02:37 PM, Craig A. Berry wrote:
>> So t/re/charset.t is not really correct in flagging anything that has printables above 127 as a "bad locale". It might be a locale that we don't know how to test or for which we would have to maintain a separate set of test data, but I don't see that it's out of line with the standard.
> You're right. It does mean that it's untestable, though. I'll change the comments and label to indicate that.
Thanks. While you're there, it looks like fold_grind.t has the exact same code to check for a bad/untestable locale.
Migrated from rt.perl.org#116693 (status was 'resolved')
Searchable as RT116693$