Closed p5pRT closed 20 years ago
Two bugs.
1: The parsing of 'version tuples' is dependent on the numbers in the tuples. The perldata manual suggests that the result of such a literal is a string of Unicode characters, but this isn't always the case:
% perl -We 'sub c{ print join " ", unpack "C*",$_[0]; print "\n"; } c 256.255.254; c 255.254.253;'
196 128 195 191 195 190
255 254 253
It depends whether the 'version' contains a number > 255, in which case all numbers are interpreted as utf8, otherwise as unsigned bytes. This still applies if 'use utf8' is in force.
2: The token v1234 is treated either as a 'version' constant or as a bareword string depending on context. The example in perldata works correctly, but this doesn't:
% perl -We 'sub vers { v1234 }; print vers(),"\n";'
v1234
Sorry, no patch. I looked for, and failed to find, where this is parsed.
Ian
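For readers following along, here is a minimal sketch of how the characters of such a literal can be inspected without looking at storage bytes. It assumes a later Perl release than the 5.6.0 discussed in this thread; the variable names are illustrative only. `ord` and the `%vd` format both report code points, so the result should not depend on whether any number exceeds 255.

```perl
use strict;
use warnings;

# Version-tuple literals: with two or more dots the leading "v" is optional.
my $wide   = 256.255.254;   # contains a code point above 255
my $narrow = 255.254.253;   # every code point fits in one byte

# ord() reports code points, not the bytes used to store them.
my @wide_chars   = map { ord($_) } split //, $wide;
my @narrow_chars = map { ord($_) } split //, $narrow;

# The "%vd" format prints the ordinals of a string joined by dots.
my $wide_text   = sprintf "%vd", $wide;
my $narrow_text = sprintf "%vd", $narrow;

print "@wide_chars | $wide_text\n";       # 256 255 254 | 256.255.254
print "@narrow_chars | $narrow_text\n";   # 255 254 253 | 255.254.253
```

Both strings have length 3 in characters, whatever the internal representation.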
Ian Phillipps writes:
1: The parsing of 'version tuples' is dependent on the numbers in the tuples. The perldata manual suggests that the result of such a literal is a string of Unicode characters, but this isn't always the case:
% perl -We 'sub c{ print join " ", unpack "C*",$_[0]; print "\n"; } c 256.255.254; c 255.254.253;'
196 128 195 191 195 190
255 254 253
It depends whether the 'version' contains a number > 255, in which case all numbers are interpreted as utf8, otherwise as unsigned bytes. This still applies if 'use utf8' is in force.
You are confused. It is a bug in unpack, not in tuples.
Ilya
Ilya Zakharevich <ilya@math.ohio-state.edu> wrote:
You are confused. It is a bug in unpack, not in tuples.
It may or may not be a bug in unpack, but there certainly *is* trouble with tuples. What do you make of this example?
% perl5.6.0 -wde 1
Default die handler restored.
Loading DB routines from perl5db.pl version 1.07
Editor support available.
Enter h or `h h' for help, or `man perldebug' for more help.
main::(-e:1):   1
  DB<1> $x = 256.255.254
  DB<2> x $x eq "\x{100}\x{ff}\x{fe}"
0  ''
  DB<3>
Mike Guy
On Fri, May 12, 2000 at 12:22:17AM +0100, Ian Phillipps wrote:
You are confused. It is a bug in unpack, not in tuples.
Not so. There is no unpack here:
% perl -e '$x=254.255.256; print $x' | od -c
0000000 303 276 303 277 304 200
0000006
% perl -e '$x=253.254.255; print $x' | od -c
0000000 375 376 377
0000003
This is a bug in print().
Or:
~ % perl -e '$x=253.254.255.256; { use bytes; print length($x),"\n" }'
8
~ % perl -e '$x=253.254.255; { use bytes; print length($x),"\n" }'
3
`use bytes' is not supported. You use it at your own risk.
Ilya
M.J.T. Guy writes:
You are confused. It is a bug in unpack, not in tuples.
It may or may not be a bug in unpack, but there certainly *is* trouble with tuples.
Nope. There is trouble with interpretation of utf8-data, but AFAIK tuples are handled correctly. It is when you *use* them that you get into trouble.
But lemme check...
monk:~/perl/perl-5.6.0->./perl -Ilib -MDevel::Peek -wle 'Dump 256.255.254'
SV = PV(0x127468) at 0x127208
  REFCNT = 1
  FLAGS = (POK,READONLY,pPOK,UTF8)
  PV = 0x12dae0 "\304\200\303\277\303\276"\0
  CUR = 6
  LEN = 8
monk:~/perl/perl-5.6.0->./perl -Ilib -MDevel::Peek -wle 'Dump 253.255.254'
SV = PV(0x127468) at 0x127208
  REFCNT = 1
  FLAGS = (POK,READONLY,pPOK)
  PV = 0x12dae0 "\375\377\376"
  CUR = 3
  LEN = 8
Yes, no problem at all.
Ilya
On Fri, 12 May 2000 13:27:49 EDT, Ilya Zakharevich wrote:
On Fri, May 12, 2000 at 12:22:17AM +0100, Ian Phillipps wrote:
You are confused. It is a bug in unpack, not in tuples.
Not so. There is no unpack here:
% perl -e '$x=254.255.256; print $x' | od -c
0000000 303 276 303 277 304 200
0000006
% perl -e '$x=253.254.255; print $x' | od -c
0000000 375 376 377
0000003
This is a bug in print().
I think Ilya is saying that you shouldn't have to care how the bits are represented internally (a character is a character, never mind the internal optimization that it may be encoded as either utf8 or as bytes).
If that's what he's saying, I agree with him.
Sarathy gsar@ActiveState.com
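The principle described above can be sketched as follows. This is an illustration under assumptions, not something from the thread: it relies on `utf8::upgrade` and `utf8::is_utf8`, which come from later Perl releases than 5.6.0, and the variable names are made up for the example. Upgrading a string changes only how it is stored, so comparison and length should be unaffected.

```perl
use strict;
use warnings;

my $bytes    = "\xff\xfe";   # stored one byte per character
my $upgraded = $bytes;
utf8::upgrade($upgraded);    # same characters, now stored as UTF-8 internally

# Record the internal flag and the character-level behaviour.
my $flag_before = utf8::is_utf8($bytes)    ? 1 : 0;
my $flag_after  = utf8::is_utf8($upgraded) ? 1 : 0;
my $same        = ($bytes eq $upgraded)    ? 1 : 0;

print "flags: $flag_before $flag_after; equal: $same; ",
      "lengths: ", length($bytes), " ", length($upgraded), "\n";
```

On a Perl where the operators are character-aware, the flags differ while the strings still compare equal and both have length 2.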
Ilya Zakharevich <ilya@math.ohio-state.edu> wrote:
Nope. There is trouble with interpretation of utf8-data, but AFAIK tuples are handled correctly. It is when you *use* them that you get into trouble.
So you're saying there's a bug in 'eq', and presumably in almost every other string operator?
But lemme check...
monk:~/perl/perl-5.6.0->./perl -Ilib -MDevel::Peek -wle 'Dump 256.255.254'
SV = PV(0x127468) at 0x127208
  REFCNT = 1
  FLAGS = (POK,READONLY,pPOK,UTF8)
  PV = 0x12dae0 "\304\200\303\277\303\276"\0
  CUR = 6
  LEN = 8
Lemme check again...
% perl5.6.0 -MDevel::Peek -wle 'Dump "\x{100}\x{ff}\x{fe}"'
SV = PV(0xeafbc) at 0xea9f0
  REFCNT = 1
  FLAGS = (POK,READONLY,pPOK,UTF8)
  PV = 0xf1d98 "\304\200\377\376"\0
  CUR = 4
  LEN = 5
So are 256.255.254 and "\x{100}\x{ff}\x{fe}" different strings?
I guess I don't understand this UTF8 stuff.
Mike Guy
M.J.T. Guy writes:
monk:~/perl/perl-5.6.0->./perl -Ilib -MDevel::Peek -wle 'Dump 256.255.254'
SV = PV(0x127468) at 0x127208
  REFCNT = 1
  FLAGS = (POK,READONLY,pPOK,UTF8)
  PV = 0x12dae0 "\304\200\303\277\303\276"\0
  CUR = 6
  LEN = 8
% perl5.6.0 -MDevel::Peek -wle 'Dump "\x{100}\x{ff}\x{fe}"'
SV = PV(0xeafbc) at 0xea9f0
  REFCNT = 1
  FLAGS = (POK,READONLY,pPOK,UTF8)
  PV = 0xf1d98 "\304\200\377\376"\0
This is a bug.
  CUR = 4
  LEN = 5
I guess I don't understand this UTF8 stuff.
There is nothing to understand. 5.6.0 is a pre-alpha as far as threads and utf8 are concerned. It is a pre-beta in all the other respects.
Ilya
Just to clarify, 5.6.0 isn't pre-beta. It is released code.
-- ___cliff rayman___www.genwax.com___cliff@genwax.com___
Gurusamy Sarathy <gsar@ActiveState.com> wrote:
I think Ilya is saying that you shouldn't have to care how the bits are represented internally (a character is a character, never mind the internal optimization that it may be encoded as either utf8 or as bytes).
That's how I had always understood it too. Except I'd understood it as "doesn't have to care" or even "can't tell" (except if "use bytes" or "use utf8" are in effect). And except for bugs.
But given that interpretation, I'm amazed at how many operators seem to be broken with UTF8. It certainly supports Ilya's contention of "pre-alpha".
Here's another example:
  DB<1> x (256.255.254 . 257.258.259) eq (256.255.254.257.258.259)
0  ''
  DB<2>
Rummaging with Devel::Peek shows that in this case, it's the fault of the . operator.
And eq is broken as well:
  DB<11> x "\x{100}" eq "\xc4\x80"
0  1
  DB<12>
Aaaaargh!
Mike Guy
___cliff rayman___ writes:
Just to clarify, 5.6.0 isn't pre-beta. It is released code.
If a cage with an elephant is labeled "A Tiger", do not trust your eyes.
-- Kos'ma Prutkov [*]
[*] Relationships to the three Tolstoys and La Rochefoucauld are left as an exercise to the reader.
M.J.T. Guy writes:
But given that interpretation, I'm amazed at how many operators seem to be broken with UTF8. It certainly supports Ilya's contention of "pre-alpha".
I do not think it was ever announced otherwise. If you ignore perl.com, as I think many people do. (This is in contrast to my assessment of the general pre-beta-ness of 5.6.0, which a lot of people do not share.)
Rummaging with Devel::Peek shows that in this case, it's the fault of the . operator.
And eq is broken as well:
Yes, these were in the list of operators to fix. But I consider *this* decision of Sarathy (release utf8 as is) as quite justified.
Ilya
On Sat, 13 May 2000 09:20:50 BST, "M.J.T. Guy" wrote:
But given that interpretation, I'm amazed at how many operators seem to be broken with UTF8. It certainly supports Ilya's contention of "pre-alpha".
Call it whatever you like--I call such brokenness "experimental". ;-)
Here's another example:
  DB<1> x (256.255.254 . 257.258.259) eq (256.255.254.257.258.259)
0  ''
  DB<2>
Rummaging with Devel::Peek shows that in this case, it's the fault of the . operator.
And eq is broken as well:
  DB<11> x "\x{100}" eq "\xc4\x80"
0  1
  DB<12>
Aaaaargh!
FWIW, both cases above are due to a broken eq.
% bleadperl -de 0
  DB<1> x (256.255.254 . 257.258.259) eq (256.255.254.257.258.259)
0  1
  DB<2> x "\x{100}" eq "\xc4\x80"
0  ''
Sarathy gsar@ActiveState.com
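The two comparisons above can also be written as a small script rather than a debugger session. This is only a sketch with illustrative variable names, and it assumes a Perl in which `eq` compares characters rather than storage bytes, as the bleadperl output indicates.

```perl
use strict;
use warnings;

# Concatenating two tuples should give the same string as one long tuple.
my $joined  = 256.255.254 . 257.258.259;
my $literal = 256.255.254.257.258.259;
my $concat_equal = ($joined eq $literal) ? 1 : 0;

# One character (code point 256) versus two characters (196 and 128),
# which happen to be the UTF-8 encoding of that one character.
my $char   = "\x{100}";
my $octets = "\xc4\x80";
my $char_equals_octets = ($char eq $octets) ? 1 : 0;

print "concat equal: $concat_equal\n";                  # 1
print "char equals octets: $char_equals_octets\n";      # 0
```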
All the problems listed in this thread (except the one with print, and that is a known deep bug) seem to have been fixed in the latest development releases (post-5.7.0) of Perl.
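As a hedged illustration of the print issue: one way to keep output bytes predictable is to encode explicitly before writing. The sketch below assumes the Encode module, which postdates this thread, and uses made-up variable names. It encodes both example strings to UTF-8 and lists the resulting octets in octal, in the style of the `od -c` output earlier.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $x = 254.255.256;   # contains a code point above 255
my $y = 253.254.255;   # all code points fit in one byte

# Explicit encoding: every character becomes its UTF-8 octet sequence,
# regardless of how the string was stored internally.
my $x_octets = encode('UTF-8', $x);
my $y_octets = encode('UTF-8', $y);

# Render each octet as three octal digits.
my $x_dump = join ' ', map { sprintf('%03o', ord($_)) } split //, $x_octets;
my $y_dump = join ' ', map { sprintf('%03o', ord($_)) } split //, $y_octets;

print "$x_dump\n";   # 303 276 303 277 304 200
print "$y_dump\n";   # 303 275 303 276 303 277
```

With explicit encoding, both strings come out as six octets, so the output no longer depends on whether a number above 255 appears in the tuple.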
Migrated from rt.perl.org#3234 (status was 'resolved')
Searchable as RT3234$