Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

Version Tuple parsing errors #1961

Closed p5pRT closed 20 years ago

p5pRT commented 24 years ago

Migrated from rt.perl.org#3234 (status was 'resolved')

Searchable as RT3234$

p5pRT commented 24 years ago

From ian@dial.pipex.com

Created by ian@homer.dial.pipex.com

Two bugs.

1: The parsing of 'version tuples' is dependent on the numbers in the tuples. The perldata manual suggests that the result of such a literal is a string of Unicode characters, but this isn't always the case:

% perl -We 'sub c{ print join " ", unpack "C*",$_[0]; print "\n"; } c 256.255.254; c 255.254.253;'
196 128 195 191 195 190
255 254 253

It depends on whether the 'version' contains a number > 255, in which case all numbers are interpreted as utf8, otherwise as unsigned bytes. This still applies if 'use utf8' is in force.

2: The token v1234 is treated either as a 'version' constant or as a bareword string depending on context. The example in perldata works correctly, but this doesn't:

% perl -We 'sub vers { v1234 }; print vers(),"\n";'
v1234

Sorry, no patch. I looked for, and failed to find, where this is parsed.

Ian

Perl Info

```
Flags:
    category=core
    severity=low

Site configuration information for perl v5.6.0:

Configured by ian at Mon May 1 21:14:37 BST 2000.

Summary of my perl5 (revision 5.0 version 6 subversion 0) configuration:
  Platform:
    osname=linux, osvers=2.0.34, archname=i686-linux
    uname='linux homer 2.0.34 #4 fri apr 30 17:59:32 bst 1999 i686 unknown '
    config_args='-der'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=undef d_sfio=undef uselargefiles=define
    use64bitint=undef use64bitall=undef uselongdouble=undef usesocks=undef
  Compiler:
    cc='cc', optimize='-O2', gccversion=egcs-2.90.29 980515 (egcs-1.0.3 release)
    cppflags='-I/usr/local/include'
    ccflags ='-I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'
    stdchar='char', d_stdstdio=define, usevfork=false
    intsize=4, longsize=4, ptrsize=4, doublesize=8
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=4
    alignbytes=4, usemymalloc=n, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lndbm -lgdbm -ldbm -ldb -ldl -lm -lc
    libc=, so=so, useshrplib=false, libperl=libperl.a
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
    cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:

@INC for perl v5.6.0:
    /u2/ian/lib/perl5
    /usr/local/lib/perl5/5.6.0/i686-linux
    /usr/local/lib/perl5/5.6.0
    /usr/local/lib/perl5/site_perl/5.6.0/i686-linux
    /usr/local/lib/perl5/site_perl/5.6.0
    /usr/local/lib/perl5/site_perl
    .

Environment for perl v5.6.0:
    HOME=/u2/ian
    LANG (unset)
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/u2/ian/cmd:/u2/ian/Linux:/usr/local/bin:/u2/ian/stocks/cmd:/usr/openwin/bin:/usr/bin/X11:/usr/bin:/bin:/usr/lib/teTeX/bin:/usr/etc:/sbin:/usr/sbin
    PERL5LIB=/u2/ian/lib/perl5
    PERL_BADLANG (unset)
    SHELL=/bin/zsh
```

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

Ian Phillipps writes:

1: The parsing of 'version tuples' is dependent on the numbers in the tuples. The perldata manual suggests that the result of such a literal is a string of Unicode characters, but this isn't always the case:

% perl -We 'sub c{ print join " ", unpack "C*",$_[0]; print "\n"; } c 256.255.254; c 255.254.253;'
196 128 195 191 195 190
255 254 253

It depends on whether the 'version' contains a number > 255, in which case all numbers are interpreted as utf8, otherwise as unsigned bytes. This still applies if 'use utf8' is in force.

You are confused. It is a bug in unpack, not in tuples.

Ilya

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

Ilya Zakharevich <ilya@math.ohio-state.edu> wrote

You are confused. It is a bug in unpack, not in tuples.

It may or may not be a bug in unpack, but there certainly *is* trouble with tuples. What do you make of this example?

% perl5.6.0 -wde 1
Default die handler restored.

Loading DB routines from perl5db.pl version 1.07
Editor support available.

Enter h or `h h' for help, or `man perldebug' for more help.

main::(-e:1):   1
  DB<1> $x = 256.255.254

  DB<2> x $x eq "\x{100}\x{ff}\x{fe}"
0  ''
  DB<3>

Mike Guy

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

On Fri, May 12, 2000 at 12:22:17AM +0100, Ian Phillipps wrote:

You are confused. It is a bug in unpack, not in tuples.

Not so. There is no unpack here:

% perl -e '$x=254.255.256; print $x' | od -c
0000000 303 276 303 277 304 200
0000006
% perl -e '$x=253.254.255; print $x' | od -c
0000000 375 376 377
0000003

This is a bug in print().

Or:
~ % perl -e '$x=253.254.255.256; { use bytes; print length($x),"\n" }'
8
~ % perl -e '$x=253.254.255; { use bytes; print length($x),"\n" }'
3

`use bytes' is not supported. You use it at your own risk.

Ilya

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

M.J.T. Guy writes​:

You are confused. It is a bug in unpack, not in tuples.

It may or may not be a bug in unpack, but there certainly *is* trouble with tuples.

Nope. There is trouble with interpretation of utf8-data, but AFAIK tuples are handled correctly. It is when you *use* them that you get into trouble.

But lemme check...

monk:~/perl/perl-5.6.0->./perl -Ilib -MDevel::Peek -wle 'Dump 256.255.254'
SV = PV(0x127468) at 0x127208
  REFCNT = 1
  FLAGS = (POK,READONLY,pPOK,UTF8)
  PV = 0x12dae0 "\304\200\303\277\303\276"\0
  CUR = 6
  LEN = 8
monk:~/perl/perl-5.6.0->./perl -Ilib -MDevel::Peek -wle 'Dump 253.255.254'
SV = PV(0x127468) at 0x127208
  REFCNT = 1
  FLAGS = (POK,READONLY,pPOK)
  PV = 0x12dae0 "\375\377\376"
  CUR = 3
  LEN = 8

Yes, no problem at all.

Ilya

p5pRT commented 24 years ago

From @gsar

On Fri, 12 May 2000 13:27:49 EDT, Ilya Zakharevich wrote:

On Fri, May 12, 2000 at 12:22:17AM +0100, Ian Phillipps wrote:

You are confused. It is a bug in unpack, not in tuples.

Not so. There is no unpack here:

% perl -e '$x=254.255.256; print $x' | od -c
0000000 303 276 303 277 304 200
0000006
% perl -e '$x=253.254.255; print $x' | od -c
0000000 375 376 377
0000003

This is a bug in print().

I think Ilya is saying that you shouldn't have to care how the bits are represented internally (a character is a character, never mind the internal optimization that it may be encoded as either utf8 or as bytes).

If that's what he's saying, I agree with him.

Sarathy gsar@​ActiveState.com

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

Ilya Zakharevich <ilya@math.ohio-state.edu> wrote

Nope. There is trouble with interpretation of utf8-data, but AFAIK tuples are handled correctly. It is when you *use* them that you get into trouble.

So you're saying there's a bug in 'eq', and presumably in almost every other string operator?

But lemme check...

monk:~/perl/perl-5.6.0->./perl -Ilib -MDevel::Peek -wle 'Dump 256.255.254'
SV = PV(0x127468) at 0x127208
  REFCNT = 1
  FLAGS = (POK,READONLY,pPOK,UTF8)
  PV = 0x12dae0 "\304\200\303\277\303\276"\0
  CUR = 6
  LEN = 8

Lemme check again...

% perl5.6.0 -MDevel::Peek -wle 'Dump "\x{100}\x{ff}\x{fe}"'
SV = PV(0xeafbc) at 0xea9f0
  REFCNT = 1
  FLAGS = (POK,READONLY,pPOK,UTF8)
  PV = 0xf1d98 "\304\200\377\376"\0
  CUR = 4
  LEN = 5

So are 256.255.254 and "\x{100}\x{ff}\x{fe}" different strings?

I guess I don't understand this UTF8 stuff.

Mike Guy

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

M.J.T. Guy writes​:

monk:~/perl/perl-5.6.0->./perl -Ilib -MDevel::Peek -wle 'Dump 256.255.254'
SV = PV(0x127468) at 0x127208
  REFCNT = 1
  FLAGS = (POK,READONLY,pPOK,UTF8)
  PV = 0x12dae0 "\304\200\303\277\303\276"\0
  CUR = 6
  LEN = 8

% perl5.6.0 -MDevel::Peek -wle 'Dump "\x{100}\x{ff}\x{fe}"'
SV = PV(0xeafbc) at 0xea9f0
  REFCNT = 1
  FLAGS = (POK,READONLY,pPOK,UTF8)
  PV = 0xf1d98 "\304\200\377\376"\0

This is a bug.

  CUR = 4
  LEN = 5

I guess I don't understand this UTF8 stuff.

There is nothing to understand. 5.6.0 is a pre-alpha as far as threads and utf8 are concerned. It is a pre-beta in all the other respects.

Ilya

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

Just to clarify, 5.6.0 isn't pre-beta. It is released code.

-- ___cliff rayman___www.genwax.com___cliff@​genwax.com___

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

Gurusamy Sarathy <gsar@ActiveState.com> wrote

I think Ilya is saying that you shouldn't have to care how the bits are represented internally (a character is a character, never mind the internal optimization that it may be encoded as either utf8 or as bytes).

That's how I had always understood it too. Except I'd understood it as "doesn't have to care" or even "can't tell" (except if "use bytes" or "use utf8" are in effect). And except for bugs.

But given that interpretation, I'm amazed at how many operators seem to be broken with UTF8. It certainly supports Ilya's contention of "pre-alpha".

Here's another example:

  DB<1> x (256.255.254 . 257.258.259) eq (256.255.254.257.258.259)
0  ''
  DB<2>

Rummaging with Devel::Peek shows that in this case, it's the fault of the . operator.

And eq is broken as well:

  DB<11> x "\x{100}" eq "\xc4\x80"
0  1
  DB<12>

Aaaaargh!

Mike Guy

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

___cliff rayman___ writes​:

Just to clarify, 5.6.0 isn't pre-beta. It is released code.

  If a cage with an elephant is labeled "A Tiger", do not trust your eyes.

  -- Kos'ma Prutkov [*]

[*] Relationships to 3 Tolstoy's and la Rochfoucault (sp) are left as an exercise to the reader.

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

M.J.T. Guy writes​:

But given that interpretation, I'm amazed at how many operators seem to be broken with UTF8. It certainly supports Ilya's contention of "pre-alpha".

I do not think it was ever announced otherwise, if you ignore perl.com, as I think many people do. (This is in contrast to my assessment of the general pre-beta-ness of 5.6.0, which a lot of people do not share.)

Rummaging with Devel::Peek shows that in this case, it's the fault of the . operator.

And eq is broken as well:

Yes, these were in the list of operators to fix. But I consider *this* decision of Sarathy (release utf8 as is) as quite justified.

Ilya

p5pRT commented 24 years ago

From @gsar

On Sat, 13 May 2000 09:20:50 BST, "M.J.T. Guy" wrote:

But given that interpretation, I'm amazed at how many operators seem to be broken with UTF8. It certainly supports Ilya's contention of "pre-alpha".

Call it whatever you like--I call such brokenness "experimental". ;-)

Here's another example:

  DB<1> x (256.255.254 . 257.258.259) eq (256.255.254.257.258.259)
0  ''
  DB<2>

Rummaging with Devel::Peek shows that in this case, it's the fault of the . operator.

And eq is broken as well:

  DB<11> x "\x{100}" eq "\xc4\x80"
0  1
  DB<12>

Aaaaargh!

FWIW, both cases above are due to a broken eq.

  % bleadperl -de 0
  DB<1> x (256.255.254 . 257.258.259) eq (256.255.254.257.258.259)
  0  1
  DB<2> x "\x{100}" eq "\xc4\x80"
  0  ''

Sarathy gsar@​ActiveState.com

p5pRT commented 23 years ago

From The RT System itself

All the problems listed in this thread (except the one with print, and that is a known deep bug) seem to have been fixed in the latest development releases (post-5.7.0) of Perl.