Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

documentation - utf8 vector numbers not described in perldata.pod #1249

Closed p5pRT closed 20 years ago

p5pRT commented 24 years ago

Migrated from rt.perl.org#2242 (status was 'resolved')

Searchable as RT2242$

p5pRT commented 24 years ago

From dcd@tc.fluke.com

Created by dcd@tc.fluke.com

The utf8 vector-style numbers, such as 1.23.456 and 1.2_3.4_5_6, are not documented in L<perldata/"Scalar value constructors">.

Maybe they shouldn't be described as numbers (since to compare them one needs "cmp", right?), and I see they are documented in L<perlop/"Strings of Character">.

I think there should be some mention in L<perldata/"Scalar value constructors">, since it mentions not only numeric literals but also string literals.

It seems strange that these numeric ordinal strings are mentioned in perlop when discussing the Unicode strings and string comparison operators (I was searching for cmp at the time) and not when describing the literals in perldata.

Perl Info
```
Site configuration information for perl v5.5.670:

Configured by dcd at Wed Mar 1 13:09:07 PST 2000.

Summary of my perl5 (revision 5.0 version 5 subversion 670) configuration:
  Platform:
    osname=linux, osvers=2.2.15pre10, archname=i686-linux
    uname='linux dd 2.2.15pre10 #2 thu feb 24 09:36:58 pst 2000 i686 '
    config_args='-Doptimize=-g -de -Dcf_email=dcd@tc.fluke.com'
    hint=previous, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=undef d_sfio=undef uselargefiles=define
    use64bitint=undef use64bitall=undef uselongdouble=undef usesocks=undef
  Compiler:
    cc='cc', optimize='-g', gccversion=2.7.2.3
    cppflags='-DDEBUGGING -I/usr/local/include'
    ccflags ='-DDEBUGGING -I/usr/local/include'
    stdchar='char', d_stdstdio=define, usevfork=false
    intsize=4, longsize=4, ptrsize=4, doublesize=8
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=4
    alignbytes=4, usemymalloc=n, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lgdbm -ldbm -ldb -ldl -lm -lc
    libc=/lib/libc.so.5.4.44, so=so, useshrplib=false, libperl=libperl.a
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
    cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:

@INC for perl v5.5.670:
    /usr/local/lib/perl5/5.5.670/i686-linux
    /usr/local/lib/perl5/5.5.670
    /usr/local/lib/perl5/site_perl/5.5.670/i686-linux
    /usr/local/lib/perl5/site_perl/5.5.670
    /usr/local/lib/perl5/site_perl
    .

Environment for perl v5.5.670:
    HOME=/home/dcd
    LANG (unset)
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/dcd/bin:/sbin:/usr/local/bin:/bin:/usr/bin:/usr/X11/bin:/usr/games:/usr/local/samba:/home/hobbes/tools/scripts:/home/hobbes/tools/linux:/usr0/hobbes/tools/scripts:/usr0/dcd/bin:/apps/general/bin:/usr/public
    PERL_BADLANG (unset)
    SHELL=/bin/bash
```
p5pRT commented 24 years ago

From @gsar

On Wed, 01 Mar 2000 16:58:56 PST, David Dyck wrote:

It seems strange that these numeric ordinal strings are mentioned in perlop when discussing the Unicode strings and string comparison operators (I was searching for cmp at the time) and not when describing the literals in perldata.

Yup, perldata would certainly be the better place for it. I'll take a patch if you'll be kind enough to supply one. :-)

Thanks.

Sarathy gsar@ActiveState.com

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

On Wed, 1 Mar 2000, Gurusamy Sarathy wrote:

On Wed, 01 Mar 2000 16:58:56 PST, David Dyck wrote:

It seems strange that these numeric ordinal strings are mentioned in perlop when discussing the Unicode strings and string comparison operators (I was searching for cmp at the time) and not when describing the literals in perldata.

Yup, perldata would certainly be the better place for it. I'll take a patch if you'll be kind enough to supply one. :-)

Ok, here's a patch

Inline Patch
```diff
--- perl-current/pod/perldata.pod.orig Wed Mar 1 17:08:38 2000
+++ perl-current/pod/perldata.pod Wed Mar 1 17:20:27 2000
@@ -274,6 +274,8 @@
     0xff                # hex
     0377                # octal
     0b011011            # binary
+    5.005_03            # floating point version number
+    v5.5.30             # utf8 version string (also written as 5.5.30)
 
 String literals are usually delimited by either single or double
 quotes. They work much like quotes in the standard Unix shells:
@@ -287,6 +289,20 @@
 (e.g. '0xff') are not automatically converted to their integer
 representation. The hex() and oct() functions make these conversions
 for you. See L and L for more details.
+
+A literal of the form C<v1.20.300.4000> is parsed as a string composed
+of characters with the specified ordinals. This provides an alternative,
+more readable way to construct strings, rather than use the somewhat less
+readable interpolation form C<"\x{1}\x{14}\x{12c}\x{fa0}">. This is useful
+for representing Unicode strings, and for comparing version "numbers"
+using the string comparison operators, C, C, C etc.
+
+If there are two or more dots in the literal, the leading C may be
+omitted.
+
+Such literals are accepted by both C and C for doing a version
+check. The C<$^V> special variable also contains the running Perl
+interpreter's version in this form. See L.
 
 You can also embed newlines directly in your strings, i.e., they can end
 on a different line than they begin. This is nice, but if you forget
```
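The patch's central claim, that a v-string is simply a string whose characters have the listed ordinals, can be checked directly. A minimal sketch, assuming a modern perl (5.8 or later) rather than the 5.5.670 development build in this report; `$vstr` and `$hexed` are illustrative names:

```perl
use strict;
use warnings;

# A v-string literal: each dot-separated decimal becomes one character.
my $vstr  = v1.20.300.4000;
# The interpolation form the patch compares it with (hex ordinals).
my $hexed = "\x{1}\x{14}\x{12c}\x{fa0}";

print(($vstr eq $hexed) ? "same string\n" : "different strings\n");

# sprintf's %vd format turns the ordinals back into dotted decimal.
printf "%vd\n", $vstr;    # 1.20.300.4000

# String comparison of v-strings orders versions component by component;
# comparing the plain strings "5.10.0" and "5.9.5" does not.
print((v5.10.0 gt v5.9.5)   ? "v5.10.0 sorts after v5.9.5\n"   : "v5.10.0 sorts before v5.9.5\n");
print(("5.10.0" gt "5.9.5") ? "'5.10.0' sorts after '5.9.5'\n" : "'5.10.0' sorts before '5.9.5'\n");
```

The last two lines show why the patch ties these literals to the string comparison operators: as v-strings, 10 and 9 are single characters compared by ordinal, while as plain text "1" sorts before "9".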
p5pRT commented 24 years ago

From @gsar

On Wed, 01 Mar 2000 17:23:45 PST, David Dyck wrote:

Ok, here's a patch

Thanks.

```diff
--- perl-current/pod/perldata.pod.orig Wed Mar 1 17:08:38 2000
+++ perl-current/pod/perldata.pod Wed Mar 1 17:20:27 2000
@@ -274,6 +274,8 @@
     0xff                # hex
     0377                # octal
     0b011011            # binary
+    5.005_03            # floating point version number
+    v5.5.30             # utf8 version string (also written as 5.5.30)
```

s/(utf8|version) //g, I think.

Sarathy gsar@ActiveState.com

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

On Wed, 1 Mar 2000, Gurusamy Sarathy wrote:

+ v5.5.30 # utf8 version string (also written as 5.5.30)

s/(utf8|version) //g, I think.

You are the pumpkin king, so as you wish :-)

"strings represented as a vector of ordinals" seems a bit long.

Have you or p5p reached an agreement on what these strings are called?

utf8 version strings
utf8 vector number strings

also in perldelta (I'll quote it here)

To cope with the new versioning system's use of at least three significant digits for each version component, the method used for incrementing the subversion number has also changed slightly. We assume that versions older than v5.6 have been incrementing the subversion component in multiples of 10. Versions after v5.6.0 will increment them by 1. Thus, using the new notation, 5.005_03 is the same as v5.5.30, and the first maintenance version following v5.6.0 will be v5.6.1 (which amounts to a floating point value of 5.006_001).
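The rule in this perldelta excerpt (read the digits after the decimal point three at a time, padding on the right, so the old two-digit subversion 03 becomes 30) can be sketched as a small conversion. This is an illustration only: `float_to_dotted` is a hypothetical helper, not the actual conversion code in toke.c or universal.c, and padding the result to at least three components is an assumption based on the examples above.

```perl
use strict;
use warnings;

# Hypothetical helper: convert a decimal version such as '5.005_03' to
# dotted form by reading the fraction three digits at a time, padding
# on the right (so '03' in the subversion slot counts as 30).
sub float_to_dotted {
    my ($ver) = @_;
    $ver =~ tr/_//d;                         # '5.005_03' -> '5.00503'
    my ($int, $frac) = split /\./, $ver, 2;
    $frac = '' unless defined $frac;
    $frac .= '0' while length($frac) % 3;    # '00503' -> '005030'
    my @parts = map { $_ + 0 } $frac =~ /(\d{3})/g;
    push @parts, 0 while @parts < 2;         # always at least x.y.z
    return join '.', $int + 0, @parts;
}

for my $ver ('5.005_03', '5.006_001', '5.010') {
    printf "%-10s => v%s\n", $ver, float_to_dotted($ver);
}
```

This prints v5.5.30, v5.6.1 and v5.10.0 for the three inputs, matching the pairs listed in the excerpt.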

there is talk of an equivalence between

```
float        string
5.005_03     v5.5.30 (== 5.5.30)
5.006_001    v5.6.1
```

I guess this is a 'mental' equivalence, as

```
$ perl -wle 'print 5.005_03 cmp v5.5.30'
1
$ perl -wle 'print 5.005_03 <=> v5.5.30'
Argument "^E^E^^" isn't numeric in numeric comparison (<=>) at -e line 1.
1
```

I didn't see the conversions in (str_to_version in toke.c and XS(XS_UNIVERSAL_VERSION) in universal.c) documented in the pod files.

Can perl do what I mean when I say 5.005_03 cmp v5.5.30???
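The first one-liner prints 1 because cmp sees the number's string form, "5.00503", not a version. A minimal sketch, assuming a modern perl and the three-digits-per-component convention from the perldelta excerpt, of why that happens and of one way to put both values in the same form before comparing (the variable names are illustrative):

```perl
use strict;
use warnings;

my $float = 5.005_03;   # a number; it stringifies as "5.00503"
my $vstr  = v5.5.30;    # a string: chr(5) . chr(5) . chr(30)

# cmp compares "5.00503" with "\x05\x05\x1e" character by character.
# "5" has ordinal 53, which sorts after chr(5), so the result is 1.
my $order = $float cmp $vstr;
print "$order\n";                        # 1

# Compare like with like instead: rebuild a decimal version from the
# v-string's ordinals, three digits per component after the first.
my ($rev, @rest) = map { ord($_) } split //, $vstr;
my $as_float = sprintf '%d.' . ('%03d' x @rest), $rev, @rest;
print "$as_float\n";                     # 5.005030

# Format the number with the same number of decimals, then compare strings.
my $same = sprintf('%.*f', 3 * @rest, $float) eq $as_float;
print $same ? "same version\n" : "different versions\n";
```

Comparing the two formatted strings, rather than the floating point values, sidesteps any rounding in the numeric comparison.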

p5pRT commented 24 years ago

From @gsar

On Wed, 01 Mar 2000 18:08:50 PST, David Dyck wrote:

Have you or p5p reached an agreement on what these strings are called?

utf8 version strings
utf8 vector number strings

I'd just call them "strings", because that's what they are. (And bringing in utf8 is probably a distraction because the intent is for them not to char^Wcare even if it happens to be utf8.)

there is talk of an equivalence between

```
float        string
5.005_03     v5.5.30 (== 5.5.30)
5.006_001    v5.6.1
```

I guess this is a 'mental' equivalence, as

```
$ perl -wle 'print 5.005_03 cmp v5.5.30'
1
$ perl -wle 'print 5.005_03 <=> v5.5.30'
Argument "^E^E^^" isn't numeric in numeric comparison (<=>) at -e line 1.
1
```

I guess warnings such as that could print "unreadable" strings in the dotted format now.

I didn't see the conversions in (str_to_version in toke.c and XS(XS_UNIVERSAL_VERSION) in universal.c) documented in the pod files.

Yup.

Can perl do what I mean when I say 5.005_03 cmp v5.5.30???

Not likely, I think.

Sarathy gsar@ActiveState.com

p5pRT commented 24 years ago

From @tamias

On Wed, Mar 01, 2000 at 05:23:45PM -0800, David Dyck wrote:

```diff
+A literal of the form C<v1.20.300.4000> is parsed as a string composed
+of characters with the specified ordinals. This provides an alternative,
+more readable way to construct strings, rather than use the somewhat less
+readable interpolation form C<"\x{1}\x{14}\x{12c}\x{fa0}">. This is useful
+for representing Unicode strings, and for comparing version "numbers"
+using the string comparison operators, C, C, C etc.
+
+If there are two or more dots in the literal, the leading C may be
+omitted.
```

How much is gained by making the v optional, only if there are three or more components to the string? I feel that the advantage of saving a single character is outweighed by the loss of a distinctive syntax, especially when considering maintainability.

Ronald

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

On Wed, 1 Mar 2000, Ronald J Kimball wrote:

On Wed, Mar 01, 2000 at 05:23:45PM -0800, David Dyck wrote:

```diff
+A literal of the form C<v1.20.300.4000> is parsed as a string ...
+If there are two or more dots in the literal, the leading C may be
+omitted.
```

How much is gained by making the v optional, only if there are three or more components to the string? I feel that the advantage of saving a single character is outweighed by the loss of a distinctive syntax, especially when considering maintainability.

Ronald

I'd reverse the question: what is lost? :-) Actually 80.101.114.108 was allowed before (e.g. 5.004_03), but it was more confusing:

```
perl -le "print 80.101.114.108"
80.101114.108
```

  now it prints Perl
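David's example still behaves this way on a modern perl, which accepts the v-less form whenever there are at least two dots. The sketch below (assuming perl 5.8 or later; the variable names are illustrative) also spells out the older parse as an explicit concatenation of two numbers:

```perl
use strict;
use warnings;

# Two or more dots, no leading v: parsed as a v-string, so each number
# becomes the character with that ordinal (80 is 'P', 101 is 'e', ...).
my $word = 80.101.114.108;
print "$word\n";                  # Perl

# The older reading: the number 80.101 concatenated with 114.108.
my $old = 80.101 . 114.108;
print "$old\n";                   # 80.101114.108
```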

As was mentioned earlier, this makes creating version numbers and strings from ordinals 'easier'. Maybe the v in front of a version number is a 'flag', but when you start finding other uses for this new feature (and I'm sure there will be many more) the v would get in the way.

Maybe I just don't understand what you mean by "the loss of a distinctive syntax, especially when considering maintainability."

p5pRT commented 24 years ago

From @tamias

On Wed, Mar 01, 2000 at 11:44:44PM -0800, David Dyck wrote:

On Wed, 1 Mar 2000, Ronald J Kimball wrote:

On Wed, Mar 01, 2000 at 05:23:45PM -0800, David Dyck wrote:

```diff
+A literal of the form C<v1.20.300.4000> is parsed as a string ...
+If there are two or more dots in the literal, the leading C may be
+omitted.
```

How much is gained by making the v optional, only if there are three or more components to the string? I feel that the advantage of saving a single character is outweighed by the loss of a distinctive syntax, especially when considering maintainability.

Ronald

I'd reverse the question: what is lost? :-) Actually 80.101.114.108 was allowed before (e.g. 5.004_03), but it was more confusing:

```
perl -le "print 80.101.114.108"
80.101114.108
```

now it prints Perl

As was mentioned earlier, this makes creating version numbers and strings from ordinals 'easier'. Maybe the v in front of a version number is a 'flag', but when you start finding other uses for this new feature (and I'm sure there will be many more) the v would get in the way.

Maybe I just don't understand what you mean by "the loss of a distinctive syntax, especially when considering maintainability."

Allow me to explain...

Up to now\, literals in Perl all have a distinctive syntax that makes it clear what the literal is.

For example:

```
0x1234   Hexadecimal integer; starts with 0x
0b1011   Binary integer; starts with 0b
012345   Octal integer; starts with 0
123456   Decimal integer; doesn't start with 0

"abcd"   String (interpolating); surrounded by double quotes
'abcd'   String (not interpolating); surrounded by single quotes
qq{ab}   String (interpolating); starts with qq
q{abc}   String (not interpolating); starts with q

12.345   Floating point number; contains a decimal point
```

And now:

```
v1.2.3   Whatever we're calling it today; starts with v
```

All of these literals have a distinctive syntax. If it starts with 0x, it's hexadecimal. If it's got double quotes, it's a string. Only a slight ambiguity arises with octal and floating point numbers (what is C<05.5>?) and even that is enough to confuse people.
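For comparison, here is what each literal form in Ronald's list evaluates to. A short sketch assuming a modern perl; `$x` is an illustrative variable used to show interpolation:

```perl
use strict;
use warnings;

my $x = 'ab';

# Integer literals: hex, binary, octal, decimal.
printf "%d %d %d %d\n", 0x1234, 0b1011, 012345, 123456;   # 4660 11 5349 123456

# Interpolating vs non-interpolating string literals.
print join(' ', "a$x", 'a$x', qq{a$x}, q{a$x}), "\n";    # aab a$x aab a$x

# A floating point literal, and a v-string shown with %vd.
print 12.345, "\n";                                      # 12.345
printf "%vd\n", v1.2.3;                                  # 1.2.3
```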

But now this literal has been proposed:

```
1.2.34   Whatever we're calling it today; contains more than one period
```

The only thing that distinguishes this literal, and makes it a string, is that it contains multiple periods. Otherwise, it looks a lot like a number. Drop all but one of those periods, and suddenly it's no longer a string; it really is a number. That is not a distinctive syntax.

Consider this hypothetical: A programmer releases version 5.7.4 of his module:

```perl
$VERSION = 5.7.4;
```

Then, after more work, he decides to bump the version to 5.8:

```perl
$VERSION = 5.8;
```

Unless he thinks to add a .0, his "version string" is no longer a version string; now it's a floating point number.
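Ronald's hypothetical is easy to reproduce on a modern perl. In this sketch `$old` and `$new` stand in for the two `$VERSION` assignments; sprintf's `%vd` format shows the ordinals each value actually holds:

```perl
use strict;
use warnings;

my $old = 5.7.4;   # two dots: a v-string, chr(5) . chr(7) . chr(4)
my $new = 5.8;     # one dot: an ordinary floating point number

printf "%vd\n", $old;   # 5.7.4
printf "%vd\n", $new;   # 53.46.56, the ordinals of "5", "." and "8"

# As a string, "5.8" begins with "5" (ordinal 53), so it sorts after
# any v-string whose first component is below 53, including v9.0.0.
print(($new gt v9.0.0) ? "5.8 sorts after v9.0.0\n" : "5.8 sorts before v9.0.0\n");

# Writing the leading v keeps the new version a v-string, even with one dot.
print((v5.8 gt $old) ? "v5.8 is newer than 5.7.4\n" : "v5.8 is older than 5.7.4\n");
```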

Also, consider this: The same programmer uses such v-less strings throughout his module. Unfortunately, he forgets to add a 'require [version]' statement to his module. Someone running perl5.005_03 downloads the module, and of course it doesn't work properly, but there are no errors, even with use strict, because 1.2.3 is already perfectly legal syntax. Whereas, with v1.2.3, use strict will result in an error in previous versions of Perl.

My conclusion is that 1.2.3 is lacking a distinctive syntax that would make its meaning clear, and that this will cause problems for readability and maintainability. If the v is not mnemonic enough, because these things will be used for more than just version strings, then choose an indicator that is more appropriate. But don't create new literals without a distinctive syntax.

Ronald

p5pRT commented 24 years ago

From @gsar

On Thu, 02 Mar 2000 11:59:30 EST, Ronald J Kimball wrote:

But now this literal has been proposed:

```
1.2.34   Whatever we're calling it today; contains more than one period
```

The only thing that distinguishes this literal, and makes it a string, is that it contains multiple periods. Otherwise, it looks a lot like a number. Drop all but one of those periods, and suddenly it's no longer a string; it really is a number. That is not a distinctive syntax.

Larry proposed the syntax because he clearly likes it. I implemented it because I clearly like it too.

I guess all I'm saying is that your opinion has been duly noted, but I don't agree with it. :-)

Sarathy gsar@ActiveState.com

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

On Thu, Mar 02, 2000 at 11:59:30AM -0500, rjk@linguist.dartmouth.edu wrote:

Consider this hypothetical: A programmer releases version 5.7.4 of his module:

```perl
$VERSION = 5.7.4;
```

Then, after more work, he decides to bump the version to 5.8:

```perl
$VERSION = 5.8;
```

Unless he thinks to add a .0, his "version string" is no longer a version string; now it's a floating point number.

FWIW, I agree. IIRC, it was Larry who proposed the two decimal point heuristic. Maybe we can petition him to change his mind. (It shouldn't take more than three dissenters. :-)

-- "Never ascribe to malice that which can be explained by stupidity." via, but not speaking for Deutsche Bank

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

I just wanted to clarify what I think you are getting at: by "distinctive syntax" you must mean syntax at the _start_ of a literal that tells you what it will be. For clearly having two periods is distinctive, it's just not "head" distinctive, as it were.

John.

p5pRT commented 24 years ago

From @tamias

On Thu, Mar 02, 2000 at 12:49:03PM -0500, John L. Allen wrote:

I just wanted to clarify what I think you are getting at: by "distinctive syntax" you must mean syntax at the _start_ of a literal that tells you what it will be. For clearly having two periods is distinctive, it's just not "head" distinctive, as it were.

I meant distinctive in the sense "serving to distinguish". Personally, I don't think that multiple periods distinguish one of these strings from a floating point number. Yes, it is different, but it is not distinctive.

Note that an integer and a floating point number have distinctive syntax, even though it's not at the start; one has no internal periods, and one has an internal period. Also note that these vector strings may contain a single period (or perhaps even no periods), so I think they should be distinct from other literals by more than just the existence or number of periods.
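On a modern perl, code can ask whether a value came from a v-string literal at all, through `isvstring` in the core Scalar::Util module. A sketch of the cases Ronald describes (no dot, one dot with a v, two dots without one, and a plain number), where `kind` is a hypothetical helper:

```perl
use strict;
use warnings;
use Scalar::Util qw(isvstring);

# Hypothetical helper: report whether a value came from a v-string literal.
sub kind { return isvstring($_[0]) ? 'v-string' : 'number or plain string' }

print 'v65   => ', kind(v65),   "\n";   # v-string (no dot at all; it is "A")
print 'v5.8  => ', kind(v5.8),  "\n";   # v-string (one dot, leading v)
print '5.7.4 => ', kind(5.7.4), "\n";   # v-string (two dots, no v)
print '5.8   => ', kind(5.8),   "\n";   # number or plain string
```

The runtime check confirms Ronald's observation from the other side: nothing in the value 5.8 records that its author may have meant a version.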

As an example, I think that this would be distinctive:

123#456#789

And so would this:

qc{123 456 789}

But, allowing for someone to write a vector string of one element, head distinctive is necessary in that case. Personally, I would have chosen to make it non-optional even for longer vector strings.

Ronald

P.S. When will I be able to write 0x7F simply as 7F? 1/2 :)

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

Joshua N Pritikin wrote:

On Thu, Mar 02, 2000 at 11:59:30AM -0500, rjk@linguist.dartmouth.edu wrote:

Consider this hypothetical: A programmer releases version 5.7.4 of his module:

```perl
$VERSION = 5.7.4;
```

Then, after more work, he decides to bump the version to 5.8:

```perl
$VERSION = 5.8;
```

Unless he thinks to add a .0, his "version string" is no longer a version string; now it's a floating point number.

FWIW, I agree. IIRC, it was Larry who proposed the two decimal point heuristic. Maybe we can petition him to change his mind. (It shouldn't take more than three dissenters. :-)

Count me in. 5.8 and 5.7.4 should not be outrageously different beasts. It ups the write-onlyness of perl one more notch -- faced with "$ver = 4.0.1", exactly how would a novice go about searching the documentation for an explanation? Personally, I would assume that those bizarre perl people were either using a strange floating-point notation (forcing on a guard bit?), or that it's a verbose way of saying "401". I wouldn't like either possibility, and I'd never guess that the truth is worse. ;)

Forcing an explicit v prefix provides a clear signal that something weird is going on, at least.

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

Steve Fink (lists.p5p):

FWIW, I agree. IIRC, it was Larry who proposed the two decimal point heuristic. Maybe we can petition him to change his mind. (It shouldn't take more than three dissenters. :-)

Count me in. 5.8 and 5.7.4 should not be outrageously different beasts.

For what it's worth, I'll be the third. Looking at the hoops we've had to jump through to make the right thing compare with the right thing at the right time, I'm starting to see the whole thing as an accumulation of kludges. Version strings as a primary data type made me a little uneasy to begin with, but I can see the use for the triplet idea. However, I think it really does need a sufficiently distinct syntax, to borrow someone else's neat phrase. Just requiring a v would be nice.

-- "Language shapes the way we think, and determines what we can think about." -- B. L. Whorf

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

Ronald J Kimball <rjk@linguist.dartmouth.edu> writes:

And now:

v1.2.3 Whatever we're calling it today; starts with v syntax. Whereas, with v1.2.3, use strict will result in an error in previous versions of Perl.

Rubbish ;-)

```perl
#!perl

sub v1 () { 'Ha' }

print v1.2.3,"\n";
__END__
```

Prints:

```
Ha2.3
```

So we now need to warn under strict for subs with names matching /v\d+/ ...

-- Nick Ing-Simmons