Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

v-strings left of a => don't get quoted. #5817

Closed p5pRT closed 21 years ago

p5pRT commented 22 years ago

Migrated from rt.perl.org#16010 (status was 'resolved')

Searchable as RT16010$

p5pRT commented 22 years ago

From @abigail

Created by @abigail

See also http://www.perlmonks.org/index.pl?node_id=187925.

The following prints "A\n" as expected:

  perl -we 'print v65, "\n"'

because 'v65' is seen as a v-string.

And

  perl -we 'print "v65", "\n"'

prints "v65\n" because the 'v65' is quoted.

However,

  perl -we 'print v65 => "\n"'

also prints "A\n", and that's not what is expected, because => is supposed to quote its left-hand side when the left-hand side is a bareword.

First I wondered whether this was a bug and not an intended "don't autoquote v-strings", but then I noticed that they do get auto-quoted when used as hash keys:

  perl -wle '$h {v65} = 1; print keys %h'

prints "v65\n".
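
For anyone who wants to compare the four cases side by side, here is a minimal sketch as a single script. The `ordinals` helper is not part of the report; it is only an illustrative name for a `sprintf '%vd'` wrapper, so that "A" (65) and "v65" (118.54.53) are easy to tell apart.

```perl
use strict;
use warnings;

# Illustrative helper (not from the report): render a string as the
# dot-separated ordinals of its characters, e.g. "A" -> "65".
sub ordinals { return sprintf '%vd', $_[0] }

my $bare   = v65;          # v-string literal: the single character chr(65)
my $quoted = "v65";        # ordinary three-character string
my @pair   = (v65 => 1);   # left of =>: the case this report is about
my %h;
$h{v65} = 1;               # simple identifier inside {} is autoquoted
my ($key) = keys %h;

print "bare:       ", ordinals($bare),    "\n";
print "quoted:     ", ordinals($quoted),  "\n";
print "left of =>: ", ordinals($pair[0]), "\n";
print "hash key:   ", ordinals($key),     "\n";
```

On the perl 5.8.0 described here the third line should show 65. The perldata documentation of later releases says that since 5.8.1 a single-number v-string before => is read as a literal string, so on such a perl it should show 118.54.53 instead.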

Abigail

Perl Info ``` Flags: category=core severity=low Site configuration information for perl v5.8.0: Configured by abigail at Mon Jul 22 13:26:11 CEST 2002. Summary of my perl5 (revision 5.0 version 8 subversion 0) configuration: Platform: osname=linux, osvers=2.4.5, archname=i686-linux-64int-ld uname='linux hermione 2.4.5 #6 fri jun 22 01:38:20 pdt 2001 i686 unknown ' config_args='-des -Uversiononly -Dmydomain=.foad.org -Dcf_email=abigail@foad ...org -Dperladmin=abigail@foad.org -Doptimize=-g -Dusemorebits -Dusedevel -Dusen= false -Dprefix=/opt/perl' hint=recommended, useposix=true, d_sigaction=define usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=unde f useperlio=define d_sfio=undef uselargefiles=define usesocks=undef use64bitint=define use64bitall=undef uselongdouble=define usemymalloc=n, bincompat5005=undef Compiler: cc='cc', ccflags ='-DDEBUGGING -fno-strict-aliasing -I/usr/local/include -I/ opt/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64', optimize='-g', cppflags='-DDEBUGGING -fno-strict-aliasing -I/usr/local/include -I/opt/local /include' ccversion='', gccversion='2.95.3 20010315 (release)', gccosandvers='' intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=12345678 d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12 ivtype='long long', ivsize=8, nvtype='long double', nvsize=12, Off_t='off_t' , lseeksize=8 alignbytes=4, prototype=define Linker and Libraries: ld='cc', ldflags =' -L/usr/local/lib -L/opt/local/lib' libpth=/usr/local/lib /opt/local/lib /lib /usr/lib libs=-lnsl -lndbm -lgdbm -ldl -lm -lc -lcrypt -lutil perllibs=-lnsl -ldl -lm -lc -lcrypt -lutil libc=/lib/libc-2.2.3.so, so=so, useshrplib=false, libperl=libperl.a gnulibc_version='2.2.3' Dynamic Linking: dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic' cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib -L/opt/local/lib' Locally applied patches: @INC for perl v5.8.0: /home/abigail/Perl /home/abigail/Sybase 
/opt/perl/lib/5.8.0/i686-linux-64int-ld /opt/perl/lib/5.8.0 /opt/perl/lib/site_perl/5.8.0/i686-linux-64int-ld /opt/perl/lib/site_perl/5.8.0 /opt/perl/lib/site_perl/5.6.1 /opt/perl/lib/site_perl . Environment for perl v5.8.0: HOME=/home/abigail LANG (unset) LANGUAGE (unset) LC_ALL=POSIX LD_LIBRARY_PATH=/home/abigail/Lib:/usr/local/lib:/usr/lib:/lib:/usr/X11R6/li b:/opt/gnome/lib LOGDIR (unset) PATH=/home/abigail/Bin:/opt/perl/bin:/usr/local/bin:/usr/local/X11/bin:/usr/ bin:/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/X11R6/bin:/usr/games:/opt/povray/b in:/opt/gnome/bin:/opt/opera/bin:/usr/share/texmf/bin:/opt/Acrobat4/bin:/opt/jav a/blackdown/j2sdk1.3.1/bin:/usr/local/games/bin:/opt/gnuplot/bin:/opt/mysql/bin PERL5LIB=/home/abigail/Perl:/home/abigail/Sybase PERLDIR=/opt/perl PERL_BADLANG (unset) SHELL=/usr/bin/bash ```
p5pRT commented 22 years ago

From @JohnPeacock

abigail@foad.org (via RT) wrote:

However,

  perl -we 'print v65 => "\n"'

also prints "A\n", and that's not what is expected, because => is supposed to quote its left-hand side when the left-hand side is a bareword.

Except that the tokenizer has already made a v-string out of v65 by the time you get to the =>. Technically speaking, it's not a bareword anymore.

First I wondered whether this was a bug and not an intended "don't autoquote v-strings", but then I noticed that they do get auto-quoted when used as hash keys:

  perl -wle '$h {v65} = 1; print keys %h'

prints "v65\n".

The { is already pending, so the tokenizer knows it should quote whatever is inside. It's not a bug and it's not an exception. The only part of Perl that knows whether something is a v-string is the tokenizer (and this has given me headaches for the last year or so).

John

-- John Peacock Director of Information Research and Technology Rowman & Littlefield Publishing Group 4720 Boston Way Lanham, MD 20706 301-459-3366 x.5010 fax 301-429-5747

p5pRT commented 22 years ago

From @abigail

On Tue, Aug 06, 2002 at 11:50:57AM -0400, John Peacock wrote:

abigail@foad.org (via RT) wrote:

However,

  perl -we 'print v65 => "\n"'

also prints "A\n", and that's not what is expected, because => is supposed to quote its left-hand side when the left-hand side is a bareword.

Except that the tokenizer has already made a v-string out of v65 by the time
you get to the =>. Technically speaking, it's not a bareword anymore.

Then what is a bareword, and where is that documented?

First I wondered whether this was a bug and not an intended "don't autoquote v-strings", but then I noticed that they do get auto-quoted when used as hash keys:

  perl -wle '$h {v65} = 1; print keys %h'

prints "v65\n".

The { is already pending, so the tokenizer knows it should quote whatever is inside. It's not a bug and it's not an exception. The only part of Perl that knows whether something is a v-string is the tokenizer (and this has given me headaches for the last year or so).

That's an explanation of how it's currently implemented, but that doesn't mean it's not a bug (otherwise, nothing would ever be a bug, would it?).

I don't find any of this back in the documentation.

Abigail

p5pRT commented 22 years ago

From @lizmat

To add to this misery, something I ran into months, if not years ago (and these posts triggered that memory):

  perl -e 'my %hash = (Module:: => 1); print keys %hash'

lists the key "Module", instead of the expected "Module::" (note the trailing '::'). Seems like anything with a trailing '::' is handled specially. Probably because it looks like a namespace qualification (which is my interpretation ;-).

This goes back to at least 5.00503...
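
A small sketch of the difference, assuming nothing beyond core Perl: the bare `Module::` form loses its trailing colons, while an explicitly quoted key keeps them. The `no warnings 'bareword'` line is only there to silence a possible warning about a nonexistent package.

```perl
use strict;
use warnings;
no warnings 'bareword';   # a bare Module:: may warn if the package does not exist

my %bare   = (Module:: => 1);     # trailing :: marks a package-name literal
my %quoted = ('Module::' => 1);   # explicit quotes keep the colons

my ($bare_key)   = keys %bare;
my ($quoted_key) = keys %quoted;

print "bare key:   $bare_key\n";
print "quoted key: $quoted_key\n";
```

So quoting the key explicitly looks like the straightforward workaround whenever the trailing '::' is meant to be part of the string.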

Liz

p5pRT commented 22 years ago

From @nwc10

On Wed, Aug 07, 2002 at 01:33:22PM +0200, Rafael Garcia-Suarez wrote:

Abigail wrote:

On Tue, Aug 06, 2002 at 11:50:57AM -0400, John Peacock wrote:

abigail@foad.org (via RT) wrote:

However,

  perl -we 'print v65 => "\n"'

also prints "A\n", and that's not what is expected, because => is supposed to quote its left-hand side when the left-hand side is a bareword.

Except that the tokenizer has already made a v-string out of v65 by the time
you get to the =>. Technically speaking, it's not a bareword anymore.

Then what is a bareword, and where is that documented?

Technically speaking, print is not a bareword ("A word that has no other interpretation in the grammar" as perldata says), but you can write (print => "foo").

So by that definition, a v string is not a bareword, because it has another interpretation in the grammar

So for => to quote the text v65 on the left means that it has to change its documented behaviour slightly to "bare words and v-strings", surely?

[this doesn't stop it being a bug, just that the bug is now that what is documented is not what is desired]

I think that the fact that => doesn't autoquote vstrings is a bug (and a hard to fix one.) (I even think it has already been reported.)

I'd agree - I'd like it to autoquote (would be) v strings.

Why does this work?

  $ perl -wle 'sub v65 {print "Here"}; v65'
  Here

Surely that v65 is a constant ('A') in void context?

  $ perl -lwe 'sub v65 {print "Here"}; v65,'
  Here
  $ perl -lwe 'sub v65 {print "Here"}; v65,""'
  Useless use of a constant in void context at -e line 1.
  Here
  $ perl -lwe 'sub v65 {print "Here"}; v65 => ""'
  Useless use of a constant in void context at -e line 1.
  Useless use of a constant in void context at -e line 1.

Ho ho ho. The inconsistency between those three has to be a bug, surely? They can't all be consistent with the current documentation.

And order matters (unsurprisingly)

  $ perl -lwe 'v65; sub v65 {print "Here"}'
  Useless use of a constant in void context at -e line 1.

Nicholas Clark

p5pRT commented 22 years ago

From @rgarcia

Abigail wrote​:

On Tue, Aug 06, 2002 at 11:50:57AM -0400, John Peacock wrote:

abigail@foad.org (via RT) wrote:

However,

  perl -we 'print v65 => "\n"'

also prints "A\n", and that's not what is expected, because => is supposed to quote its left-hand side when the left-hand side is a bareword.

Except that the tokenizer has already made a v-string out of v65 by the time
you get to the =>. Technically speaking, it's not a bareword anymore.

Then what is a bareword, and where is that documented?

Technically speaking, print is not a bareword ("A word that has no other interpretation in the grammar" as perldata says), but you can write (print => "foo").

I think that the fact that => doesn't autoquote vstrings is a bug (and a hard to fix one.) (I even think it has already been reported.)

p5pRT commented 22 years ago

From @abigail

On Wed, Aug 07, 2002 at 12:59:27PM +0100, Nicholas Clark wrote:

So by that definition, a v string is not a bareword, because it has another interpretation in the grammar

So for => to quote the text v65 on the left means that it has to change its documented behaviour slightly to "bare words and v-strings", surely?

Perldata writes:

  It is often more readable to use the C<< => >> operator between key/value
  pairs. The C<< => >> operator is mostly just a more visually distinctive
  synonym for a comma, but it also arranges for its left-hand operand to be
  interpreted as a string--if it's a bareword that would be a legal identifier.

It also writes about barewords:

  A word that has no other interpretation in the grammar will
  be treated as if it were a quoted string. These are known as
  "barewords".

But clearly, the two usages of 'bareword' conflict, because 'if', 'shift', etc. all have another interpretation in the grammar, yet they act as if they are quoted when occurring to the left of =>.
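
The point about keywords is easy to check with a short sketch (the variable names are arbitrary): words that are otherwise operators or keywords become plain strings when they sit directly to the left of =>, even under `use strict`.

```perl
use strict;
use warnings;

# 'if', 'shift' and 'print' all have another meaning in the grammar,
# yet each one is taken as a string when it precedes =>.
my @list  = (if => 1, shift => 2, print => 3);
my @words = @list[0, 2, 4];

print join(',', @words), "\n";
```

This suggests that the => rule is about what the word looks like, not about whether it is a "bareword" in the narrower perldata sense.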

I think that the first paragraph means "any unquoted string that would be a legal identifier". And 'v65' is a legal identifier.

[this doesn't stop it being a bug, just that the bug is now that what is documented is not what is desired]

I think that what is documented is what is desired (that is, autoquoting v-strings), although the documentation could be phrased a bit more carefully.

Perlop says:

  The => digraph is mostly just a synonym for the comma operator.
  It's useful for documenting arguments that come in pairs. As of
  release 5.001, it also forces any word to the left of it to be
  interpreted as a string.

without further specifying what a 'word' is.

There's more weirdness when it comes to parsing v-strings:

  $ perl -wle 'print v65, v65'
  No comma allowed after filehandle at -e line 1.

What does the tokenizer do here? I would expect it to print "AA".

  $ perl -wle 'print v65 v65'
  Bareword found where operator expected at -e line 1, near "v65 v65"

Wasn't this "technically not a bareword"? ;-)

  $ perl -wle 'print A "A"'
  Name "main::A" used only once: possible typo at -e line 1.
  print() on unopened filehandle A at -e line 1.

Expected.

  $ perl -wle 'print A v65'
  Can't locate object method "A" via package "v65"

???

Sean Burke often says "Vstrings must die" on #perl, and I'm beginning to see why.

Abigail

p5pRT commented 22 years ago

From @JohnPeacock

Nicholas Clark wrote​:

So by that definition, a v string is not a bareword, because it has another interpretation in the grammar

Not exactly. The tokenizer parses through the source and decides that the collection of characters 'v65' must be processed by toke.c:scan_num() which transforms the input characters into the sequence \65.

So for => to quote the text v65 on the left means that it has to change its documented behaviour slightly to "bare words and v-strings", surely?

No, it would have to mean that we could

1) detect that a scalar contained a v-string
2) convert a v-string back into the characters it originally was.

Currently, we cannot do 1), and 2) is non-determinable. For an example of the latter, consider the following:

  $vs1 = v1.2.3;
  $vs2 = 1.2.3;

Both of these will be stored as \1\2\3, so there is no way to know whether the original had the leading 'v' or not.
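
The two literals can be compared directly. This is only a small sketch of the observation above, using `sprintf`'s `%vd` format to make the stored characters visible.

```perl
use strict;
use warnings;

my $vs1 = v1.2.3;   # explicit v-string
my $vs2 = 1.2.3;    # two dots and no 'v': also read as a v-string

print "identical\n" if $vs1 eq $vs2;
printf "%vd\n", $vs1;                  # ordinals of the characters, joined by '.'
print length($vs1), " characters\n";
```

Both scalars hold the same three characters, so string comparison alone cannot tell which spelling was used in the source.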

[this doesn't stop it being a bug, just that the bug is now that what is documented is not what is desired]

I think that the fact that => doesn't autoquote vstrings is a bug (and a hard to fix one.) (I even think it has already been reported.)

I'd agree - I'd like it to autoquote (would be) v strings.

Why does this work?

  $ perl -wle 'sub v65 {print "Here"}; v65'
  Here

Surely that v65 is a constant ('A') in void context?

No, technically it is a \65 and it is the name of a subroutine (which just happens to look like an 'A'). That line is equivalent to

  $ perl -wle 'sub v65 {print "Here"}; v65()'

although B::Deparse doesn't display that

  $ perl -MO=Deparse -e 'sub v65 {print "Here"}; v65'
  sub v65 {
      print 'Here';
  }
  v65 ;
  -e syntax OK

presumably because in this context, the v65 is a sub.

  $ perl -lwe 'sub v65 {print "Here"}; v65,'
  Here
  $ perl -lwe 'sub v65 {print "Here"}; v65,""'
  Useless use of a constant in void context at -e line 1.
  Here

Here is how that is parsed:

  $ perl -MO=Deparse -e 'sub v65 {print "Here"}; v65,""'
  sub v65 {
      print 'Here';
  }
  v65(), '???';
  -e syntax OK

  $ perl -lwe 'sub v65 {print "Here"}; v65 => ""'
  Useless use of a constant in void context at -e line 1.
  Useless use of a constant in void context at -e line 1.

Here is how _that_ is parsed:

  $ perl -MO=Deparse -e 'sub v65 {print "Here"}; v65 => ""'
  sub v65 {
      print 'Here';
  }
  '???', '???';
  -e syntax OK

So you can see that the => operator is, in fact, quoting the left term. The problem is that it no longer is the same string as it was before scan_num() altered it.

Here's another data point:

  $ perl -lwe 'v65; sub v65 {print "Here"}; v65'
  Useless use of a constant in void context at -e line 1.
  Here

As you can guess, the warning refers only to the first instance of "v65" and the second is a function call.

This is even more amazing:

  $ perl -lwe 'v15; sub v15 {print "Here"}; v15'
  Useless use of a constant in void context at -e line 1.
  Here

This is completely consistent; we can name a sub anything that resolves to a string, even if it is not what we would consider to be a name (in this case the printer control character SI).

HTH

John

-- John Peacock Director of Information Research and Technology Rowman & Littlefield Publishing Group 4720 Boston Way Lanham, MD 20706 301-459-3366 x.5010 fax 301-429-5747

p5pRT commented 22 years ago

From @lizmat

At 02:16 PM 8/7/02 +0200, I wrote:

To add to this misery, something I ran into months, if not years ago (and these posts triggered that memory):

  perl -e 'my %hash = (Module:: => 1); print keys %hash'

Or to show this better:

  perl -MO=Deparse -e 'my %hash = (Module:: => 1)'
  my(%hash) = ('Module', 1);
  -e syntax OK

Liz

p5pRT commented 22 years ago

From @nwc10

On Wed, Aug 07, 2002 at 10:14:31AM -0400, John Peacock wrote:

Nicholas Clark wrote:

So by that definition, a v string is not a bareword, because it has another interpretation in the grammar

Not exactly. The tokenizer parses through the source and decides that the collection of characters 'v65' must be processed by toke.c:scan_num() which transforms the input characters into the sequence \65.

The tokenizer scares me. But not as much as the regexp engine. I try to avoid both, so I don't know much about how they work.

So for => to quote the text v65 on the left means that it has to change its documented behaviour slightly to "bare words and v-strings", surely?

No, it would have to mean that we could

1) detect that a scalar contained a v-string
2) convert a v-string back into the characters it originally was.

Currently, we cannot do 1), and 2) is non-determinable. For an example of the latter, consider the following:

  $vs1 = v1.2.3;
  $vs2 = 1.2.3;

Both of these will be stored as \1\2\3, so there is no way to know whether the original had the leading 'v' or not.

Yes. Also v065 is indistinguishable from v65

I'd agree - I'd like it to autoquote (would be) v strings.

Why does this work?

  $ perl -wle 'sub v65 {print "Here"}; v65'
  Here

Surely that v65 is a constant ('A') in void context?

No, technically it is a \65 and it is the name of a subroutine (which just happens to look like an 'A'). That line is equivalent to

  $ perl -wle 'sub v65 {print "Here"}; v65()'

although B::Deparse doesn't display that

I don't think you're correct on this point.

  $ perl -lwe 'sub v65 {}; print grep {/65/} keys %::'
  v65

It really does appear to be named 'v65' not 'A'

Alternative demo:

  $ perl -lwe 'sub v65 {print "Here"}; bless ($a = []); $b="v65"; $a->$b'
  Here

p5pRT commented 22 years ago

From @JohnPeacock

Nicholas Clark wrote:

I don't think you're correct on this point.

  $ perl -lwe 'sub v65 {}; print grep {/65/} keys %::'
  v65

It really does appear to be named 'v65' not 'A'

Oops, you're right. So it shows that the tokenizer is sensitive to context; if the sub v65 has already been defined, then when it is scanning for a bareword 'v65' later on, it takes it as the function name, rather than passing it through scan_num().

John

-- John Peacock Director of Information Research and Technology Rowman & Littlefield Publishing Group 4720 Boston Way Lanham, MD 20706 301-459-3366 x.5010 fax 301-429-5747

p5pRT commented 22 years ago

From @hvds

John Peacock <jpeacock@rowman.com> wrote:
:Nicholas Clark wrote:
:> So for => to quote the text v65 on the left means that it has to change its
:> documented behaviour slightly to "bare words and v-strings", surely?
:
:No, it would have to mean that we could
:
:1) detect that a scalar contained a v-string
:2) convert a v-string back into the characters it originally was.
:
:Currently, we cannot do 1) and 2) is non-determinable. For an example of the
:latter, consider the following:
:
:   $vs1 = v1.2.3;
:   $vs2 = 1.2.3;
:
:Both of these will be stored as \1\2\3, so there is no way to know whether the
:original had the leading 'v' or not.

This is clearly the first problem that needs fixing. I think it should be possible to define a new OP for a vstring constant that stores the numeric and original string values\, and let the optimiser convert that to the normal form. That leaves open the opportunity to intercept it (eg when you see a following =>) and grab back the original string.
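
As a rough illustration of this idea in plain Perl rather than in the op tree: the sketch below is a hypothetical model, not perl internals. The function names and the `encoded` / `original` fields are invented for the example; it only shows what "store both forms and pick one depending on the next token" could look like.

```perl
use strict;
use warnings;

# Hypothetical model of a v-string constant that remembers its source text.
# The field names 'encoded' and 'original' are made up for this sketch.
sub parse_vstring_literal {
    my ($literal) = @_;
    (my $digits = $literal) =~ s/^v//;     # drop the optional leading 'v'
    my $encoded = join '', map { chr($_) } split /\./, $digits;
    return { encoded => $encoded, original => $literal };
}

# What a hook could hand back when the following token is =>.
sub value_before_fat_comma { return $_[0]{original} }

# What every other context would see.
sub value_elsewhere { return $_[0]{encoded} }

my $tok = parse_vstring_literal('v65');
print value_elsewhere($tok), "\n";          # the encoded form
print value_before_fat_comma($tok), "\n";   # the text as written
```

With both forms kept, 'v1.2.3' and '1.2.3' stay distinguishable even though their encoded values are identical.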

Hugo

p5pRT commented 22 years ago

From @JohnPeacock

hv@crypt.org wrote:

:Both of these will be stored as \1\2\3, so there is no way to know whether the
:original had the leading 'v' or not.

This is clearly the first problem that needs fixing. I think it should be possible to define a new OP for a vstring constant that stores the numeric and original string values, and let the optimiser convert that to the normal form. That leaves open the opportunity to intercept it (eg when you see a following =>) and grab back the original string.

Patches gratefully accepted! ;~)

Seriously, I have pounded my head against this for almost a year now, trying one thing and then another, as part of my version object patches. I have code that works just dandy, as long as I can somehow tell that a given SV is a v-string. Depending on just how I do that, various other things break. The current incarnation creates it as a dual-var and that breaks bitwise operations; for some reason the bitwise ops use the private IV slot instead of the public PV slot.

The thing that always gets me in the end is that I can do whatever I want during the tokenizing phase, but when the assignment actually occurs, Perl does a shallow copy and all of my hard work is for naught! I even went so far as to make v-strings magical, but even that doesn't get copied during the assignment.

If we could simply make the assignment be a clone, rather than a copy, we could store the v-string notation and the original notation in the same PV slot and use the OOK hack to display the v-string encoding by default. There is still the problem of trying to come up with a flag to clue in that the SV contains a v-string...

John

-- John Peacock Director of Information Research and Technology Rowman & Littlefield Publishing Group 4720 Boston Way Lanham, MD 20706 301-459-3366 x.5010 fax 301-429-5747

p5pRT commented 22 years ago

From @JohnPeacock

hv@crypt.org wrote:

This is clearly the first problem that needs fixing. I think it should be possible to define a new OP for a vstring constant that stores the numeric and original string values, and let the optimiser convert that to the normal form. That leaves open the opportunity to intercept it (eg when you see a following =>) and grab back the original string.

OK, the patch below (NOT TO BE APPLIED) will preserve whatever string was originally converted into a v-string, by storing it in the same PV and using the OOK offset hack. Sadly, Perl is too smart and when this SV is assigned to any scalar, just the v-string portion is copied over.

But I still don't have any ideas for flagging this SV as a v-string. Perhaps I need to go back to the magic 'V' idea and changing sv_setsv_flags to do my dirty work and copy the whole PV and re-chop it.

John

Inline Patch

```diff
--- util.c.orig Thu Jul 18 15:28:10 2002
+++ util.c Wed Aug 7 22:42:20 2002
@@ -4063,6 +4063,8 @@
 Perl_new_vstring(pTHX_ char *s, SV *sv)
 {
     char *pos = s;
+    char *start = s;
+    SV *tmpsv = NEWSV(92,5);
     if (*pos == 'v') pos++; /* get past 'v' */
     while (isDIGIT(*pos) || *pos == '_')
     pos++;
@@ -4073,7 +4075,7 @@

     if (*s == 'v') s++; /* get past 'v' */

-    sv_setpvn(sv, "", 0);
+    sv_setpvn(tmpsv, "", 0);

     for (;;) {
         rev = 0;
@@ -4100,9 +4102,9 @@
 #endif
         /* Append native character for the rev point */
         tmpend = uvchr_to_utf8(tmpbuf, rev);
-        sv_catpvn(sv, (const char*)tmpbuf, tmpend - tmpbuf);
+        sv_catpvn(tmpsv, (const char*)tmpbuf, tmpend - tmpbuf);
         if (!UNI_IS_INVARIANT(NATIVE_TO_UNI(rev)))
-            SvUTF8_on(sv);
+            SvUTF8_on(tmpsv);
         if ( (*pos == '.' || *pos == '_') && isDIGIT(pos[1]))
             s = ++pos;
         else {
@@ -4112,8 +4114,11 @@
         while (isDIGIT(*pos) )
             pos++;
     }
-    SvPOK_on(sv);
-    SvREADONLY_on(sv);
+    SvPOK_on(tmpsv);
+    sv_setpvn(sv, (const char*)start, pos-start); /* original string */
+    sv_catpvn(sv, (const char*)"\0",1); /* terminator */
+    sv_catsv(sv, tmpsv); /* v-string */
+    sv_chop(sv, SvPVX(sv)+(pos-start+1) ); /* only show v-string */
     }
     return s;
 }
```

p5pRT commented 22 years ago

From nick.ing-simmons@elixent.com

John Peacock <jpeacock@rowman.com> writes:

Nicholas Clark wrote:

So by that definition, a v string is not a bareword, because it has another interpretation in the grammar

Not exactly. The tokenizer parses through the source and decides that the collection of characters 'v65' must be processed by toke.c:scan_num() which transforms the input characters into the sequence \65.

IMHO it should NOT convert /^v\d+[^\.]/ into a vstring at that point. If it is followed by a '.' then sure. If not leave it as an identifier. Then much later if we find an identifier when we want a "string" (or a string would do) - i.e. where we would whinge about barewords, and if identifier matches /^v\d+$/ we then do the scan_num() thing on it.
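
The two-stage decision described here can be modelled in a few lines of Perl. This is a sketch of the proposal only, not of toke.c: the function names and the `$identifier_ok` flag are hypothetical.

```perl
use strict;
use warnings;

# Stage 1 (hypothetical): a 'v' plus digits is only committed to being a
# v-string once a '.' and another digit follow; otherwise it stays a word.
sub classify_token {
    my ($tok) = @_;
    return 'vstring'    if $tok =~ /^v\d+(?:\.\d+)+$/;   # v1.2, v1.2.3, ...
    return 'identifier' if $tok =~ /^v\d+$/;             # v65: decide later
    return 'other';
}

# Stage 2 (hypothetical): convert a deferred identifier only where a string
# is wanted and an identifier is not acceptable.
sub resolve_identifier {
    my ($tok, $identifier_ok) = @_;
    return $tok if $identifier_ok || $tok !~ /^v(\d+)$/;
    return chr($1);
}

print classify_token($_), "\n" for qw(v65 v1.2.3 version);
print resolve_identifier('v65', 1), "\n";   # kept as written, e.g. before =>
print resolve_identifier('v65', 0), "\n";   # converted, e.g. in 'print v65'
```

Under this model the left side of => would simply be a context where an identifier is acceptable, so v65 would never reach the conversion step there.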

1) detect that a scalar contained a v-string

if (SvTYPE(sv) == SVt_VV) anyone ?

Another way to handle the things would be to extend the _grammar_ to have strings and vstrings as distinct. That way the distinction is in lex value in parse tree not the representation.

2) convert a v-string back into the characters it originally was.

A VV could have two pointer slots - string value and original.

Currently, we cannot do 1), and 2) is non-determinable. For an example of the latter, consider the following:

  $vs1 = v1.2.3;
  $vs2 = 1.2.3;

Both of these will be stored as \1\2\3, so there is no way to know whether the original had the leading 'v' or not.

[this doesn't stop it being a bug, just that the bug is now that what is documented is not what is desired]

I think that the fact that => doesn't autoquote vstrings is a bug (and a hard to fix one.) (I even think it has already been reported.)

I'd agree - I'd like it to autoquote (would be) v strings.

Why does this work?

  $ perl -wle 'sub v65 {print "Here"}; v65'
  Here

Surely that v65 is a constant ('A') in void context?

No, technically it is a \65 and it is the name of a subroutine (which just happens to look like an 'A'). That line is equivalent to

  $ perl -wle 'sub v65 {print "Here"}; v65()'

although B::Deparse doesn't display that

  $ perl -MO=Deparse -e 'sub v65 {print "Here"}; v65'
  sub v65 {
      print 'Here';
  }
  v65 ;
  -e syntax OK

presumably because in this context, the v65 is a sub.

  $ perl -lwe 'sub v65 {print "Here"}; v65,'
  Here
  $ perl -lwe 'sub v65 {print "Here"}; v65,""'
  Useless use of a constant in void context at -e line 1.
  Here

Here is how that is parsed:

  $ perl -MO=Deparse -e 'sub v65 {print "Here"}; v65,""'
  sub v65 {
      print 'Here';
  }
  v65(), '???';
  -e syntax OK

  $ perl -lwe 'sub v65 {print "Here"}; v65 => ""'
  Useless use of a constant in void context at -e line 1.
  Useless use of a constant in void context at -e line 1.

Here is how _that_ is parsed:

  $ perl -MO=Deparse -e 'sub v65 {print "Here"}; v65 => ""'
  sub v65 {
      print 'Here';
  }
  '???', '???';
  -e syntax OK

So you can see that the => operator is, in fact, quoting the left term. The problem is that it no longer is the same string as it was before scan_num() altered it.

Here's another data point:

  $ perl -lwe 'v65; sub v65 {print "Here"}; v65'
  Useless use of a constant in void context at -e line 1.
  Here

As you can guess, the warning refers only to the first instance of "v65" and the second is a function call.

This is even more amazing:

  $ perl -lwe 'v15; sub v15 {print "Here"}; v15'
  Useless use of a constant in void context at -e line 1.
  Here

This is completely consistent; we can name a sub anything that resolves to a string, even if it is not what we would consider to be a name (in this case the printer control character SI).

HTH

John

-- Nick Ing-Simmons http://www.ni-s.u-net.com/

p5pRT commented 22 years ago

From arthur@contiller.se

On Thursday, August 8, 2002, at 04:50, Nick Ing-Simmons wrote:

if (SvTYPE(sv) == SVt_VV) anyone ?

NOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO

Arthur

p5pRT commented 22 years ago

From @JohnPeacock

Nick Ing-Simmons wrote:

IMHO it should NOT convert /^v\d+[^\.]/ into a vstring at that point. If it is followed by a '.' then sure. If not leave it as an identifier. Then much later if we find an identifier when we want a "string" (or a string would do) - i.e. where we would whinge about barewords, and if identifier matches /^v\d+$/ we then do the scan_num() thing on it.

The tokenizer scans character by character through the source and branches accordingly. Any not-otherwise-identifiable bareword gets shoved through scan_num at some point and the leading 'v' branches to the new_vstring code. Numbers with at least two decimals go there as well. I don't know how practical it would be to bail out of scan_num or new_vstring once we discover that there is no decimal place.

1) detect that a scalar contained a v-string

if (SvTYPE(sv) == SVt_VV) anyone ?

v-strings are disliked enough by enough people that a new SvTYPE is unlikely to garner much favor. Besides, if I remember correctly there may not be any flag bits left to create a new SvTYPE at all. :~(

2) convert a v-string back into the characters it originally was.

A VV could have two pointer slots - string value and original.

I think if I apply the suggestions I was given several months ago, I can make v-strings be a new type of magic, which gives me a second PV slot to stuff the original representation into.
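
For readers on a newer perl: Scalar::Util documents an `isvstring` function that reports whether a scalar was created from a v-string literal, which is point 1) above seen from the Perl side. The sketch assumes a Scalar::Util recent enough to export it; it makes no claim about how that information is stored internally.

```perl
use strict;
use warnings;
use Scalar::Util qw(isvstring);

my $vs    = v1.2.3;            # created from a v-string literal
my $plain = "\x01\x02\x03";    # same characters, ordinary string

print "equal as strings\n"    if $vs eq $plain;
print "vs is a v-string\n"    if isvstring($vs);
print "plain is a v-string\n" if isvstring($plain);
```

The two scalars compare equal, yet only the first is reported as a v-string, so the distinction no longer has to be guessed from the characters alone.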

John

-- John Peacock Director of Information Research and Technology Rowman & Littlefield Publishing Group 4720 Boston Way Lanham, MD 20706 301-459-3366 x.5010 fax 301-429-5747

p5pRT commented 22 years ago

From nick@ing-simmons.net

John Peacock <jpeacock@rowman.com> writes:

Nick Ing-Simmons wrote:

IMHO it should NOT convert /^v\d+[^\.]/ into a vstring at that point. If it is followed by a '.' then sure. If not leave it as an identifier. Then much later if we find an identifier when we want a "string" (or a string would do) - i.e. where we would whinge about barewords, and if identifier matches /^v\d+$/ we then do the scan_num() thing on it.

The tokenizer scans character by character through the source and branches accordingly. Any not-otherwise-identifiable bareword gets shoved through scan_num at some point and the leading 'v' branches to the new_vstring code.

And I am suggesting it shouldn't.

Numbers with at least two decimals go there as well. I don't know how practical it would be to bail out of scan_num or new_vstring once we discover that there is no decimal place.

I am suggesting it does not go near scan_num until either we see the '.' or the _parser_ (not the tokenizer) says that an identifier is not acceptable.

1) detect that a scalar contained a v-string

if (SvTYPE(sv) == SVt_VV) anyone ?

v-strings are disliked enough by enough people that a new SvTYPE is unlikely to garner much favor. Besides, if I remember correctly there may not be any flag bits left to create a new SvTYPE at all. :~(

2) convert a v-string back into the characters it originally was.

A VV could have two pointer slots - string value and original.

I think if I apply the suggestions I was given several months ago, I can make v-strings be a new type of magic, which gives me a second PV slot to stuff the original representation into.

John

-- Nick Ing-Simmons http://www.ni-s.u-net.com/

p5pRT commented 22 years ago

From @ysth

On Wed, 07 Aug 2002 16:23:44 +0200, liz@dijkmat.nl wrote:

At 02:16 PM 8/7/02 +0200, I wrote:

To add to this misery, something I ran into months, if not years ago (and these posts triggered that memory):

  perl -e 'my %hash = (Module:: => 1); print keys %hash'

Or to show this better:

  perl -MO=Deparse -e 'my %hash = (Module:: => 1)'
  my(%hash) = ('Module', 1);
  -e syntax OK

Another gotcha:

  ~/pbed $perl -wle"%hash = (it's => 'ok'); print for %hash"
  it::s
  ok

p5pRT commented 22 years ago

From @Juerd

I ran into this problem while trying to construct a hash with keys qw(a1 b14 q4 v1). The v1 was turned into a \001. The obvious workaround was to use quotes. But they didn't look pretty.

package Filter​::QuotingComma; use strict; use Filter​::Simple;

FILTER_ONLY code => sub { s/\b(\w+)\b\s*=>/'$1'\,/g };

1;

=head1 NAME

Filter​::QuotingComma - Lets C\<\< => >> quote every bareword\, including v1\, v65\, etc

=head1 SYNOPSIS

  use Filter​::QuotingComma;   print v65 => "\n"; # prints​: v65   no Filter​::QuotingComma;   print v65 => "\n"; # prints​: A

=head1 DESCRIPTION

Perl has v-strings that let you write C<"Hello, world!"> as C<v72.101.108.108.111.44.32.119.111.114.108.100.33>. Unfortunately, this also works on the LHS of the C<< => >> operator.

This module really does quote all C<\w+> barewords on the left side of every C<< => >>, thus working around this problem.

=head1 CAVEATS

Probably many. See L\<Filter​::Simple>.

=head1 AUTHOR

Juerd <juerd@juerd.nl>

=head1 SEE ALSO

http​://rt.perl.org/rt2/Ticket/Display.html?id=16010

=cut
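The expansion that the POD above describes (each dot-separated number becomes one character) can be sketched in C. This is a simplified illustration, not Perl's `scan_vstring`: `toy_vstring_decode` is a hypothetical name, it accepts only components up to 255, and it ignores the underscore and UTF-8 handling the real tokenizer has.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: decode "v72.101" style text into bytes.
 * Returns the number of bytes written, or -1 on malformed input. */
static int toy_vstring_decode(const char *src, char *out, size_t outsize)
{
    size_t n = 0;

    assert(src != NULL && out != NULL && outsize > 0);

    if (*src == 'v')
        src++;                          /* step over the leading 'v' */

    for (;;) {
        unsigned long code = 0;

        if (!isdigit((unsigned char)*src))
            return -1;                  /* every component needs a digit */
        while (isdigit((unsigned char)*src)) {
            code = code * 10 + (unsigned long)(*src - '0');
            if (code > 255)
                return -1;              /* sketch handles single bytes only */
            src++;
        }
        if (n + 1 >= outsize)
            return -1;                  /* keep room for the terminating NUL */
        out[n++] = (char)code;
        if (*src != '.')
            break;
        src++;                          /* step over the separator */
    }
    if (*src != '\0')
        return -1;                      /* trailing characters: not a v-string */
    out[n] = '\0';
    return (int)n;
}
```

Decoding `"v65"` this way yields the one-byte string `"A"`, which is why an unquoted `v65` on the left of `=>` surprised people expecting the three characters `v65`.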

p5pRT commented 22 years ago

@gbarr - Status changed from 'new' to 'open'

p5pRT commented 21 years ago

From @jhi

I'd like to close this as a 5.8.0 bug (since I don't think we want to mess with the tokenizer in 5.8.1)\, but since the v-string changes by John Peacock are happening in bleadperl\, maybe this could be migrated to be a 5.9.0 issue?

p5pRT commented 21 years ago

From @jhi

I have now migrated this from a 5.8.0 issue to a 5.9.0 issue.

p5pRT commented 21 years ago

From @JohnPeacock

I have finally sussed out the way to cause strings that look like vstrings (i.e. v65) to be quoted properly by the => operator\, when used as a key in a hash. The current behavior​:

$ perl -MDevel::Peek -e '%h =( 1.2.3 => 1); Dump \%h'
SV = RV(0x806cb40) at 0x804c13c
  REFCNT = 1
  FLAGS = (TEMP,ROK)
  RV = 0x8060d2c
  SV = PVHV(0x8065688) at 0x8060d2c
    REFCNT = 2
    FLAGS = (SHAREKEYS)
    IV = 1
    NV = 0
    ARRAY = 0x805d570 (0:7, 1:1)
    hash quality = 100.0%
    KEYS = 1
    FILL = 1
    MAX = 7
    RITER = -1
    EITER = 0x0
    Elt "\1\2\3" HASH = 0xf926da4f
    SV = IV(0x8063c0c) at 0x804c01c
      REFCNT = 1
      FLAGS = (IOK,pIOK)
      IV = 1

means that you cannot print out the keys in a sensible manner. It is also not what [most] people would expect to be stored as a key.

The attached patch (vs. bleadperl) restores the way it used to work with Perl \< 5.6.0​:

$ ./perl -Ilib -MDevel::Peek -e '%h =( 1.2.3 => 1); Dump \%h'
SV = RV(0x818c408) at 0x8171d70
  REFCNT = 1
  FLAGS = (TEMP,ROK)
  RV = 0x818052c
  SV = PVHV(0x817ee40) at 0x818052c
    REFCNT = 2
    FLAGS = (SHAREKEYS)
    IV = 1
    NV = 0
    ARRAY = 0x817a200 (0:7, 1:1)
    hash quality = 100.0%
    KEYS = 1
    FILL = 1
    MAX = 7
    RITER = -1
    EITER = 0x0
    Elt "1.2.3" HASH = 0x2b04db58
    SV = IV(0x8181d04) at 0x8171ce0
      REFCNT = 1
      FLAGS = (IOK,pIOK)
      IV = 1

All tests pass; suggestions of where to add a regression test for this code gratefully accepted...

NOTE​: I have not found out where to patch to handle the similar case of​:

  perl -we 'print v65 => "\n"'

which will _not_ autoquote the v-string (and hence will still print "A\n")...

John

p5pRT commented 21 years ago

From @JohnPeacock

magic_vstring_blead.diff

```diff
Index: toke.c
===================================================================
--- toke.c (revision 16195)
+++ toke.c (working copy)
@@ -3677,7 +3677,7 @@
     case '5': case '6': case '7': case '8': case '9':
         s = scan_num(s, &yylval);
         DEBUG_T( { PerlIO_printf(Perl_debug_log,
-                   "### Saw number in '%s'\n", s);
+                   "### Saw number before '%s'\n", s);
         } );
         if (PL_expect == XOPERATOR)
             no_op("Number",s);
@@ -7574,6 +7574,9 @@
 vstring:
         sv = NEWSV(92,5); /* preallocate storage space */
         s = scan_vstring(s,sv);
+        DEBUG_T( { PerlIO_printf(Perl_debug_log,
+                   "### Saw v-string before '%s'\n", s);
+        } );
         break;
     }
 
Index: hv.c
===================================================================
--- hv.c (revision 16195)
+++ hv.c (working copy)
@@ -753,6 +753,7 @@
     bool is_utf8;
     int flags = 0;
     char *keysave;
+    MAGIC* mg;
 
     if (!hv)
         return 0;
@@ -782,7 +783,15 @@
         }
     }
 
-    keysave = key = SvPV(keysv, klen);
+    if ( SvMAGICAL(keysv) &&
+         (mg = mg_find(keysv,PERL_MAGIC_vstring)) ) {
+        /* key is a v-string, so must take original string rep */
+        keysave = key = mg->mg_ptr;
+        klen = mg->mg_len;
+    }
+    else {
+        keysave = key = SvPV(keysv, klen);
+    }
     is_utf8 = (SvUTF8(keysv) != 0);
 
     if (is_utf8) {
```

p5pRT commented 21 years ago

From @abigail

On Tue\, Jul 08\, 2003 at 01​:37​:32AM -0000\, John Peacock wrote​:

I have finally sussed out the way to cause strings that look like vstrings (i.e. v65) to be quoted properly by the => operator\, when used as a key in a hash.

*Cheer*

Abigail

p5pRT commented 21 years ago

From @JohnPeacock

John Peacock wrote​:

I have finally sussed out the way to cause strings that look like vstrings (i.e. v65) to be quoted properly by the => operator\, when used as a key in a hash. The current behavior​:

Attached are tests of the corrected behavior.

John

p5pRT commented 21 years ago

From @JohnPeacock

magic_vstring_blead_tests.diff

```diff
Index: t/op/hashassign.t
===================================================================
--- t/op/hashassign.t (revision 16195)
+++ t/op/hashassign.t (working copy)
@@ -8,7 +8,7 @@
 #
 
 use strict;
 
-plan tests => 206;
+plan tests => 224;
 
 my @comma = ("key", "value");
@@ -273,3 +273,29 @@
 }
 
 
+# Test v-strings (or things that look like them) as keys
+# [perl #16010]
+{
+    my %h = (
+        # keys that probably were not intended to be v-strings
+        v65 => "A?", v7 =>"Beep", v63 => "? indeed",
+        # keys that might be v-strings
+        1.2.3 => "first", 2.3.4 => "second", 3.4.5 => "third",
+        # keys using explicit v-string notation
+        v5.6.2 => 'sarathy', v5.8.1 => 'jarkko', v5.9.0 => 'hugo',
+    );
+    # vs. deliberately quoted keys
+    my %h2 = (
+        # keys that probably were not intended to be v-strings
+        'v65' => "A?", 'v7' =>"Beep", 'v63' => "? indeed",
+        # keys that might be v-strings
+        '1.2.3' => "first", '2.3.4' => "second", '3.4.5' => "third",
+        # keys using explicit v-string notation
+        'v5.6.2' => 'sarathy', 'v5.8.1' => 'jarkko', 'v5.9.0' => 'hugo',
+    );
+    foreach my $key ( keys %h2 )
+    {
+        ok ( exists $h{$key}, "same key [$key] in both hashes" );
+        ok ( $h{$key} eq $h2{$key}, "same value [$h{$key}]for key" );
+    }
+}
```

p5pRT commented 21 years ago

From @JohnPeacock

John Peacock wrote​:

NOTE​: I have not found out where to patch to handle the similar case of​:

  perl -we 'print v65 => "\n"'

which will _not_ autoquote the v-string (and hence will still print "A\n")...

Can I get a show of hands how important people feel that this other "autoquoting by =>" be fixed as well? The issue is really one of semantics\, since v-strings are strings\, they already _are_ quoted. It was only in the hash key case where this was probably not what the user intended.

Internally, at least for the hash key case, the way that '=>' autoquotes is to ensure that SvPV is called on the sv prior to its use as a hash key. There is no special handling of '=>' as such (which is why it took me so long to find out where to apply the patch). Consequently, the occasional use of '=>' outside of a hash assignment is exactly the same as using a ',' in the same location.

I'm inclined to not fix that case. Objections? Counter arguments? Actual working code that broke because of v-strings???

John

p5pRT commented 21 years ago

From @jhi

I'd like to get a show of hands\, too.

How many people think that John's latest patch (http​://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2003-07/msg00358.html) and the test patch (http​://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2003-07/msg00397.html) should go into 5.8.1? (http​://bugs6.perl.org/rt2/Ticket/Display.html?id=16010)

I'm waffling since it obviously makes vstrings more sane\, but then on the other hand it changes behaviour (with us since 5.6.0)\, but then on the third hand the old behaviour could be considered to be broken\, but then on the fourth hand ... I looked at the Camel Mk 3 but vstrings and the "fat comma" are never considered at the same time.

-- Jarkko Hietaniemi <jhi@iki.fi> http://www.iki.fi/jhi/ "There is this special biologist word we use for 'stable'. It is 'dead'." -- Jack Cohen

p5pRT commented 21 years ago

From @abigail

On Tue\, Jul 08\, 2003 at 12​:27​:46PM -0000\, John Peacock wrote​:

John Peacock wrote​:

NOTE​: I have not found out where to patch to handle the similar case of​:

  perl -we 'print v65 => "\n"'

which will _not_ autoquote the v-string (and hence will still print "A\n")...

Can I get a show of hands how important people feel that this other "autoquoting by =>" be fixed as well? The issue is really one of semantics, since v-strings are strings, they already _are_ quoted. It was only in the hash key case where this was probably not what the user intended.

*Raises hand*

V-string usage is rare. I'd be surprised if 10% of the Perl users knew anything about v-strings.

Internally, at least for the hash key case, the way that '=>' autoquotes is to ensure that SvPV is called on the sv prior to its use as a hash key. There is no special handling of '=>' as such (which is why it took me so long to find out where to apply the patch). Consequently, the occasional use of '=>' outside of a hash assignment is exactly the same as using a ',' in the same location.

I'm inclined to not fix that case. Objections? Counter arguments? Actual working code that broke because of v-strings???

I've seen posts on Perlmonks; people stunned that 'v1 => 1' doesn't act the same as 'w1 => 1'. The bug report that led to this fix was because of reports on Perlmonks.

Abigail

p5pRT commented 21 years ago

From @abigail

On Tue\, Jul 08\, 2003 at 12​:46​:00PM -0000\, Jarkko Hietaniemi wrote​:

I'd like to get a show of hands\, too.

How many people think that John's latest patch (http​://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2003-07/msg00358.html) and the test patch (http​://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2003-07/msg00397.html) should go into 5.8.1? (http​://bugs6.perl.org/rt2/Ticket/Display.html?id=16010)

*Raises hand* I'm all in favour.

(BTW\, has Larry mentioned anything about v-strings in Perl6? Is that in an apocalypse to be?)

Abigail

p5pRT commented 21 years ago

From @tamias

On Mon\, Jul 07\, 2003 at 08​:09​:46AM -0700\, John Peacock wrote​:

The attached patch (vs. bleadperl) restores the way it used to work with Perl \< 5.6.0​:

$ ./perl -Ilib -MDevel​::Peek -e '%h =( 1.2.3 => 1); Dump \%h'

Elt "1.2.3" HASH = 0x2b04db58

I don't understand. When did (1.2.3 => 1) ever result in ('1.2.3'\, 1)??

Prior to 5.6.0\, it would result in ('1.23' => 1)\, because the second period was parsed as the concatenation operator.

Ronald
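The pre-5.6.0 reading described above, where `1.2.3` is the number 1.2 concatenated with the number 3, can be worked through in C. This is a minimal sketch, assuming that `%.15g` is a fair stand-in for how the number would be stringified; `toy_pre56_key` is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: mimics reading "1.2.3" as (1.2) . (3),
 * i.e. stringify 1.2, then append the stringified 3. */
static int toy_pre56_key(char *out, size_t outsize)
{
    assert(out != NULL && outsize > 0);
    /* %.15g renders 1.2 as "1.2"; %d then appends "3". */
    return snprintf(out, outsize, "%.15g%d", 1.2, 3);
}
```

The buffer ends up holding `1.23`, the key that the concatenation parse would have produced, rather than `1.2.3`.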

p5pRT commented 21 years ago

From @tamias

On Tue\, Jul 08\, 2003 at 05​:32​:04AM -0700\, John Peacock wrote​:

John Peacock wrote​:

NOTE​: I have not found out where to patch to handle the similar case of​:

perl -we 'print v65 => "\n"'

which will _not_ autoquote the v-string (and hence will still print "A\n")...

Can I get a show of hands how important people feel that this other "autoquoting by =>" be fixed as well? The issue is really one of semantics\, since v-strings are strings\, they already _are_ quoted. It was only in the hash key case where this was probably not what the user intended.

Internally, at least for the hash key case, the way that '=>' autoquotes is to ensure that SvPV is called on the sv prior to its use as a hash key. There is no special handling of '=>' as such (which is why it took me so long to find out where to apply the patch). Consequently, the occasional use of '=>' outside of a hash assignment is exactly the same as using a ',' in the same location.

I didn't really follow that :) but I think C<< v65 => 1 >> and C<< 'v65', 1 >> should be equivalent. Hashes should have nothing to do with it; it's the semantics of the => operator.

Ronald

p5pRT commented 21 years ago

From @JohnPeacock

From​: rjk@​linguist.Thayer.dartmouth.edu

I don't understand. When did (1.2.3 => 1) ever result in ('1.2.3'\, 1)??

Prior to 5.6.0\, it would result in ('1.23' => 1)\, because the second period was parsed as the concatenation operator.

Yes\, you're right. That was a bad example. I don't know whether we want to preserve that particular behavior.

If you look at the ticket #16010\, you'll see the reporter was using hash keys like this​:

  %h = (v23 => "help", v65 => "Not A");

and Perl was helpfully treating the bare v23 as the single character chr(23) (\027) when it created the hash key.

John

p5pRT commented 21 years ago

From @hvds

Jarkko Hietaniemi <jhi@iki.fi> wrote:
:I'd like to get a show of hands, too.
:
:How many people think that John's latest patch[es] should go into 5.8.1?

For me, the most important thing is that C<< v1 => 1 >> should return C< 'v1', 1 >, and I'd certainly be tempted to put any fix for that into 5.8.1.

For the patches currently under consideration I'm less sure: I don't think C<< 1.2.3 => 1 >> ever returned C< '1.2.3', 1 > in earlier perls, and I have no immediate feel for which interpretation would cause the least surprise.

Hugo

p5pRT commented 21 years ago

From @JohnPeacock

From​: rjk@​linguist.Thayer.dartmouth.edu

I didn't really follow that :) but I think C<< v65 => 1 >> and C<< 'v65', 1 >> should be equivalent. Hashes should have nothing to do with it; it's the semantics of the => operator.

Except that I don't think it really is about the => operator, as far as I can determine. There is nothing special in the parser about the string '=>' with respect to its use in initializing a hash key,value pair. It turns out that _all_ potential hash keys are stringified in exactly the same fashion (prior to my patch that is).

I will have to look at the code to see whether the few instances where the tokenizer actually looks for the string '=>' have any bearing on the use of '=>' outside of the one context I have fixed already, i.e. hash keys. The docs that say the => will autoquote the left hand term are really discussing the apparent behavior, not, again as far as I can tell, any explicit coding which enforces that.

John

p5pRT commented 21 years ago

From @jhi

On Tue\, Jul 08\, 2003 at 06​:32​:55PM +0100\, hv@​crypt.org wrote​:

Jarkko Hietaniemi <jhi@iki.fi> wrote:
:I'd like to get a show of hands, too.
:
:How many people think that John's latest patch[es] should go into 5.8.1?

For me, the most important thing is that C<< v1 => 1 >> should return C< 'v1', 1 >, and I'd certainly be tempted to put any fix for that into 5.8.1.

I thought that the patch in question did just that...? Or have I been reading with my rose-tinted "wishful thinking" glasses on?

For the patches currently under consideration I'm less sure: I don't think C<< 1.2.3 => 1 >> ever returned C< '1.2.3', 1 > in earlier perls, and I have no immediate feel for which interpretation would cause the least surprise.

Hugo

-- Jarkko Hietaniemi <jhi@iki.fi> http://www.iki.fi/jhi/ "There is this special biologist word we use for 'stable'. It is 'dead'." -- Jack Cohen

p5pRT commented 21 years ago

From @abigail

On Tue\, Jul 08\, 2003 at 05​:29​:56PM -0000\, John Peacock wrote​:

From​: rjk@​linguist.Thayer.dartmouth.edu

I didn't really follow that :) but I think C<< v65 => 1 >> and C<< 'v65', 1 >> should be equivalent. Hashes should have nothing to do with it; it's the semantics of the => operator.

Except that I don't think it really is about the => operator, as far as I can determine. There is nothing special in the parser about the string '=>' with respect to its use in initializing a hash key,value pair. It turns out that _all_ potential hash keys are stringified in exactly the same fashion (prior to my patch that is).

I will have to look at the code to see whether the few instances where the tokenizer actually looks for the string '=>' have any bearing on the use of '=>' outside of the one context I have fixed already, i.e. hash keys. The docs that say the => will autoquote the left hand term are really discussing the apparent behavior, not, again as far as I can tell, any explicit coding which enforces that.

I don't think so. I've always gotten the impression that the autoquoting behaviour of => was intentional, and not happenstance. If the code doesn't enforce the documented behaviour, then, IMO, the code is wrong. Especially since in the pre-vstring era, (v65 => 1) was equivalent to ('v65', 1).

Abigail

p5pRT commented 21 years ago

From @hvds

Jarkko Hietaniemi <jhi@iki.fi> wrote:
:On Tue, Jul 08, 2003 at 06:32:55PM +0100, hv@crypt.org wrote:
:> For me, the most important thing is that C<< v1 => 1 >> should return
:> C< 'v1', 1 >, and I'd certainly be tempted to put any fix for that into
:> 5.8.1.
:
:I thought that the patch in question did just that...? Or have I been
:reading with my rose-tinted "wishful thinking" glasses on?

I'm sorry, that was me confusing myself: I looked at the 1.2.3 example, and at the final message:
:NOTE: I have not found out where to patch to handle the similar case of:
: perl -we 'print v65 => "\n"'

.. and drew the wrong conclusions about what was being said.

And I'm further confused, because I had thought (as, clearly, have others) that the autoquoting behaviour was nothing to do with hashes and everything to do with being on the LHS of the C<< => >> operator.

To try and clarify a bit: I feel that it is currently a bug that C<< v1 => 1 >> ever gives a vstring rather than the literal 'v1', and I'd be delighted to see a fix go into 5.8.1. I'm less sure of the value of the supplied patch if it fixes some cases but not others. I feel that would be in danger of just increasing the confusion.

Hugo

p5pRT commented 21 years ago

From @JohnPeacock

Steve Grazzini wrote​:

On Tue\, Jul 08\, 2003 at 01​:29​:51PM -0400\, John Peacock wrote​:

From​: rjk@​linguist.Thayer.dartmouth.edu

it's the semantics of the => operator.

Except that I don't think it really is about the => operator, as far as I can determine.

The "=>" forces[*] a bareword to be interpreted as a string and not as a subroutine call\, keyword\, etc.

Yes\, I saw that code\, but I was not seeing it actually fall through that test. I am now (I can only blame lack of sleep). I can change that behavior\, once someone tells me what behavior we want to achieve.

But this only applies to barewords and C\ is no longer interpreted as a bareword.

Bingo.

FWIW\, I would think that these should use the v-string as the hash key​:

OK\, can you tell me how you are distinguishing these cases?

  %hash = ( v5.8.1 => "anydaynow" );

more than one decimal?

  $key = v42;  $hash{$key}++;

Already works that way. I didn't fix the hash access but rather just the hash assign/creation. Of course, that means I need to fix the hash access to be consistent (whatever that means). :~(

But that these should use "v42".

  %hash = ( v42 => 0 );

The patch does this now.

  $hash{v42}++;

This is a problem case. I think it is going to be indistinguishable from the second case you mentioned, at the point in the code where I can check. In other words, I have already lost whether the key was directly entered or is another variable, since the tokenizer will have already created a new SV for me. Of course, that SV will be anonymous... I'll have to look at the code.

I would rather have one true interpretation, rather than two competing ones, if it's all the same to you... ;~)

John

p5pRT commented 21 years ago

From ben.goldberg@hotpop.com

Abigail wrote​: [snip]

(BTW\, has Larry mentioned anything about v-strings in Perl6? Is that in an apocalypse to be?)

Although this isn't by larry\, google did find the following bit of docu for vstrings in perl6​:

http​://nntp.x.perl.org/group/perl.perl6.documentation/515?show_headers=1

-- $a=24;split//\,240513;s/\B/ => /for@​@​=qw(ac ab bc ba cb ca );{push(@​b\,$a)\,($a-=6)^=1 for 2..$a/6x--$|;print "$@​[$a%6 ]\n";((6\<=($a-=6))?$a+=$_[$a%6]-$a%6​:($a=pop @​b))&&redo;}

p5pRT commented 21 years ago

From @JohnPeacock

Jarkko Hietaniemi wrote​:

For me, the most important thing is that C<< v1 => 1 >> should return C< 'v1', 1 >, and I'd certainly be tempted to put any fix for that into 5.8.1.

I thought that the patch in question did just that...? Or have I been reading with my rose-tinted "wishful thinking" glasses on?

The patch as it stands now (this minute\, I'm compiling a new version as we "speak") does this​:

Prior to its use as a hash key, the sv is checked to see if it is a v-string. If it is, the normal stringification (via SvPV) is ignored and the original string representation is returned. Currently, this is true for _any_ v-strings:

  v65
  1.2.3 or v5.6

I am testing a revised patch which only "quotes" the first of that list.

As I said before, I still need to figure out how to make => quote the first one of those again anywhere else in code. There is a section of code which makes sure that barewords preceding => get stringified (quoted), but since v65 is _not_ a bareword, that never fires.

It looks like what I am going to have to do is look ahead in scan_vstring() and see if the next token will be a fat arrow and then return the original quoted string instead of a vstring. But again\, only if there is no decimal at all.

Is that clearer now?

John

p.s. any pointers on reading forward in the tokenizer would be greatly appreciated...

p5pRT commented 21 years ago

From nick.ing-simmons@elixent.com

<hv@crypt.org> writes:

Jarkko Hietaniemi <jhi@iki.fi> wrote:
:I'd like to get a show of hands, too.
:
:How many people think that John's latest patch[es] should go into 5.8.1?

For me, the most important thing is that C<< v1 => 1 >> should return C< 'v1', 1 >, and I'd certainly be tempted to put any fix for that into 5.8.1.

I agree there.

For the patches currently under consideration I'm less sure: I don't think C<< 1.2.3 => 1 >> ever returned C< '1.2.3', 1 > in earlier perls, and I have no immediate feel for which interpretation would cause the least surprise.

The olde 1.2.3 becomes 1.23 via (1.2).3 is not worth recovering.

-- Nick Ing-Simmons http​://www.ni-s.u-net.com/

p5pRT commented 21 years ago

From nick.ing-simmons@elixent.com

John Peacock <jpeacock@rowman.com> writes:

From: rjk@linguist.Thayer.dartmouth.edu

I didn't really follow that :) but I think C<< v65 => 1 >> and C<< 'v65', 1 >> should be equivalent. Hashes should have nothing to do with it; it's the semantics of the => operator.

Except that I don't think it really is about the => operator, as far as I can determine. There is nothing special in the parser about the string '=>' with respect to its use in initializing a hash key,value pair. It turns out that _all_ potential hash keys are stringified in exactly the same fashion (prior to my patch that is).

I will have to look at the code to see whether the few instances where the tokenizer actually looks for the string '=>' have any bearing on the use of '=>' outside of the one context I have fixed already, i.e. hash keys. The docs that say the => will autoquote the left hand term are really discussing the apparent behavior, not, again as far as I can tell, any explicit coding which enforces that.

Tk code makes heavy use of => in things like

  $widget->method( -keyword => value\, -attrib => value );

There are no hashes there. Admittedly for _Tk_ there is usually the leading '-' but using the => to pass key/value pairs to sub calls is common.

John -- Nick Ing-Simmons http​://www.ni-s.u-net.com/

p5pRT commented 21 years ago

From nick.ing-simmons@elixent.com

John Peacock <jpeacock@rowman.com> writes:

But that these should use "v42".

  %hash = ( v42 => 0 );

The patch does this now.

  $hash{v42}++;

sub Class​::foo { my ($self\,%attrib) = @​_; }

Class->foo(v42 => 0);

Is important too ;-)

-- Nick Ing-Simmons http​://www.ni-s.u-net.com/

p5pRT commented 21 years ago

From @JohnPeacock

Jarkko Hietaniemi wrote​:

Have you tried/considered lookahead already at the vstring: label-- look ahead to see if there are any "."s, and if not, don't even enter scan_vstring()?

OK\, here it is! Any bare word of the form v99 followed by whitespace and => will not be passed through the v-string mungifier. All other v-strings are left alone. This patch is vs bleadperl @​ 1AM EDT and replaces the previous patch.

Jarkko\, if you want a Perl 5.8.1 specific patch\, let me know\, otherwise I'll let you work it out by yourself. If this patch is accepted\, I consider my work on v-strings complete.

John

p.s. note that I moved scan_vstring back into toke.c from util.c and took away its POD entry. It is not necessary to expose it any longer (since I don't call it directly from the version code any more).
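The rule described above (a `v` followed only by digits, then optional whitespace and `=>`, is treated as a plain string) can be sketched as a standalone C function. This is an illustration under stated assumptions, not the patch itself: `toy_fat_comma_bareword_len` is a hypothetical name, and unlike the tokenizer's own whitespace skipping it does not step over comments or newlines with embedded POD.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: if s starts with v<digits> followed by optional
 * whitespace and "=>", return the length of the v<digits> token so the
 * caller can keep it as a literal string. Otherwise return 0, meaning
 * the normal v-string handling would apply. */
static size_t toy_fat_comma_bareword_len(const char *s)
{
    const char *pos = s;
    const char *look;

    assert(s != NULL);

    if (*pos != 'v')
        return 0;
    pos++;                               /* get past 'v' */
    if (!isdigit((unsigned char)*pos))
        return 0;                        /* "v" alone is not a candidate */
    while (isdigit((unsigned char)*pos) || *pos == '_')
        pos++;
    if (*pos == '.')
        return 0;                        /* dotted form stays a v-string */

    look = pos;
    while (isspace((unsigned char)*look))
        look++;                          /* peek past whitespace only */
    if (look[0] == '=' && look[1] == '>')
        return (size_t)(pos - s);        /* quoted by the fat comma */
    return 0;
}
```

For the input `v65 => 1` the function reports a 3-character token to keep as `"v65"`, while `v65, 1` and `v5.8.1 => 1` both return 0 and would still go through the v-string path.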

p5pRT commented 21 years ago

From @JohnPeacock

magic_vstring2.diff

```diff
Index: toke.c
===================================================================
--- toke.c (revision 16195)
+++ toke.c (working copy)
@@ -7958,3 +7958,90 @@
 }
 #endif
 
+/*
+Returns a pointer to the next character after the parsed
+vstring, as well as updating the passed in sv.
+
+Function must be called like
+
+    sv = NEWSV(92,5);
+    s = scan_vstring(s,sv);
+
+The sv should already be large enough to store the vstring
+passed in, for performance reasons.
+
+*/
+
+char *
+Perl_scan_vstring(pTHX_ char *s, SV *sv)
+{
+    char *pos = s;
+    char *start = s;
+    if (*pos == 'v') pos++;  /* get past 'v' */
+    while (isDIGIT(*pos) || *pos == '_')
+        pos++;
+    if ( *pos != '.') {
+        /* this may not be a v-string if followed by => */
+        start = pos;
+        if (isSPACE(*start))
+            start = skipspace(start);
+        if ( *start == '=' && start[1] == '>' )
+        {
+            /* return string not v-string */
+            sv_setpvn(sv,(char *)s,pos-s);
+            return pos;
+        }
+    }
+
+    if (!isALPHA(*pos)) {
+        UV rev;
+        U8 tmpbuf[UTF8_MAXLEN+1];
+        U8 *tmpend;
+
+        if (*s == 'v') s++;  /* get past 'v' */
+
+        sv_setpvn(sv, "", 0);
+
+        for (;;) {
+            rev = 0;
+            {
+                /* this is atoi() that tolerates underscores */
+                char *end = pos;
+                UV mult = 1;
+                while (--end >= s) {
+                    UV orev;
+                    if (*end == '_')
+                        continue;
+                    orev = rev;
+                    rev += (*end - '0') * mult;
+                    mult *= 10;
+                    if (orev > rev && ckWARN_d(WARN_OVERFLOW))
+                        Perl_warner(aTHX_ packWARN(WARN_OVERFLOW),
+                                    "Integer overflow in decimal number");
+                }
+            }
+#ifdef EBCDIC
+            if (rev > 0x7FFFFFFF)
+                Perl_croak(aTHX_ "In EBCDIC the v-string components cannot exceed 2147483647");
+#endif
+            /* Append native character for the rev point */
+            tmpend = uvchr_to_utf8(tmpbuf, rev);
+            sv_catpvn(sv, (const char*)tmpbuf, tmpend - tmpbuf);
+            if (!UNI_IS_INVARIANT(NATIVE_TO_UNI(rev)))
+                SvUTF8_on(sv);
+            if (*pos == '.' && isDIGIT(pos[1]))
+                s = ++pos;
+            else {
+                s = pos;
+                break;
+            }
+            while (isDIGIT(*pos) || *pos == '_')
+                pos++;
+        }
+        SvPOK_on(sv);
+        sv_magic(sv,NULL,PERL_MAGIC_vstring,(const char*)start, pos-start);
+        SvRMAGICAL_on(sv);
+    }
+    return s;
+}
+
Index: util.c
===================================================================
--- util.c (revision 16195)
+++ util.c (working copy)
@@ -3634,85 +3634,6 @@
 }
 
 /*
-=head1 SV Manipulation Functions
-
-=for apidoc scan_vstring
-
-Returns a pointer to the next character after the parsed
-vstring, as well as updating the passed in sv.
-
-Function must be called like
-
-    sv = NEWSV(92,5);
-    s = scan_vstring(s,sv);
-
-The sv should already be large enough to store the vstring
-passed in, for performance reasons.
-
-=cut
-*/
-
-char *
-Perl_scan_vstring(pTHX_ char *s, SV *sv)
-{
-    char *pos = s;
-    char *start = s;
-    if (*pos == 'v') pos++;  /* get past 'v' */
-    while (isDIGIT(*pos) || *pos == '_')
-        pos++;
-    if (!isALPHA(*pos)) {
-        UV rev;
-        U8 tmpbuf[UTF8_MAXLEN+1];
-        U8 *tmpend;
-
-        if (*s == 'v') s++;  /* get past 'v' */
-
-        sv_setpvn(sv, "", 0);
-
-        for (;;) {
-            rev = 0;
-            {
-                /* this is atoi() that tolerates underscores */
-                char *end = pos;
-                UV mult = 1;
-                while (--end >= s) {
-                    UV orev;
-                    if (*end == '_')
-                        continue;
-                    orev = rev;
-                    rev += (*end - '0') * mult;
-                    mult *= 10;
-                    if (orev > rev && ckWARN_d(WARN_OVERFLOW))
-                        Perl_warner(aTHX_ packWARN(WARN_OVERFLOW),
-                                    "Integer overflow in decimal number");
-                }
-            }
-#ifdef EBCDIC
-            if (rev > 0x7FFFFFFF)
-                Perl_croak(aTHX_ "In EBCDIC the v-string components cannot exceed 2147483647");
-#endif
-            /* Append native character for the rev point */
-            tmpend = uvchr_to_utf8(tmpbuf, rev);
-            sv_catpvn(sv, (const char*)tmpbuf, tmpend - tmpbuf);
-            if (!UNI_IS_INVARIANT(NATIVE_TO_UNI(rev)))
-                SvUTF8_on(sv);
-            if (*pos == '.' && isDIGIT(pos[1]))
-                s = ++pos;
-            else {
-                s = pos;
-                break;
-            }
-            while (isDIGIT(*pos) || *pos == '_')
-                pos++;
-        }
-        SvPOK_on(sv);
-        sv_magic(sv,NULL,PERL_MAGIC_vstring,(const char*)start, pos-start);
-        SvRMAGICAL_on(sv);
-    }
-    return s;
-}
-
-/*
 =for apidoc scan_version
 
 Returns a pointer to the next character after the parsed
Index: embed.fnc
===================================================================
--- embed.fnc (revision 16195)
+++ embed.fnc (working copy)
@@ -533,7 +533,7 @@
 				|I32 whileline|OP* expr|OP* block|OP* cont
 Ap	|PERL_SI*|new_stackinfo|I32 stitems|I32 cxitems
-Apd	|char*	|scan_vstring	|char *vstr|SV *sv
+Ap	|char*	|scan_vstring	|char *vstr|SV *sv
 Apd	|char*	|scan_version	|char *vstr|SV *sv
 Apd	|SV*	|new_version	|SV *ver
 Apd	|SV*	|upg_version	|SV *ver
```

p5pRT commented 21 years ago

From mjtg@cam.ac.uk

Nick Ing-Simmons <nick.ing-simmons@elixent.com> wrote

Tk code makes heavy use of => in things like

$widget->method( -keyword => value\, -attrib => value );

There are no hashes there. Admittedly for _Tk_ there is usually the leading '-' but using the => to pass key/value pairs to sub calls is common.

The leading '-' doesn't protect you​:

  DB<1> Dump +{ -v65 => 1 }
SV = RV(0x1b38ac) at 0x2012bc
  REFCNT = 1
  FLAGS = (ROK)
  RV = 0x1f4ddc
  SV = PVHV(0x1c5040) at 0x1f4ddc
    REFCNT = 2
    FLAGS = (SHAREKEYS)
    IV = 1
    NV = 0
    ARRAY = 0x202540 (0:7, 1:1)
    hash quality = 150.0%
    KEYS = 1
    FILL = 1
    MAX = 7
    RITER = -1
    EITER = 0x0
    Elt "-A" HASH = 0x63e        <<<<<<<<<<<<<<<<<<<<<<<<<<<<<
    SV = IV(0x1a32c0) at 0x20134c
      REFCNT = 1
      FLAGS = (IOK,pIOK)
      IV = 1

  DB<2>

But fortunately John's latest patch makes this moot (I hope).

Mike Guy

p5pRT commented 21 years ago

From @JohnPeacock

Mike Guy wrote​:

The leading '-' doesn't protect you​:

Elt "\-A" HASH = 0x63e        \<\<\<\<\<\<\<\<\<\<\<\<\<\<\<\<\<\<\<\<\<\<\<\<\<\<\<\<\<

But fortunately John's latest patch makes this moot (I hope).

I hear and obey...

$ ./perl -Ilib -MDevel::Peek -e 'Dump +{ -v65 => 1 }'
SV = RV(0x818c560) at 0x8171f54
  REFCNT = 1
  FLAGS = (TEMP,ROK)
  RV = 0x8171e40
  SV = PVHV(0x817ef80) at 0x8171e40
    REFCNT = 2
    FLAGS = (SHAREKEYS)
    IV = 1
    NV = 0
    ARRAY = 0x8182f98 (0:7, 1:1)
    hash quality = 100.0%
    KEYS = 1
    FILL = 1
    MAX = 7
    RITER = -1
    EITER = 0x0
    Elt "-v65" HASH = 0x33cb652        <<<<<<<<<<<<<<<<<<<<<<<<<<<<<
    SV = IV(0x8181e5c) at 0x8171ed0
      REFCNT = 1
      FLAGS = (IOK,pIOK)
      IV = 1

John