Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/
Other
1.94k stars 554 forks source link

highly illegal variable names are now accidentally legal #12172

Closed p5pRT closed 11 years ago

p5pRT commented 12 years ago

Migrated from rt.perl.org#113620 (status was 'resolved')

Searchable as RT113620$

p5pRT commented 11 years ago

From @demerphq

On 20 February 2013 10​:49\, Brian Fraser \fraserbn@​gmail\.com wrote​:

On Mon\, Feb 18\, 2013 at 9​:55 PM\, Brian Fraser \fraserbn@​gmail\.com wrote​:

On Fri\, Jan 25\, 2013 at 1​:06 AM\, Ricardo Signes via RT \perlbug\-followup@​perl\.org wrote​:

* Karl Williamson \public@​khwilliamson\.com [2013-01-14T16​:00​:51]

I think we need to do something on this for 5.18. We can't\, IMO\, continue\, for example\, to allow surrogates to be identifier names.

I agree.

A release of 5.18.0 with this awful behavior would not be a catastrophe\, but it would be a shame\, especially since it looks like we can avoid it.

If you read below\, it looks to me that consensus was being approached. FC had a few modifications he wanted in Brian's proposal\, but then Brian vanished without replying\, and is just now starting to catch up on his emails\, being on summer break from school.

I do note that his patches showed that we have a bug currently in that we have the same code mostly\, but not entirely\, repeated in 2 places\, and so the parsing results differ based on no good reason.

Yes\, I supported Brian's proposal. I think FC's amendments are fine. Nick's points about 127-255 are also important.

Please let me know how I can help\, if it's anything more than saying\, "I agree\," which I have now done.

Someone kicking my university's arse for leeching my time would be welcome\, of course\, but failing that.. : )

That aside\, apologies for the overlong wait. I'm changing the branch right now\, so as a sort of recap\, here's how things should stand in a bit​:

Both​: No more single colons when inside ${}. Several other pretty obscure parsing bugs regarding :​: and ' should be mostly fixed. Now they basically work like (?​:(?​: :​:)* '? (?&ident))+ (?​: :​:)* That is\, any number of double colons\, with at most a single quote either separating packages\, or after the sigil / braces\, or after a series of colons\, but not at the end of the identifier.

Under nothing / no utf8\, or evalbytes​: We treat the source as ASCII + 128 controls\, not as Latin1 (which Nicholas pointed out is how we've done it\, historically) Variables of length 2 and more can only have characters that match /\w/aa Variables of length 1 should match /(?[ \p{POSIX_Cntrl} + \p{POSIX_Punct} + \w + [\x80-\xff] ])/aa Outside of length 1 idents\, comments\, regexen and strings\, you can only use the extra controls as quotes for qq and friends.

Under use utf8\, or evaling a UTF-8 flagged string​: Identifiers (both variables and barewords) match /\p{Perl_XIDS}+\p{XIDC}*/ Variables of length 1 match /(?[ \p{POSIX_Cntrl} + \p{POSIX_Punct} + \p{Perl_XIDS} + [0-9] ])

Another issue that Father C objected against was that ./perl -Ilib -le 'use utf8; print eval qq{qq\xc2\xb7 test \xc2\xb7};' Would no longer work\, since the qq would get parsed alongside the \N{MIDDLE DOT} as a bareword. I'm looking into this right now\, but looks like I may have spoken too soon\, since that's currently working. I do think that parsing it as a bareword is the way to go\, but since it's contentious it could be left until 5.20.

Only tangentially related to the above\, I've been doing some digging as to why we even allow these​:

$​::'foo foo​::​::bar foo​::​::​::​::bar foo​::'bar

Or why $​::1​:: is legal\, but $1​:: is not.

I can't find anything beyond "because the parser allows it." We have a handful of tests for things like this​:

package Foo​::; sub bar{1} package main; sub foo​::​::​::bar{1} Foo​::​::bar(); foo​::​::​::bar();

But besides the novelty\, is that of any worth?

I bring this up because it should be rather simple to remove the insanity\, so that we go from this​:

        \(?\<normal\_identifier>
            \(?&#8203;: :&#8203;: \)\* '?
                \(?&basic\_identifier\)
                \(?&#8203;: \(?= \(?&#8203;: :&#8203;: \)\+ '? | \(?&#8203;: :&#8203;: \)\* ' \)

(?&normal_identifier) )* (?​: :​: )* )

To this​:

        \(?\<normal\_identifier>
            \(?&#8203;: :&#8203;: | ' \)?
                \(?&basic\_identifier\)
                \(?&#8203;: \(?= :&#8203;: | ' \) \(?&normal\_identifier\) \)\*
            \(?&#8203;: :&#8203;: \)?
        \)

That being said\, maintainability-wise\, either way doesn't impact anything\, although removing these weird edge cases might make the eventual refactoring of gv_fetchpvn_flags less painful.

And its just plain old confusing. ++ on getting rid of it.

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 11 years ago

From @Leont

On Wed\, Feb 20\, 2013 at 10​:49 AM\, Brian Fraser \fraserbn@&#8203;gmail\.com wrote​:

That being said\, maintainability-wise\, either way doesn't impact anything\, although removing these weird edge cases might make the eventual refactoring of gv_fetchpvn_flags less painful.

That alone would be reason enough for me.

Leon

p5pRT commented 11 years ago

From @nwc10

On Wed\, Feb 20\, 2013 at 06​:49​:13AM -0300\, Brian Fraser wrote​:

I can't find anything beyond "because the parser allows it." We have a handful of tests for things like this​:

package Foo​::; sub bar{1} package main; sub foo​::​::​::bar{1} Foo​::​::bar(); foo​::​::​::bar();

But besides the novelty\, is that of any worth?

I don't *think* so.

Also\, I'm suspicious that most of the tests are there more to ensure that behaviour doesn't change accidentally\, than to say "this is correct".

At least one test for something like this was added because there turned out to be an unexercised code path in the C\, and I'd assumed that it was unreachable\, when it turned out simply to be untested.

So don't assume that the tests are gospel\, or some sort of spec.

Nicholas Clark

p5pRT commented 11 years ago

From @Hugmeir

On Wed\, Feb 20\, 2013 at 7​:49 AM\, Nicholas Clark \nick@&#8203;ccl4\.org wrote​:

On Wed\, Feb 20\, 2013 at 06​:49​:13AM -0300\, Brian Fraser wrote​:

I can't find anything beyond "because the parser allows it." We have a handful of tests for things like this​:

package Foo​::; sub bar{1} package main; sub foo​::​::​::bar{1} Foo​::​::bar(); foo​::​::​::bar();

But besides the novelty\, is that of any worth?

I don't *think* so.

Also\, I'm suspicious that most of the tests are there more to ensure that behaviour doesn't change accidentally\, than to say "this is correct".

At least one test for something like this was added because there turned out to be an unexercised code path in the C\, and I'd assumed that it was unreachable\, when it turned out simply to be untested.

So don't assume that the tests are gospel\, or some sort of spec.

Nicholas Clark

Alright\, thanks! The latest commit in the branch introduces the simplifications for package delimiters\, except for 'package Foo​::;'\, which I couldn't figure out how to remove easily. I think the branch is ready for merging/likely more review -- although if the last commit is contentious\, the previous one should also be ready.

p5pRT commented 11 years ago

From @khwilliamson

Fixed by 32833930e32dc619abdaaab54e88de2a2765fb86 -- Karl Williamson

p5pRT commented 11 years ago

@khwilliamson - Status changed from 'open' to 'resolved'