Perl / PPCs

This repository is for Requests For Comments - proposals to change the Perl language.

RFC 0014: ${^ENGLISH_NAME} aliases for punctuation variables #22

Closed haarg closed 1 year ago

leonerd commented 2 years ago

Since for a while now, $* has been discouraged/deprecated/removed, it is free for use once more. I wonder if a different spelling of these would be better, using that form.

$*EVAL_ERROR # $@
$*PID        # $$
etc...

jberger commented 2 years ago

Since for a while now, $* has been discouraged/deprecated/removed, it is free for use once more. I wonder if a different spelling of these would be better, using that form.

$*EVAL_ERROR # $@
$*PID        # $$
etc...

I see why that is worth a discussion, however, as we already have the ${^...} syntax in use, I don't see any reason to add yet another. Yes I acknowledge that the lack of compile-time strictness is unfortunate, however, adding more syntax feels like solving a rare problem with more cognitive overhead and I hope that we're trying to move in the opposite direction.

leonerd commented 2 years ago

I don't see any reason to add yet another. Yes I acknowledge that the lack of compile-time strictness is unfortunate, however, adding more syntax feels like solving a rare problem with more cognitive overhead and I hope that we're trying to move in the opposite direction.

I'm presenting it purely as another possibility to discuss. We can decide on the relative merits of the

It may be we do decide on one of the first two options - I just want to ensure that gets debated out loud so afterwards, we know what the alternatives were that we didn't choose and why we didn't choose them.

jberger commented 2 years ago

When I said

I see why that is worth a discussion

I was essentially saying

I just want to ensure that gets debated out loud so afterwards, we know what the alternatives were that we didn't choose and why we didn't choose them.

Absolutely, let's discuss. My previous reply was my comment thereof

Grinnz commented 2 years ago

Another possibility is warning on use of a ${^...} variable without an assigned meaning. But that wouldn't help with accidentally using it on a Perl too old to have such a warning (nothing would, really).

jberger commented 2 years ago

Another possibility is warning on use of a ${^...} variable without an assigned meaning. But that wouldn't help with accidentally using it on a Perl too old to have such a warning (nothing would, really).

I think this is a great idea. Since they don't come from user-code, on access you know when a lookup of a key is for an unknown key. Indeed it can even be a single for future-proofing. "This version of perl does not understand ${^NAME_FROM_NEWER_PERL}, which does not exist on this version of perl which is $^X"

Leont commented 2 years ago

Another possibility is warning on use of a ${^...} variable without an assigned meaning. But that wouldn't help with accidentally using it on a Perl too old to have such a warning (nothing would, really).

In many cases, 'once' warnings would for once be useful!

haarg commented 2 years ago

'once' warnings don't apply to superglobal variables like this. Even if they did, once warnings only appear for the main compilation unit, which means they are normally useless in modules.

ivrntsv commented 2 years ago

Perl has special variables like $^X, $^O, $^V. Why not change the parsing rules to consume the longest simple identifier after $^ and make the braces optional? If that is possible.

ivrntsv commented 2 years ago

Backward compatibility kills all the fun but maybe for good this time.

Perl has $^, $^_, $^[ONE_CHAR], ${^FOO}, ${^_FOO} (with other sigils as well) variables with a common prefix. I don't know how the internals work, so I can only imagine (likely wrongly). I imagine that the $^FOO construction (note the lack of braces) could be made to work somehow, because right now it is a syntax error. But only outside of a string. Inside a string this variable must always be demarcated for old code to keep working: "$^FOO" must be parsed as "${^F}OO". But this goes against the regular order of doing interpolation: to demarcate only when necessary.

I am with leonerd on this. To my taste $*FOO is clear. ${^FOO} is ugly and cumbersome. Indexing is better ($*FOO[2] everywhere vs. ${^FOO}[2] outside of a string, ${^FOO[2]} inside). Variable demarcation is optional. The set of $*FOO variables is limited, so a strictness collar could be put on it. The $* prefix gives a clue that the variable is super_glob_al (of course the variable doesn't contain a glob; it's just a clue).

So instead of using ${^FOO} construction or trying to fix it to some degree it's better to create a new one with clear, well defined semantics without old baggage. Some variables will be repeated in old and new form. I deem that new form will be used more often and old form will allow old code continue to serve its purpose.

haarg commented 2 years ago

A problem with $*FOO is how to use it in delimited form in string interpolation. ${*FOO} can't be used because it already means to dereference a glob. Possibly $*{FOO} could be used, but that looks very wrong to me.

ivrntsv commented 2 years ago

Alas then. ${^FOO} is the way to go.

xenu commented 2 years ago

I don't see why $^FOO not working inside double-quoted strings is such a big deal. We could make $^FOO work outside of string literals and ${^FOO} will continue to work inside them. It's a little bit inconsistent, but still better than the hacks needed to interpolate function and method calls.

Also, template literals will improve the situation a bit.

haarg commented 2 years ago

We could maybe add $^FOO parsing. It doesn't really solve any issues though, aside from maybe being a bit prettier. It would end up just being an alternate syntax for ${^FOO}. If we wanted to add that, I don't see why it would block this.

xenu commented 2 years ago

If we wanted to add that, I don't see why it would block this.

Agreed, it's a separate issue.

aaronpriven commented 2 years ago

$^FOO parsing would solve two problems: 1) ${^FOO} is ugly and a pain to type, and 2) more importantly, strictness could apply, which it can't to the existing ${^FOO} syntax.

Seems to me that formats are little-used enough that requiring format users to type $^FORMAT_TOP_NAME instead of $^ would be an acceptable loss, if that would make $^FOO possible. It's not clear to me though whether it is $^ or control-character variables like $^T that are the real obstacle. If re-using some other format variable like $:, $-, or $% would work where $^ would not, I'm for that.

haarg commented 2 years ago

The problem with $^FOO is that it parses as "${^F}OO" when interpolating. Maybe that could change, but it would still need a delimited form for quoting, which would be the existing ${^FOO} construct. Trying to make strict only apply to one of these would IMO be a terrible idea.

We already have some ${^FOO} variables. And we already use the ${^FOO} form for any new superglobals that are added. The fact that these are exempt from strict is already a problem. I think it's worth thinking about how we could apply strict to those, even if this RFC was not accepted.

aaronpriven commented 2 years ago

We have two issues:

  1. ${^FOO} is ugly and hard to type
  2. making ${^FOO} subject to use strict vars would break backward compatibility

My suggestion is that the RFC acknowledge that these are issues, but that they are the same issues that the many other existing ${^FOO} variables already have, and that the proposal should be adopted in any event and these issues be dealt with another day.

rjbs commented 2 years ago

  1. how much of a problem would it really be to make ${^FOO} subject to stricture? (why wasn't it?)
  2. should we use $*FOO instead, per offhand comment by Paul? :)

leonerd commented 2 years ago

  • how much of a problem would it really be to make ${^FOO} subject to stricture? (why wasn't it?)

Or we could add a new strictness flag - use strict 'vars-even-those-ones'. Because adding new strictness flags has always been easy ;)

  • should we use $*FOO instead, per offhand comment by Paul? :)

+1

demerphq commented 2 years ago

I don't really see how $*FOO would interpolate or fit into our interpolation rules. ${*FOO} is already valid syntax and scalar dereferences the *FOO glob, and ${ NAME } is supposed to be an alternate way to spell $NAME, so there is a conflict there adding $*NAME. I actually find it a little weird that $*FOO throws an error, as ${*FOO} rightly doesn't; I'd expect them both to dereference the *FOO glob. Perhaps someone was a little overzealous when they made $* itself illegal. I checked, and in 5.28 $*FOO threw an exception: Bareword found where operator expected, so not a useful parse, but also /not/ an illegal var name. So the change in 5.30 was technically a regression.

Consider this code:

$ perl -le'our $FOO= "foo"; my $baz="BAZ"; print "${*FOO} ${baz}"'
foo BAZ
$ perl -le'our $FOO= "foo"; my $baz="BAZ"; print "${*FOO} ${baz}"; print ${*FOO}'
foo BAZ
foo
$ perl -le'our $FOO= "foo"; my $baz="BAZ"; print "${*FOO} ${baz}"; print ${*FOO}; print $*FOO;'
$* is no longer supported as of Perl 5.30 at -e line 1.

Another reason NOT to do this via $*FOO is that assuming we can fit it into the grammar in the first place, it would mean anything using these new vars is pretty much hard bound to new perls. No back compat support available other than source code filters. This applies to ideas about extending ${^FOO} to be $^FOO, even if we could do it, it would not be possible to offer back compat support.

If this is done via ${^FOO}, on the other hand, you can write a back compat module that makes these "english names" available in older perls, while being a no-op in perls where they are already set up. If we need a way to detect misspelled cases then maybe in newer perls we can figure that out.

I really think introducing a new way to name vars is a bad idea and should be resisted, and using $* as the basis is an even worse idea, it will just pile up the special cases and rules we have. *NAME is already special due to globs, adding another special case with $*NAME really seems like a can of worms that will just get deeper and deeper the closer you look at it.

leonerd commented 2 years ago

I dont really see how $*FOO would interpolate or fit into our interpolation rules. ${*FOO} is already valid syntax and scalar dereferences the *FOO glob, and ${ NAME } is supposed to be an alternate way to spell $NAME so there is a conflict there adding $*NAME ...

Ugh, yes, I'd forgotten all that mess about globrefs. You're right - the syntax would be a total mess if we did that.

I guess the only way out is to look into strict'ifying the existing ${^YOUR-NAME-HERE} syntax then.

demerphq commented 2 years ago

Before you go strictifying the namespace, maybe consider if it would make sense to strictify part of it. I suspect folks will be using it on CPAN and in modules. I know I have in the past.

Leont commented 2 years ago

how much of a problem would it really be to make ${^FOO} subject to stricture? (why wasn't it?)

What would that look like? Needing an our ${^FOO} before using it?

leonerd commented 2 years ago

Surely, it wouldn't look any different. It would just pick up typos of unrecognised things.

Hypothetically:

perl -ce 'use strict; my $x = ${^STDERR}'
-e syntax OK

perl -ce 'use strict; my $x = ${^STDEER}'
Global symbol "${^STDEER}" requires explicit package name ...
-e had compilation errors.

Currently, I observe:

$ perl -ce 'use strict; my $x = ${^STDERRaewrqe}'
-e syntax OK

Leont commented 2 years ago

Surely, it wouldn't look any different. It would just pick up typos of unrecognised things.

But how would one declare such a variable?

Grinnz commented 2 years ago

I don't think we are talking about (lexical) declaration, as the existence of these variables is defined by the Perl interpreter. Just making a compilation error if a variable of this format is used that does not have a defined purpose in the interpreter.

haarg commented 2 years ago

I've updated the RFC to add additional comments about $^ENGLISH_NAME and $*ENGLISH_NAME forms.

I've also added a link to @Leont's prototype implementation English::Name.

rjbs commented 2 years ago

It sounds like part of what we'd like is to say that ${^FOO} is a stricture violation if ${^FOO} is not a known global interpreter variable. Some code may be expecting it can use these names freely. Do we have any idea the actual scope of this use?

demerphq commented 2 years ago

On Fri, 16 Sept 2022, 16:37 Ricardo Signes, @.***> wrote:

It sounds like part of what we'd like is to say that ${^FOO} is a stricture violation if ${^FOO} is not a known global interpreter variable. Some code may be expecting it can use these names freely. Do we have any idea the actual scope of this use?

I have definitely made use of them. I suspect cpan modules do as well. Also there is no registry of valid names in the core, as a core dev i have liberally added such vars, and could not even tell you which ones the code uses.

I'm basically against this unless it is "namespaced" to some prefix of the possible names. Eg, if the "english" names all started with VAR or PL or whatever.

Yves

Grinnz commented 2 years ago

Strictures on this form of variable would need to be an opt in feature, but I believe it is one worth adding. But note that perlvar is very specific on this topic:

These variables are reserved for future special uses by Perl, except for the ones that begin with ^_ (caret-underscore). No name that begins with ^_ will acquire a special meaning in any future version of Perl; such names may therefore be used safely in programs.

demerphq commented 2 years ago

On Fri, 16 Sept 2022, 17:24 Dan Book, @.***> wrote:

It would need to be an opt in feature, but I believe it is one worth adding. But note that perlvar is very specific on this topic:

These variables are reserved for future special uses by Perl, except for the ones that begin with ^_ (caret-underscore). No name that begins with ^_ will acquire a special meaning in any future version of Perl; such names may therefore be used safely in programs.

I don't have the tools to verify handy and won't for a week. When was that added? When I added such vars to perl I do not recollect it being in the docs. It's obviously wrong. We have added multiple such vars over the years.

Yves

Grinnz commented 2 years ago

The text is present since that form of variable was supported in Perl 5.6. I don't know what you consider "obviously wrong" - that you were able to use reserved variables doesn't change that they're reserved.

demerphq commented 2 years ago

On Fri, 16 Sept 2022, 22:57 Dan Book, @.***> wrote:

The text is present since that form of variable was supported in Perl 5.6.

What commit was it added in?

If nobody noticed this rule as we added var after var to perl from 5.10 on, I'd say the boat has sailed and we should remove it regardless of what it says.

Yves

Grinnz commented 2 years ago

I don't understand what you mean. It's reserved for Perl's use; Perl is allowed to add new ones.

hvds commented 2 years ago

I don't understand what you mean. It's reserved for Perl's use; Perl is allowed to add new ones.

Not sure, but I suspect that @demerphq is reading this part as if the underscore was not present: "No name that begins with ^_ will acquire a special meaning in any future version of Perl; such names may therefore be used safely in programs."

haarg commented 2 years ago

To check on the viability of adding some kind of strictness to variables like this, I've done a review of any current users of this syntax on CPAN. There are a number of uses in perl's tests that should probably be changed.

As far as CPAN goes, this is a pretty short list and mostly easily fixed. This leads me to believe that we could deprecate any use of these variables that aren't part of core, and then later apply strict to their use.

But given that we already have a number of variables with this form, my current opinion is that this could move ahead without adding any more strictness. Potentially applying strict could be considered separately.

aaronpriven commented 2 years ago

I totally forgot I put English::Control on cpan. I never heard about anybody using it. I'm happy to donate it to the cause, or take it down, or whatever.

demerphq commented 2 years ago

On Fri, 16 Sept 2022, 23:57 Hugo van der Sanden, @.***> wrote:

I don't understand what you mean. It's reserved for Perl's use; Perl is allowed to add new ones.

Not sure, but I suspect that @demerphq https://github.com/demerphq is reading this part as if the underscore was not present: "No name that begins with ^_ will acquire a special meaning in any future version of Perl; such names may therefore be used safely in programs."

Oh. Yep. My bad.

Ok, we can enforce that for sure. That puts my mind at rest on this.

Yves

rjbs commented 2 years ago

This sounds like the general consensus is that we can make "use strict 'vars'" affect this. So if you say ${^FOO} and that's not one of Perl's global variables, it's fatal. We'll carve out an exception for FOO where FOO starts with an underscore.

Is that right?

haarg commented 2 years ago

That seems reasonable to me. Would we want a deprecation period?

tonycoz commented 2 years ago

Would a previously defined ${^FOO} that has been removed fail a strict check?

I removed ${^WIN32_SLOPPY_STAT} when I re-worked stat() on Win32.

rjbs commented 2 years ago

Normally if we removed a variable like that we'd make it do nothing for X versions, except warn deprecation, and then we'd make it fatal. I think we should do the same here. We can either add the deprecation now, or wait and see, but adding a deprecation now seems like it avoids any question of breaking things apart from warning counters.

demerphq commented 2 years ago

On Fri, 9 Sept 2022 at 17:49, Paul Evans @.***> wrote:

how much of a problem would it really be to make ${^FOO} subject to stricture? (why wasn't it?)

Or we could add a new strictness flag - use strict 'vars-even-those-ones'. Because adding new strictness flags has always been easy ;)

should we use $*FOO instead, per offhand comment by Paul? :)

I don't really see how $*FOO would interpolate or fit into our interpolation rules. ${*FOO} is already valid syntax and scalar dereferences the *FOO glob, and ${ NAME } is supposed to be an alternate way to spell $NAME. I actually find it a little weird that $*FOO throws an error, as ${*FOO} rightly doesn't; I'd expect them both to dereference the *FOO glob. Perhaps someone was a little overzealous when they made $* itself illegal. I checked, and in 5.28 $*FOO threw an exception: 'Bareword found where operator expected', so not useful, but not illegal either.

$ perl -le'our $FOO= "foo"; my $baz="BAZ"; print "${*FOO} ${baz}"'
foo BAZ

$ perl -le'our $FOO= "foo"; my $baz="BAZ"; print "${*FOO} ${baz}"; print ${*FOO}'
foo BAZ
foo

$ perl -le'our $FOO= "foo"; my $baz="BAZ"; print "${*FOO} ${baz}"; print ${*FOO}; print $*FOO;'
$* is no longer supported as of Perl 5.30 at -e line 1.

Another reason NOT to do this via $*FOO is that it means anything using these new vars is pretty much hard bound to new perls.

If this is done via ${^FOO} on the other hand you can write a back compat module that makes these "english names" available in older perls, while being a noop in perls where they are presetup.

I really think introducing a new way to name vars is a bad idea and should be resisted, and using $* as the basis is an even worse idea, it will just pile up the special cases and rules we have.

cheers, yves -- perl -Mre=debug -e "/just|another|perl|hacker/"

leonerd commented 1 year ago

It feels like there's two separate things being looked at here:

  1. Adding English name aliases for punctuation variables
  2. Making unrecognised ${^YOUR_NAME_HERE} variables fail a strictness check

These two things are separate.. it'd be a good idea to do them both, but neither really needs to depend on the other. It seems like we want to move ahead with English Names Aliases (i.e. this PR) anyway, and in a separate change consider how to add warnings and then strictness failures to the remaining unrecognised names.

Does this sound like a good way forward?

haarg commented 1 year ago

I agree with that path forward.

demerphq commented 1 year ago

me too.

On Fri, 14 Oct 2022 at 17:36, Graham Knop @.***> wrote:

I agree with that path forward.

rjbs commented 1 year ago

I have created https://github.com/Perl/perl5/issues/20422 to track the stricture question. I will merge this so we can move to implementing.

demerphq commented 1 year ago

Hi. I am looking into this, but I have a question. Do we expect ${^ARG} and $_ (perhaps not the best examples) to be aliases of each other, or independent variables with the same magic? Would they share a single glob, or have independent globs that map to the same magic behavior or initial settings? For instance, would we expect that \${^SYSTEM_FD_MAX} == \$^F, or would we just expect that they have the same values and the same magic? A side question to this: what would we expect at the stash level? As far as I can tell there are three possible reasonable expectations. If someone does defined(${^SYSTEM_FD_MAX}), would we consider it a bug if it created the $^F glob entry in the main:: stash? Would we expect it to create only the ${^SYSTEM_FD_MAX} entry, or only the $^F entry?

The English module (use English) does a one-time glob copy. We could do something similar at startup in universal.c or something like that, but it would force all the vars into existence, where currently they are all created on demand. We could make the logic that handles special behavior do the same thing for both names. But the way the code is structured, it is the caller of the relevant function (S_gv_magicalize in gv.c) who does the stash operation on the gv. So if we want this to do some kind of sneaky glob aliasing behind the scenes we will likely have to change the callers as well. So the answer to the above questions really will impact how much code has to change to implement this feature.