Perl / PPCs

This repository is for Requests For Comments - proposals to change the Perl language.

Signature named params #54

Closed leonerd closed 1 week ago

leonerd commented 1 month ago

Another PPC doc to add to the growing pile ;)

As is traditional for me I didn't fill in the "Rationale" section as I still have no idea how that differs from a "Motivation". The two words seem rather synonymous to me.

A reminder on the PPC process: Accepting this PR means we like the document, but it does not automatically imply that we accept the idea embodied by it. Accepting the document first is useful as it gives it a number and allows us to discuss and iterate on one canonical version of it, ahead of accepting the underlying idea. It is perfectly fine to accept the document even if questions and issues still surround it.

JRaspass commented 1 month ago

You say named parameters must appear after positional parameters but are we still able to put a slurpy unnamed hash at the end to make this a drop-in safe replacement for existing code in the same way that adding a slurpy unnamed array makes existing subroutine support a drop-in replacement:

sub foo { my ( $bar, $baz ) = @_; ... }
# ↓
sub foo( $bar, $baz, @ ) { ... }
sub foo { my %args = @_; ... }
# ↓
sub foo( :$bar, :$baz, % ) { ... }

leonerd commented 1 month ago

You say named parameters must appear after positional parameters but are we still able to put a slurpy unnamed hash at the end to make this a drop-in safe replacement for existing code in the same way that adding a slurpy unnamed array makes existing subroutine support a drop-in replacement:

It says exactly that:

If a subroutine uses named parameters then it may optionally use a slurpy hash argument as its final element. In this case, the hash will receive all other name-value pairs passed by the caller, apart from those consumed by named parameters.
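
To make the quoted rule concrete, here is a minimal sketch of that behaviour written by hand in today's Perl. It is an emulation only, not the proposed implementation: the `%rest` name, the extra `alpha` argument and the returned hash reference are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hand-written stand-in for the proposed
#   sub make_colour ( :$red = 0, :$green = 0, :$blue = 0, %rest ) { ... }
sub make_colour {
    my %rest = @_;    # all name/value pairs passed by the caller

    # Each named parameter consumes its own key; the default applies only
    # when the caller did not pass that name at all.
    my $red   = exists $rest{red}   ? delete $rest{red}   : 0;
    my $green = exists $rest{green} ? delete $rest{green} : 0;
    my $blue  = exists $rest{blue}  ? delete $rest{blue}  : 0;

    # Whatever is left in %rest is what the slurpy hash would receive.
    return { red => $red, green => $green, blue => $blue, rest => \%rest };
}

my $got = make_colour( red => 1.0, blue => 0.5, alpha => 0.8 );
printf "red=%s green=%s blue=%s leftover keys=%s\n",
    @{$got}{qw(red green blue)}, join( ',', sort keys %{ $got->{rest} } );
```

Here `alpha` is not a declared named parameter, so it is the only key that remains in the slurpy hash.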

oodler577 commented 1 month ago

This is my final comment and a simple summary of the insufficiency of this, which I suspect could be easy to solve.

Given,

sub make_colour ( :$red = 0, :$green = 0, :$blue = 0 ) { ... }

make_colour( red => 1.0, blue => 0.5 );
# The body of the function will be invoked with
#   $red   = 1.0
#   $green = 0
#   $blue  = 0.5

The calling pattern that this illustrates is one that is already common for named parameters. However, what this gets turned into inside of the sub is not at all the commonly expected idiom. Given the sub interface it creates, it is reasonable to expect that there is some way to automatically achieve the equivalent of the following:

sub make_colour {
  my %colors = @_;
  ...
}

I don't know how to suggest you get both and I am not going to; however it would introduce a very confusing inconsistency between the invocation of the function and the caller's interface. This may be mutually exclusive of the desire to have a list of SCALAR variables already present.

I believe this MUST be rectified before seriously moving forward. Hopefully it's easy.

To be clear using the example currently in the PPC, this means:

sub make_colour ( :$red = 0, :$green = 0, :$blue = 0 ) { ... }

make_colour( red => 1.0, blue => 0.5 );
# The body of the function will be invoked with
#  $params{red}     = 1.0
#  $params{green} = 0
#  $params{blue}   = 0.5

Or, like I said in another comment, this would actually be the ideal (for me, anyway):

sub make_colour ( :$red = 0, :$green = 0, :$blue = 0 ) {}

make_colour( red => 1.0, blue => 0.5 );
# The body of the function will be invoked with 
# $_->red          a "reader" or getter that returns 1.0, with direct access to $_->{red}
# $_->green      a "reader" or getter that returns 0, with direct access to $_->{green}
# $_->blue        a "reader" or getter that returns 0.5, with direct access to $_->{blue}

Not sure what to say about the package string or $self if it's called that way; but that ambiguity seems present in the current spec also.

If you really want to have $red, $blue, and $green available then do it. But having no way to easily capture the kv pairs into a hash offers no improvement over the situation where one may do my %params = @_, whereas providing that HASH ref would benefit that in at least a couple of ways according to my thinking ($_ is a HASH ref, and if getters are provided it makes things nice and clean for non-class Perl).

I have no opinion on the syntax DSL defining the signature, there's only so much you can do there. Hopefully something I have said or suggested above will result in something positive.

zmughal commented 1 month ago

Raku equivalent for comparison:

rabbiveesh commented 1 month ago

Given the sub interface it creates, it is reasonable to expect that there is some way to automatically achieve the equivalent of the following:

sub make_colour {
  my %colors = @_;
  ...
}

I don't know how to suggest you get both

If you'd like that interface you could have it now, using sub make_color (%colors) {}. Are you sure you know perl?

dstroma commented 1 month ago

How would collisions with Corinna class field names be handled?

class Point {
  field $x;
  field $y;
  method add (:$x, :$y) {
    # Whose $x is $x?

Or will it cause an error?

Grinnz commented 1 month ago

How would collisions with Corinna class field names be handled?

class Point {
  field $x;
  field $y;
  method add (:$x, :$y) {
    # Whose $x is $x?

Or will it cause an error?

I would suggest the same thing would happen as if you do field $x; { my $x; } since this is effectively a lexical variable declaration.
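
As a rough illustration of that shadowing rule, the sketch below uses plain lexicals instead of the `class` feature. The outer `$x` plays the role of `field $x` and the inner `my $x` plays the role of the parameter `:$x`; the sub names are hypothetical.

```perl
use strict;
use warnings;

my $x = 'outer';    # stands in for `field $x`

sub inner_value {
    my $x = 'inner';    # stands in for the parameter; shadows the outer $x
    return $x;
}

sub outer_value {
    return $x;          # no inner declaration here, so this sees the outer $x
}

print inner_value(), " ", outer_value(), "\n";
```

The inner declaration hides the outer variable only within its own scope, and the outer variable is left untouched.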

oodler577 commented 1 month ago

After thinking more, I didn't handle any of this the right way. I am sorry for any problems this caused; I should have done my thinking and processing privately. Now having done so, I am going to try a different tack.

Please add to this a way to access a HASH reference that contains the respective named parameters as keys that point to the passed values. I assume this would leverage references so that $all_params->{red} was a reference to $red. Keep the automagical scalars, e.g., $red, etc.

This simple addition would make this feature extremely useful for Perl programmers who use named parameters often and already do the my %params = @_ unpacking, but in this case it is a value add because as a reference, $params (or whatever you name it, idc), would already be unpacked and ready to go.

The automagical named scalars are useful for beginners and for short subroutines by everyone (I admit, this could be useful in some cases); but for more complicated subroutines and more advanced programmers, access to the params as a HASH ref (as many tend to do) will be missing and they will have to fall back to my %params = @_ - the way they would do it anyhow, leaving them with zero reason to use the signatures in the first place.

I can't say for certain, but I imagine the overhead here is negligible; the benefits are extended to a much wider and experienced audience. Please consider this as part of your proposal here for named parameters.

Thank you.

mauke commented 1 month ago

The automagical named scalars are useful for beginners and for short subroutines by everyone (I admit, this could be useful in some cases);

It's not just beginners and short subs. The lack of named parameters in core is the main reason I still use Function::Parameters, even in CPAN code. (NB: Function::Parameters has had named parameters using the exact syntax proposed above since 2012. This is hardly a new or controversial feature.)

but for more complicated subroutines and more advanced programmers, access to the params as a HASH ref (as many tend to do) will be missing and they will have to fall back to my %params = @_ - the way they would do it anyhow, leaving them with zero reason to use the signatures in the first place.

Some of the things I don't understand:


¹ I still hate that name. They're just parameter lists.

happy-barney commented 1 month ago

The automagical named scalars are useful for beginners and for short subroutines by everyone (I admit, this could be useful in some cases);

It's not just beginners and short subs.

Generally, named arguments improve readability and, when well named, show the intention of the author. They lower the learning cost for newcomers to a project and can be the difference between maintainable software and a legacy piece of mud.

oodler577 commented 1 month ago

Why do you think a reference to a hash is better than a hash?

I don't, a hash is fine.

That said, sub foo(%params) would give you arbitrary key/value pairs under the current proposal. What's wrong with just using that?

The information communicated with the explicitly defined params in the signature gives a very helpful level of self documentation about the subroutine using it. The value of the named parameters signature to me is clear, this is building on top of that.

If my %params = @_ does what you want, then there is no reason for you to use "signatures".¹ That's fine. (Ditto for my $params = {@_} and hash references.) How is this a problem?

Because it requires one to "step out" of the signature feature, inducing some level of cognitive dissonance that can form an obex in the developer's thinking while creating and maintaining the subroutine.

Generally, named arguments improve readability and, when well named, show the intention of the author. They lower the learning cost for newcomers to a project and can be the difference between maintainable software and a legacy piece of mud.

I agree, I see their benefit. Providing key/pair access to the passed parameters would make this PPC even more helpful by providing advanced idiomatic access to the arguments passed into the subroutine. It also takes care of providing introspection inside of the subroutine about what parameters are available via the signature. As far as I can tell, this information would have to be held externally if one wanted to leverage that list.

Thank you.

happy-barney commented 1 month ago

I agree, I see their benefit. Providing key/pair access to the passed parameters would make this PPC even more helpful by providing advanced idiomatic access to the arguments passed into the subroutine. It also takes care of providing introspection inside of the subroutine about what parameters are available via the signature. As far as I can tell, this information would have to be held externally if one wanted to leverage that list.

well, %_ can be abused to do that (though there I'd prefer keys with sigil)

esabol commented 1 month ago

@mauke wrote:

The lack of named parameters in core is the main reason I still use Function::Parameters, even in CPAN code. (NB: Function::Parameters has had named parameters using the exact syntax proposed above since 2012. This is hardly a new or controversial feature.)

Thank you. This provided some very useful context for this discussion, I feel.

adamcrussell commented 1 month ago

As is traditional for me I didn't fill in the "Rationale" section as I still have no idea how that differs from a "Motivation". The two words seem rather synonymous to me.

Rationale would seem to be the more abstract purpose of the proposal and Motivation an example of an actual problem that would be solved or a situation that would be made easier. There are plenty of good ideas in the world that have nothing useful to motivate their implementation.

To that end, do you have a practical minimal example of the problem or situation that this would address? Perhaps, say, you encountered something in a recent application or project you are developing outside of your work on the interpreter?

zmughal commented 1 month ago

I wrote up some code that has several approaches to using named parameters in both the explicit scalar-per-parameter approach and slurpy-hash approach here https://github.com/zmughal-experiment/perl-parsing-panoply/pull/1. My goals were to:

I can see using both (xor) approaches on a case-by-case basis, but wanting to still have the same external API. Perhaps this can be done via the :collect idea, but instead the attribute can also specify a variable to collect into (e.g.,

sub f( :$red :collect(%color), :$blue :collect(%color), ... ) { ... }

or an attribute on the entire function

sub f :collect(@pos %named) ($one, $two, :$r, :$g, :$b) {
  $named{r}
}

)

The particular example of using an RGB color triple might not be the best way to compare as that points towards using an object as these are closely related values.


Another thing that I would like to think about when it comes to working with parameters is how easy is it to make sure that wrapper functions and functional programming are easy to write (e.g., an around modifier that modifies or splices out a single positional or named parameter and then sends the argument list on to the coderef that it is wrapping). I previously mentioned that here in relation to how checks in the signature might work.

oodler577 commented 1 month ago

I agree, I see their benefit. Providing key/pair access to the passed parameters would make this PPC even more helpful by providing advanced idiomatic access to the arguments passed into the subroutine. It also takes care of providing introspection inside of the subroutine about what parameters are available via the signature. As far as I can tell, this information would have to be held externally if one wanted to leverage that list.

well, %_ can be abused to do that (though there I'd prefer keys with sigil)

Thank you for the chance to clarify. I hope some kind of hash access will be considered.

leonerd commented 1 month ago

@iabyn

Looking at this proposal compared to my earlier 2019 signatures proposal (http://nntp.perl.org/group/perl.perl5.porters/256679):

My proposal allowed optional positional parameters as long as all named parameters were also optional:

    sub f($x, $y = 0, :$r = 0, :$g = 0, :$b = 0);
    f(1);

Yes - your idea, plus a few other CPAN modules (Function::Parameters, Kavorka, etc.) all used the same idea, which is why I went with that.

I don't think I've explicitly put words in to this effect, but yes - if all the named ones are optional then there can be optional positional ones too. I can add some words for that.

My proposal insisted that all mandatory named params come before optional named params.

Mmm, that restriction might become necessary depending on the implementation, but the way I originally implemented it in XS::Parse::Sublike didn't require it, so I didn't write about that. Perhaps we can leave it flexible for now awaiting an actual implementation.

More generally, the processing order of default value expressions and fetching of arg values (e.g. FETCH calls on tied values) needs to be carefully considered. So for example in:

    sub f(:$a = def_a(), :$b = def_b(), :$c = def_c())
    f(c => $tied_c, a => $tied_a)

in which order are $tied_a->FETCH(), def_b(), $tied_c->FETCH() called? We either need to document the order, or explicitly document that the order is undefined and subject to change, because otherwise any later internal optimisations will break someone's code.

Ah yes, a good idea. My original plan was to leave details like that relatively unspecified in the PPC doc and defer such considerations to the actual implementation and its documentation. Perhaps I should at least write words in the PPC to explain that bit.

My original proposal suggested that there should be a conceptual sorting and de-duplicating of the name value pairs in the args list into the same order as the declared params, then arg to param binding takes place in L-R order. How this is implemented internally is up to us, but as long as the visible effects of ordering is preserved, it doesn't matter.

Yes, more ordering things to consider.

Only allowing a slurpy hash: why not allow slurpy array?

That fell out of the way I implemented it in XS::Parse::Sublike. What XPS does is creates a lexical hash with an unparseable name to unpack the caller args into and process them into the individual vars. If the signature syntax wanted a slurpy hash at the end, well it became a simple matter to just use that name as the internal hash name anyway, then it's visible to the subroutine body for free. I couldn't think of a way to implement that with an array, without just assigning back out of the hash into an array and possibly breaking the originally-passed values.

If we allow a slurpy array, it leads to questions about whether it just has list-like nature of caller-passed values that don't follow the unique key/value structure of a hash. For example:

sub f($alpha, $beta, :$x, :$y, @rest) {
  say for @rest;
}

f(12, 34, x => 5, y => 6, z => 7, z => 8, z => 9);

In this case, would @rest see a 6-value list ('z', 7, 'z', 8, 'z', 9)? How does that interact with other orders of values being passed by the caller?

Duplicate named args shouldn't warn by default, to allow caller to do foo(%defaults, %args) for example. I also suggested that for duplicate args, only the second would have its value accessed (i.e. get magic called, e.g. FETCH).

Yup, sounds sensible.

leonerd commented 1 month ago

Following up on some comments here, I've added some more wording. I've specifically mentioned a few other CPAN modules that provide the same syntax idea, and expanded on the suggestion of an attribute to provide alternative names for parameters.

iabyn commented 1 month ago

On Fri, Aug 23, 2024 at 03:11:54AM -0700, Paul Evans wrote:

Mmm, that restriction might become necessary depending on the implementation, but the way I originally implemented it in XS::Parse::Sublike didn't require it, so I didn't write about that. Perhaps we can leave it flexible for now awaiting an actual implementation.

In general I would prefer we restrict things slightly now - at least for things which don't inconvenience the coder - in order to allow more flexibility for future optimisations. This comes from long painful experience of doing optimisation work over the years. For example, multiconcat slightly changes the order in which FETCH() methods etc are called on concat arguments (IIRC), and this broke some CPAN modules.

Only allowing a slurpy hash: why not allow slurpy array?

That fell out of the way I implemented it in XS::Parse::Sublike. What XPS does is creates a lexical hash with an unparseable name to unpack the caller args into and process them into the individual vars. If the signature syntax wanted a slurpy hash at the end, well it became a simple matter to just use that name as the internal hash name anyway, then it's visible to the subroutine body for free. I couldn't think of a way to implement that with an array, without just assigning back out of the hash into an array and possibly breaking the originally-passed values.

If we allow a slurpy array, it leads to questions about whether it just has list-like nature of caller-passed values that don't follow the unique key/value structure of a hash. For example:

sub f($alpha, $beta, :$x, :$y, @rest) {
  say for @rest;
}

f(12, 34, x => 5, y => 6, z => 7, z => 8, z => 9);

In this case, would @rest see a 6-value list ('z', 7, 'z', 8, 'z', 9)? How does that interact with other orders of values being passed by the caller?

I think I addressed that in my original proposals.

NB: my vague intention for implementing named subs was an algorithm along the lines of:

At compile time, construct some sort of lightweight perfect hash (not a perl hash, at least not eventually) that maps every legal name key to an index number. This hash would be attached to the OP_ARGCHECK.

At runtime, pp_argcheck() would extend the stack by 2 x (number of named parameters), and fill with NULLs. Then the args (starting at the position after all fixed parameters and going L-R) would be grabbed, with each (name, value) pair inserted at the index looked up based on the name. This is effectively a pigeon-hole stable sort with de-duplication.

Then subsequent arg processing ops would run along the extended stack L-R, assigning the values to the pad vars in order, and for any slots still NULL, call the associated default value expression. The mechanism can treat them more-or-less like positional parameters now.

This is also based on the assumption that the original args will eventually be on the stack rather than in @_.
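
The algorithm above can be modelled loosely in pure Perl, which may help when reasoning about the visible ordering effects. This is only a sketch under simplifying assumptions: it uses an ordinary Perl hash for the name-to-index lookup (the real thing is explicitly meant not to), it uses one slot per parameter instead of extending the stack, and `bind_named` with its calling convention is a hypothetical helper invented for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: bind trailing name/value args to declared named params.
#   $names   - arrayref of parameter names, in declaration order
#   $default - hashref of name => coderef producing that parameter's default
sub bind_named {
    my ( $names, $default, @args ) = @_;

    # "Compile time": map every legal name to a slot index.
    my %slot_of;
    @slot_of{@$names} = 0 .. $#$names;

    # "Run time": one empty slot per parameter, then drop each pair into
    # the slot chosen by its name. A later duplicate replaces an earlier one.
    my @slot = (undef) x @$names;
    while (@args) {
        my ( $name, $value ) = splice @args, 0, 2;
        die "Unrecognised named argument '$name'\n"
            unless exists $slot_of{$name};
        $slot[ $slot_of{$name} ] = \$value;
    }

    # Walk the slots left to right in declaration order; only a slot that
    # is still empty causes its default expression to run.
    my @bound;
    for my $i ( 0 .. $#$names ) {
        my $name = $names->[$i];
        push @bound, $name,
            $slot[$i] ? ${ $slot[$i] } : $default->{$name}->();
    }
    return @bound;
}

my @log;
my %bound = bind_named(
    [qw(a b c)],
    {
        a => sub { push @log, 'def_a'; 1 },
        b => sub { push @log, 'def_b'; 2 },
        c => sub { push @log, 'def_c'; 3 },
    },
    c => 30, a => 10, a => 11,    # caller order differs from declaration order
);
print join( ', ', map { "$_=$bound{$_}" } sort keys %bound ), "\n";
print "defaults run: @log\n";
```

In this model the caller's argument order has no effect on the order in which defaults run, and only the default for `b` is evaluated because `a` and `c` were supplied.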

That's very vague, and there's a lot more to it than that, but one thing I've always been very conscious of is that perl subroutine calls are very slow - they have a lot of overhead. We should be designing any signature features in such a way that they're no slower than currently, and in a way which provides the long-term potential for big speedups. I would consider populating a perl hash for each subroutine call a heavy overhead.

ilmari commented 1 month ago

@iabyn

Looking at this proposal compared to my earlier 2019 signatures proposal (http://nntp.perl.org/group/perl.perl5.porters/256679): My proposal allowed optional positional parameters as long as all named parameters were also optional:

    sub f($x, $y = 0, :$r = 0, :$g = 0, :$b = 0);
    f(1);

How would this work if there are two optional positional parameters, and you want to pass a value to the first one that happens to match the name of a named parameter?

sub f($x = '', $y = 0, :$z = 0);
f(z => 42);

or more subtly:

my ($foo, $bar) = qw(z 42);
f($foo, $bar);

leonerd commented 1 month ago

@ilmari I would imagine if you wanted to pass any named parameters, then you'd have to fill in any of the optional ones first. Which is mostly why I didn't really write about it or consider it in my thoughts. Most of the point of having named parameters is to allow you to have optional params without specifying a "priority order" to which ones you have to fill in before you can pass the others. We might specify that "hey, you can do this, but it probably won't do anything you find useful, so maybe don't do that?"

dstroma commented 1 month ago

How would collisions with Corinna class field names be handled?

class Point {
  field $x;
  field $y;
  method add (:$x, :$y) {
    # Whose $x is $x?

Or will it cause an error?

I would suggest the same thing would happen as if you do field $x; { my $x; } since this is effectively a lexical variable declaration.

Indeed, you can already shadow a field variable with a positional parameter, so doing it with a named parameter isn't really anything new. Might be worth it to consider a warning if this happens but that is beyond the scope of this PPC.

I like this PPC as is. I'd rather not overcomplicate it by allowing such things as multiple names for the same parameter. Those who want that functionality can just use the traditional way of doing that, or a CPAN module. Or it can be a future PPC.

leonerd commented 1 month ago

@dstroma

I like this PPC as is. I'd rather not overcomplicate it by allowing such things as multiple names for the same parameter. Those who want that functionality can just use the traditional way of doing that, or a CPAN module. Or it can be a future PPC.

Indeed - this is the point of the "Future Scope" and "Open Issues" parts of the document. They're for explaining "here's some things we thought about but we're not solving at the moment".

iabyn commented 1 month ago

On Fri, Aug 23, 2024 at 05:44:56AM -0700, Dagfinn Ilmari Mannsåker wrote:

How would this work if there are two optional positional parameters, and you want to pass a value to the first one that happens to equal a named parameter?


sub f($x = '', $y = 0, :$z = 0);
f(z => 42);

Positional parameters consume any args first. Only if there are any args left after all positional params have been satisfied, should named param processing start. So your example above behaves equivalently to:

  my $x = 'z';
  my $y = '42';
  my $z = 0;   # default value
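
The same rule can be tried out today with a hand-written emulation. This is a sketch of the described behaviour only, assuming nothing about the eventual core implementation; it silently ignores any unrecognised names to keep the example short.

```perl
use strict;
use warnings;

# Hand-written stand-in for the proposed  sub f($x = '', $y = 0, :$z = 0)
sub f {
    my @args = @_;

    # Positional parameters consume arguments first, left to right.
    my $x = @args ? shift @args : '';
    my $y = @args ? shift @args : 0;

    # Only what is left over is treated as name/value pairs.
    my %named = @args;
    my $z = exists $named{z} ? delete $named{z} : 0;

    return ( $x, $y, $z );
}

my @r1 = f( z => 42 );           # 'z' and 42 are taken positionally
my @r2 = f( '', 0, z => 42 );    # positionals filled in, so z is named
print join( '|', @r1 ), "\n";
print join( '|', @r2 ), "\n";
```

To reach the named parameter, the caller has to fill in both optional positional parameters first, as in the second call.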


leonerd commented 1 month ago

I've added further wording on some more comments.

One thing I've not yet addressed is the relative order between checking for the presence of mandatory named parameters, vs evaluating any defaulting expressions. With positional parameters it's easy to make a check of scalar @_ between numerical limits, to see whether this implies the minimum number of required positional parameters. This check is done early (as part of the OP_ARGCHECK opcode), meaning we reject invalid calls that are missing arguments even before any defaulting expressions are evaluated.

But with named ones we'd have to scan over the entire incoming arguments list to look for what names are present. In the current implementation I have made in XS::Parse::Sublike, the check for mandatory named parameters is effectively part of evaluating the defaulting expression, because I just store a defaulting expression that throws an exception complaining of a missing argument. In effect that means the check happens as a side-effect of evaluating the default value, but means that each parameter is only checked for presence after any earlier parameters have already had their defaulting expressions checked.

In XS::Parse::Sublike I specifically document that this order is not guaranteed, to give some flexibility here, but it's likely something that a core implementation should think more about, and maybe find a better solution to.
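
A small emulation shows the visible consequence of treating "missing mandatory argument" as a throwing default. It is a sketch in plain Perl, not the XS::Parse::Sublike code; the `draw` sub, its parameters and the error wording are all hypothetical.

```perl
use strict;
use warnings;

my @trace;
sub default_colour { push @trace, 'default_colour'; return 'black' }

# Hand-written stand-in for a hypothetical
#   sub draw( :$colour = default_colour(), :$width ) { ... }
sub draw {
    my %args = @_;

    # Optional parameter: its default expression runs if the name is absent.
    my $colour = exists $args{colour} ? delete $args{colour}
                                      : default_colour();

    # Mandatory parameter: modelled as a "default" that throws.
    my $width = exists $args{width} ? delete $args{width}
              : die "Missing required named argument 'width'\n";

    return "$colour/$width";
}

my $ok  = draw( colour => 'red', width => 2 );
my $err = eval { draw(); 1 } ? '' : $@;
print "ok: $ok\n";
print "error: $err";
print "defaults run before the failure: @trace\n";
```

The failing call still evaluates `default_colour()` before it notices that `width` is missing, which is the ordering side effect a core implementation might prefer to avoid by checking all mandatory names up front.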

leonerd commented 2 weeks ago

Based on this comment from the PSC, I've given the document the official 0024 numbering. https://www.nntp.perl.org/group/perl.perl5.porters/2024/09/msg268790.html

I guess this is ready to merge now?