Open leonerd opened 2 years ago
Latest thoughts: While it doesn't help the field-initialiser block, an ADJUST block could still work with something that happens superficially to look like a signature. The same parser hackery that applies to allowing each ADJUST block to just resume parsing of a single containing sub can be employed to adjust the pad names before resuming. It would permit:
ADJUST (%params) { say join "|", keys %params }
ADJUST (%args) { say "Foo is $args{foo}" }
ADJUST { say "params aren't visible here at all" }
which would allow ADJUST blocks to request access to the params, while neither eating up a reserved name nor using the %_ superglobal. Though it still doesn't solve the wider metadata/declarative question.
I don't like the idea of "Special %params lexical". %params is not obviously magic, and someone could easily want to name a field or other variable with that name. $self is only acceptable as a reserved variable because it is so well established in perl that we can't really use anything else for the instance variable. But no such consensus exists for constructor params.
%_ is better, in that it is obviously "special" and maps well to the old @_. But we're trying to move away from @_.
Of the options presented so far, something declared by the user seems the only really acceptable option. As stated, it's not obvious how to combine that with initializer blocks, but I'm not sure that combining them is actually a necessary feature.
Thinking further about it, I'm not sure I like the exact syntax of this latter approach. It's nice to imagine giving an ADJUST block a way to request the name of a lexical hash created in scope that aliases the parameters hash, so it can delete on it, but I don't think the syntax given here is quite the right thing:
ADJUST (%params) { say join "|", keys %params }
Mostly because this really does something very different to the similar-looking syntax of
method f (%params) { ... }
What we're doing in the ADJUST block is really something a bit more like a refalias. Signatures don't support that yet, but if they did I imagine it'd look like method f (\%params) { ... } and so maybe that suggests similar for ADJUST. But that's what we'd always write, so it seems a bit noisy.
Another potential downside to this current notation is that it doesn't leave us any room for adding more things in there. Again by analogy to slurpy parameters, it feels like %params ought to be last, with nothing else after it. I don't know what else we might want to pass in, but currently there isn't space for it. So boo. :(
I almost wonder if the ADJUST block wants an attribute to give it the name of a new lexical to create, instead:
ADJUST :please_provide_me_the_parameters_in_a_hash_called(%params) { ... }
I'll take opinions on a nicer name ;)
And now some more examples on the "declared metadata" front...
Looking around at various actual ADJUSTPARAMS blocks I have in real code right now, it seems most of them operate on a theme of taking some fixed set of named constructor arguments, and using them to set up the values of some fields by using code, rather than simply storing the values as passed in.
For example, a few operate on a theme of taking the name of a file to be opened, opening it in the ADJUST block, then storing the filehandle in a field of the instance. Somewhat paraphrased:
field $fh :param { undef };
ADJUSTPARAMS ( $params ) {
unless( $fh ) {
my $path = delete $params->{path};
# code here to open $path and store it in $fh
}
}
It feels like this is a fairly common sort of pattern - parameters that are passed in, not to be stored directly, but instead to be used as part of a larger block of code that is evaluated at construction time which eventually yields the initial value for a field to be stored.
It's annoying that these parameters aren't declared anyhow, and only work as a side effect of running the actual code. There's no predeclared metadata, so things like name clashes aren't visible like they are for :param:
$ perl -MObject::Pad -E 'role R1 { field $x :param } role R2 { field $x :param } class C1 :does(R1) :does(R2) {}'
Named parameter 'x' clashes with the one provided by role R2 at -e line 1.
While it isn't yet a thing, it is tempting to imagine that perhaps some way to attach documentation snippets could be found for :param-annotated fields. If such a thing existed, then perhaps a sufficiently-advanced IDE would be able to display such information about named parameters being passed into constructors when they were encountered. These additional parameters being eaten by ADJUST blocks wouldn't be visible in that way either. Notice in the above example code, the string "path" only appears once, in the delete $params->{path} expression. It doesn't appear anywhere else - specifically, not in any compiletime-declared metadata about the class.
It's tempting to think that perhaps what is wanted is a new class-level keyword for declaring that named parameters exist, so that they can have such information attached.
param path :apidoc(The path to the filesystem device to open, if 'fh' is not provided);
However, that's just dangling out in space somewhere separate, there's nothing to attach it to the specific value within the ADJUST block. It'd be too easy to become separated from it.
It feels like the core of why this is hard to design comes from the fact that "named params" want to be two things at once:
ADJUST block
There doesn't at the moment appear to be a syntax idea that somehow addresses both concerns.
One possible answer to this: revisit the notion of "lexical fields" (at the very least still outside of methods) and through this, not needing to provide any parameters to ADJUST:
Using the file example above:
field $fh :param { undef };
{
field $path :param { undef };
ADJUST {
defined($fh) != defined($path) or die;
$path and (open $fh, "<", $path or die "$path: $!\n");
# In theory, when this ADJUST ends, $path is no longer visible to anything.
# Thus it could be released from the object.
}
}
I think we might revisit this by not looking at ADJUST as a phaser. The original intent was to look at this similar to Moo/se, but use different names for the methods to ensure they weren't viewed as the same thing as Moose methods. In particular, Moo/se has:
# called before new
around "BUILDARGS" => sub ( $orig, $class, @args ) { ... };
sub new ($class, $args) { ... } # implicit. We don't write this directly
# called after new
sub BUILD ($self, @args) { ... }
For Corinna, I envisioned something vaguely like this (with naming suggested by @thoughtstream, with NEW calling each of these in turn, but respecting inheritance by calling parent versions, too):
# before NEW
method CONSTRUCT :common (@args) {
# $class is available, but not $self
# must return even-sized list of key/value pairs that
# are passed to new()
}
method NEW (%args) { ... } # called implicitly using args returned by CONSTRUCT
# after NEW
method ADJUST (%args) {
# both $class and $self available. Do what you will before
# new object is returned
}
This is somewhat described here in the Wiki, but I see now that it didn't make it into the official RFC. When I saw you were using ADJUST differently, I said nothing because I was tired of how divisive so many arguments about Corinna were. That's my fault. I should have said something. Burn out.
CONSTRUCT is used to take whatever is passed to new and return K/V pairs. After all CONSTRUCT methods are called, the final result goes to NEW (internal method). It's awkward because it looks like a phaser (and those don't take args) and it kind of looks like an instance method, but it's not quite one.
ADJUST is described here and is called after NEW.
However, the Moo/se paradigm is only awkward (in my opinion) because Moo/se is limited by Perl's current syntax. The overall idea seems solid.
Your suggestion there doesn't seem to answer any more clearly how to handle these different params. Perhaps a concrete example is needed.
Let's imagine some UI widget that needs a title and a colour specified in red/green/blue components. The title is just stored directly as text, but the red/green/blue are actually forwarded to the constructor of a "Colour" object, which is then stored. Right now in Object::Pad or feature-class I can write the following and it technically works:
class Widget {
field $title :param;
field $colour;
ADJUST ($params) {
$colour = Colour->new( delete %{$params}{qw( red green blue )} );
}
}
my $widget = Widget->new( title => "The text", red => 100, green => 30, blue => 0 );
This has some nice properties:
delete from it the ones they consume
E.g. consider what happens to someone who does Widget->new( cyan => 123 ).
It has some downsides to it though. Most notably, the words red, green and blue only appear in the behavioural code; in the body of the ADJUST block in that hash-slice delete expression. They don't otherwise appear anywhere that is inspectable as declarative metadata. For example, we can imagine what might eventually happen when IDEs and code editors start to understand this syntax.
class Widget {
field $title :param :is(Text) :apidoc("The title text to display at the top");
....
my $widget = Widget->new( title => ...
If the user hovers the mouse over title here perhaps they'll be reminded the type and the apidoc help string for this parameter.
Where in the class declaration can we attach similar information about the red, green and blue constructor params?
With reference to your suggested methods above (CONSTRUCT, NEW, ADJUST) would you be able to find any better answers on this? Specifically, where red/green/blue are handled, where any metadata about them might be declared (currently we have nowhere), and where the unrecognised cyan param would be complained about.
With reference to your suggested methods above (CONSTRUCT, NEW, ADJUST) would you be able to find any better answers on this? Specifically, where red/green/blue are handled, where any metadata about them might be declared (currently we have nowhere), and where the unrecognised cyan param would be complained about.
I'm still not seeing the problem. Currently, this can be done in Moose with MooseX::StrictConstructor and BUILDARGS. The lack of metadata about the documented arguments is either:
So in the latter case:
class Widget {
field $title :param;
field ($red, $green, $blue ) :param;
field $colour;
ADJUST ($params) {
$colour = Colour->new( delete %{$params}{qw( red green blue )} );
}
}
my $widget = Widget->new( title => "The text", red => 100, green => 30, blue => 0 );
Note that the colors don't have readers or writers, as per your original code, so I think this satisfies that and it's more correct because the user must supply them. (You can always return undef from an initializer block if they're optional).
If my code sample doesn't address your issue, can you help me understand why? If this is solvable under Moose but not Corinna, can you show me a Moose example?
The issue with your code example is that the "fields" $red, $green and $blue are just waste after the construction of the object, but can't be disposed of. This is not a showstopper, but I consider it a minor flaw.
As for the comparison with BUILDARGS: This will get more pronounced as soon as inheritance comes into play. In Moose, BUILDARGS is a method, thus can be chained to call the parent's BUILDARGS whenever it sees fit. Also, BUILDARGS can be used to define arbitrary constructor APIs, not just key/value pairs. I think we might need more procedural control in ADJUST blocks, too.
Here's an example which can be made to work with Moose, but not with Object::Pad:
use Object::Pad;
class Complex {
field $real :param;
field $imaginary :param;
}
class Real :isa(Complex) {
ADJUST {
# Here, I should be able to set $imaginary to 0. But how?
# $imaginary isn't available, and setting imaginary => 0
# has no place to set it
}
}
my $real = Real->new(real => 123);
# Required parameter 'imaginary' is missing for Complex constructor
The issue with your code example is that the "fields" $red, $green and $blue are just waste after the construction of the object, but can't be disposed of. This is not a showstopper, but I consider it a minor flaw.
It could be considered a major flaw. Right now in this particular example they're just numbers being the colour components; so it's a fairly harmless problem. But consider if those were large object references themselves. Those dead fields needlessly holding them are going to inflate the refcount, delaying their destruction. That at least wastes memory and depending on what other behaviours might be attached to destruction time of those objects, might actually be a bug, if some other code was relying on timely destruction of them.
We really mustn't store construction-time parameters as fields unless we intend to store them as fields. Using the fields just as temporary storage during construction time and then not touching them afterwards isn't really a good solution.
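The refcount concern can be seen in a small classical-Perl sketch. The Resource, Hoarder and Forwarder classes below are hypothetical stand-ins invented for illustration; they are not Object::Pad or Corinna code. Hoarder keeps its constructor argument in a slot it never reads again, while Forwarder only uses the argument during construction.

```perl
use strict;
use warnings;

my @log;    # records destruction order

package Resource {
    sub new     { my ($class, $name) = @_; return bless { name => $name }, $class }
    sub DESTROY { my ($self) = @_; push @log, "destroyed $self->{name}" }
}

package Hoarder {      # stores the argument in a "dead" slot
    sub new { my ($class, %p) = @_; return bless { res => $p{res} }, $class }
}

package Forwarder {    # uses the argument at construction time only
    sub new { my ($class, %p) = @_; return bless { label => $p{res}{name} }, $class }
}

my $hoarder;
{
    my $r = Resource->new("a");
    $hoarder = Hoarder->new( res => $r );
}
# $r is out of scope, but the dead slot still holds a reference to it.
my $destroyed_while_hoarded = @log;

my $forwarder;
{
    my $r = Resource->new("b");
    $forwarder = Forwarder->new( res => $r );
}
# "b" was released as soon as its scope ended.

undef $hoarder;    # only now can "a" be destroyed
print join( ", ", @log ), "\n";
```

In this sketch "b" is destroyed before "a", even though "a" was created first, because the unused slot in Hoarder keeps the object alive until the Hoarder itself goes away.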
If my code sample doesn't address your issue, can you help me understand why?
@HaraldJoerg mostly explained that in the reply; I expanded on it above.
If this is solvable under Moose but not Corinna, can you show me a Moose example?
Well I don't know Moose anywhere near enough to suggest how to do it there, but consider this in classical perl:
package Widget;
sub new ($class, %params) {
my $title = delete $params{title};
my $colour = Colour->new( delete %params{qw( red green blue )} );
die "Unrecognised construction parameters - " . join(", ", sort keys %params) if keys %params;
my $self = bless {
title => $title,
colour => $colour,
}, $class;
return $self;
}
We haven't pointlessly stored the plain red, green or blue values, other than by forwarding them to the constructor of the Colour instance. We've still complained to our caller about any other unrecognised construction-time params.
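For anyone who wants to run the classical version, here is a self-contained sketch of it. The Colour class is a trivial stand-in added only so the example executes, and the plain hash-slice delete is used in place of the key/value slice so it does not depend on a recent perl; both are assumptions of this sketch rather than part of the original code.

```perl
use strict;
use warnings;
use feature 'signatures';
no warnings 'experimental::signatures';

package Colour {    # hypothetical stand-in for the real Colour class
    sub new ($class, %rgb) { return bless {%rgb}, $class }
}

package Widget {
    sub new ($class, %params) {
        my $title = delete $params{title};

        # Remove the colour components from %params and forward them on.
        my %rgb;
        @rgb{qw( red green blue )} = delete @params{qw( red green blue )};
        my $colour = Colour->new(%rgb);

        # Anything still left in %params was not consumed by anyone.
        die "Unrecognised construction parameters - "
            . join( ", ", sort keys %params ) . "\n"
            if keys %params;

        return bless { title => $title, colour => $colour }, $class;
    }
}

my $ok  = Widget->new( title => "The text", red => 100, green => 30, blue => 0 );
my $err = eval { Widget->new( title => "x", cyan => 123 ); 1 } ? "" : $@;
print $err;
```

The second call fails with a message naming cyan, which is the behaviour the ADJUST-based version gets from deleting consumed keys out of the params hash.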
@HaraldJoerg
Here's an example which can be made to work with Moose, but not with Object::Pad:
Ah, that example is showing off an equally-valid, yet quite different problem. That's more about how subclasses can alter the definitions of existing fields on their parent classes. It's quite common in Moose for example to alter an existing field; in your example case:
has '+imaginary' => default => sub { 0 };
The + symbol says not to add a new field here, but instead we're altering the definition of an existing one for members of this subclass.
For your usecase, it's as if we'd want to allow something like:
class Complex {
field $re :param :reader :writer;
field $im :param :reader :writer;
...
}
class Real :isa(Complex) {
inherited field $im { 0 }; # give it a default value
}
though ideally you'd want to try to remove the ":param" part of it; making it a construction-time error to explicitly pass it.
However, that being the case it does sort of suggest that at least this example, saying that a Real number is a subclass of a Complex one, is perhaps not the best example. What should be the correct behaviour of
my $z = Real->new( re => 4, im => 3 );
?
@leonerd:
my $z = Real->new( re => 4, im => 3 );
That should be an error. I could live with a sloppy implementation where im is silently ignored, but not with one which would accept the value of 3.
I might be able to come up with a more complex, but more relevant example at some time.
I can not quote any relevant literature on that, but in my belief a class is responsible for the complete API of its constructor. Saying "ah, and in addition you can pass all parameters of the base class" is frequent practice and sort of acceptable if you don't want to duplicate documentation, but not correct. The fact that class Real is based on Complex is an implementation detail, a user of Real should not need to know about this.
So, some instance within class Real should be able to set up the construction parameters for its base class (and also, set up initialization parameters for every role it :does). If we have an explicit mechanism for that, then we can also solve the problem of conflicting :param names in base classes and roles. Unlike Moose, we do not have the problem that everything needs to live in one single hash after construction, roles and parent classes have their own spaces for fields!
The issue with your code example is that the "fields" $red, $green and $blue are just waste after the construction of the object, but can't be disposed of. This is not a showstopper, but I consider it a minor flaw.
It could be considered a major flaw. Right now in this particular example they're just numbers being the colour components; so it's a fairly harmless problem. But consider if those were large object references themselves. Those dead fields needlessly holding them are going to inflate the refcount, delaying their destruction. That at least wastes memory and depending on what other behaviours might be attached to destruction time of those objects, might actually be a bug, if some other code was relying on timely destruction of them.
Now I see your concern, but I don't see it as a "major flaw" in terms of Corinna. It's a major flaw in terms of the general expressiveness of most OO systems. With those systems that define a constructor as the name of the class, you could do this (pseudo-code):
class Widget {
field $title :param;
field ($red, $green, $blue ) :param;
field $colour;
method Widget :constructor ($params) {
$colour = Colour->new( delete %{$params}{qw( red green blue )} );
}
}
For those types of OO systems (e.g., Java), no field would have a value without being explicitly assigned in the Widget constructor. The problem goes away.
With current Corinna syntax, it can still be fixed (keeping in mind that ADJUST is called after the constructor):
class Widget {
field $title :param;
field ($red, $green, $blue ) :param;
field $colour;
ADJUST ($params) {
$colour = Colour->new( delete %{$params}{qw( red green blue )} );
undef $_ foreach $red, $green, $blue;
}
}
In the above, we get to undefine those values, but the metadata is still available. That solves the issue, yes?
I think (I could be wrong) that this is an edge case that's less likely to be a serious issue, but it's easy to solve if it arrives. I acknowledge that you might write code that often hits this issue. I'm unsure of the last time that I have. However, I do admit that I've written plenty of code where arguments are required for construction, but not post-construction.
However, that being the case it does sort of suggest at least this example, saying that a Real number is a subclass of a Complex one, is perhaps not the best example. What should be the correct behaviour of
my $z = Real->new( re => 4, im => 3 );
Well, that's annoying 😃 I'm not sure of the best approach here.
In classical Perl terms, the constructor for Real might look like this:
sub new ( $class, %args ) {
my @keys = keys %args;
if ( 1 == @keys && 're' eq $keys[0] ) {
return bless $class->SUPER::new(%args, im => 0), $class;
}
croak("...");
}
In Moose, I think this would work:
has '+im' => (
default => 0,
init_arg => undef,
);
You'd probably want MooseX::StrictConstructor with that.
So for Corinna, possibly something like this?
class Complex {
field $re :param :reader :writer;
field $im :param :reader :writer;
...
}
class Real :isa(Complex) {
field $im :overrides { 0 }; # give it a default value
}
If we wanted $im to remove the :writer, we don't have a syntax for that. But Liskov tells us that any place we have an instance of a class, we should be able to substitute an instance of the subclass and have it work. So we'd have to inherit all of the parent field behavior except if we have :overrides, it removes the :param for its definition (leaving all other attributes intact, unless otherwise specified) and if we wanted that as a :param, we'd have to add that attribute back in explicitly.
Or we could say that :overrides removes all parent attributes from the field and if we want to respect Liskov, we have to manually add them back: field $im :overrides :reader :writer {0};
That's probably a cleaner solution, even if it means that Liskov will often be unhappy if we copy the attributes incorrectly.
If we wanted $im to remove the :writer, we don't have a syntax for that.
I could argue that the pure existence of the writer makes the class unsuitable for subclassing with a class where the value is supposed to stay fixed. As the author of a subclass I have to check the complete contract of the parent class. If I want to subclass anyway, I have an explicit way to express that. Depending on the actual situation, either of these implementations might be appropriate:
method set_im :override ($whatever) {
die "Reality alert: You promised this would not happen for this Real object.";
}
method set_im :override ($whatever) {
$whatever and warn "Enhanced reality mode active for this Real object."
}
I think it is totally acceptable if cases like this can not be solved in a declarative way.
But also: We digress :) The auto-generation of readers and writers is a really nice convenience feature, but unrelated to object construction.
The Real vs Complex issue looks like it would be one solved by leveraging CONSTRUCT rather than ADJUST. (Fortunately, CONSTRUCT is still relevant to constructor params!) CONSTRUCT looks to me to be the phaser/method that occurs at exactly the right time to give the idea that the top-level class controls the shape of new. To that end I would suggest that only the top-level class's CONSTRUCT is called by default: no other class or role runs CONSTRUCT phases unless the toplevel class explicitly calls them. (This is, incidentally, currently how Object::Pad's BUILDARGS is working.)
Consider, for example, this implementation of Complex and Real:
class Complex {
field $re :param :reader;
field $im :param :reader;
CONSTRUCT { # Not called by Real::CONSTRUCT
if (@_ == 1) { die unless shift() =~ /^(.*)[+-](.*)i/i; return re => $1, im => $2; }
}
}
class Real :isa(Complex) {
# Remove im as a constructor parameter.
CONSTRUCT {
if (@_ == 1) { return re => shift(), im => 0; }
my %args = @_;
exists $args{im} and die;
$args{im} = 0;
return %args;
}
method im :override () { 0; }
}
To make this work would require changing the specification to say that only the top-most class's CONSTRUCT phase is called. (Note that Complex's CONSTRUCT dies. This means Real's attempt to rewrite could only work if Real is the first (and possibly only) class to receive CONSTRUCT.) That phase is responsible for making the named argument list look like whatever it needs to look like for the benefit of itself and also its superclass(es) and role(s). To that end, it should have the option of being able to explicitly call superclass/role CONSTRUCT phases directly, and dispensing with the returned argument list as it sees fit (even as far as saying it's on the caller to deal with an odd-sized list). E.g. %args = (%args, $self->ParentClass::CONSTRUCT(...));
(The idea that a phase called CONSTRUCT exists to reshape the new argument list and ADJUST for second-stage field initializing is starting to seem a little weird to me the more I keep typing this. If we go this route, perhaps we should consider reversing those names?)
Speaking of, I mentioned previously the idea that we could allow "constructor-only fields" (fields that may or may not be :param, but are only used in field initializers and ADJUST blocks) to be released from an object when the constructor finishes. This would alleviate the "held reference" problem @leonerd mentioned previously, and it would have an interesting secondary effect: in the above Real class, by overriding $im's :reader with a basic method that doesn't reference $im at all, $im could theoretically be dropped from Real objects (unless of course something like $r->Complex::im() is a thing some jerk could do), making a Real take up less space than Complex.
I'm planning a much (much much) longer response to @leonerd's original question, with what I hope may be a clean and acceptable solution to the various issues raised in this discussion.
But, for now, just a quick word about Square inheriting Rectangle or Real inheriting Complex.
Unless the classes in question produce immutable objects, these are the classic examples of invalid OO design. They shouldn't work, and we shouldn't help them to work.
Note, however, that the solution to @leonerd's original question that I will be proposing (later today, I hope!) will also cleanly handle this inheritance problem correctly ...at least for immutable objects.
Unsurprisingly, I have observations and comments on this issue. I apologize in advance for their extent. ;-)
ADJUST (and whatever mechanism is provided to pre-tweak constructor args) both really need to be phasers, not methods. If they’re methods, the methods will appear to be magical (being auto-redispatched up/down the hierarchy). Phasers are called automatically, in specific sequences, so there’s nothing magic about ADJUST doing that too.
Furthermore, if they’re methods, they will pollute the class’s API and could also be called explicitly after construction, which is almost certainly a Bad Idea.
The current design allows us to fake an argument-list pre-tweaking behaviour via something like:
method new_other :common (@args) {
return $class->new( tweak_those(@args) );
}
This is not a great solution, however, as nothing marks these methods as being constructor-ish. Except perhaps the naming convention...which is inherently fragile, and not helpful to automated code analysis tools.
Object::Pad currently offers the BUILDARGS() method, which “lurks inside” any call to new() (a good thing!) and is automatically pre-applied to the constructor args (another good thing!), but which also requires explicit redispatch to ensure that any inherited BUILDARGS() also get called. This is inherently fragile and pollutes the API (as above). Like every other aspect of automated construction, it would be cleaner and safer as a phaser instead.
More generally, though, allowing more than one class to pre-tweak the arguments of new() introduces complications and opportunities for enbugging the class hierarchy.
It also increases the coupling between classes in a hierarchy, which means that those enbuggings can be extremely hard to detect or correct, and can affect distant and seemingly unrelated code.
That’s why we originally designed Corinna’s constructors to accept only a properly formatted key/value list, from which every class up the hierarchy then extracts its own initializer values, by name.
Given all those considerations, I believe that any argument pre-tweaking mechanism needs to be integrated directly “behind” new() (for affordance), but must be a phaser (for robustness).
Moreover, only the most derived pre-tweaking mechanism should ever be invoked, and it should be incapable of redispatch (another advantage of making it a phaser).
That mechanism should also be required to return a properly formatted key/value list, which the hierarchical initialization mechanism can then use (or complain about, if it’s still wrong).
The other requirement in this issue is a way to declaratively specify all expected/required/optional named constructor args (hereafter “NCAs”), and automatically track whether they have been “consumed” by the various automatic initialization mechanisms (i.e. by field initializer blocks or by ADJUST blocks).
This is necessary so that the constructor can complain about required NCAs that are missing and/or about unexpected extra NCAs.
With regard to tracking unexpected/unused constructor arguments, I strongly agree that we need to find a way to make this feature fully automatic and declarative. If argument tracking requires the user code to manually delete entries from some parameter hash in an ADJUST and/or a field initializer block...that is inherently fragile and will lead to endless misery amongst everyday developers. We definitely need a better answer than that.
Given all of the above, here’s my list of design requirements for object initialization:
All initialization syntax must be as declarative as possible.
As much behaviour as possible must also be automatic (i.e. does not have to be explicitly coded by the user).
Named constructor arguments (“NCAs”) must be automatically marked as having been “consumed” by the initializer in all cases. It must not be possible to accidentally manually mark or unmark a “consumed” NCA.
It must be also possible for an ADJUST or a field initializer block to access an NCA without it being automatically marked as “consumed” (so that two or more initializer blocks can both access that NCA, with only one of them actually “consuming” it).
Which NCA(s) a given field “consumes” must be inferable by humans, compilers, IDEs, and other tools...purely from static syntax.
Initializer blocks should be a kind of implicit ADJUST phaser. That is, they should both use exactly the same underlying mechanism, merely providing different syntactic affordances to it.
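To make the “consumed” bookkeeping in these requirements concrete, here is a rough classical-Perl sketch of the kind of tracking an implementation might do behind the scenes. NCATracker and its consume, peek and unconsumed methods are hypothetical names invented for this illustration; the point of the requirements is precisely that user code would never call anything like this by hand.

```perl
use strict;
use warnings;
use feature 'signatures';
no warnings 'experimental::signatures';

# Hypothetical internal bookkeeping; not part of Corinna or Object::Pad.
package NCATracker {
    sub new ($class, %args) {
        return bless { args => {%args}, consumed => {} }, $class;
    }

    # Read an NCA and record that an initializer has claimed it.
    sub consume ($self, $name) {
        $self->{consumed}{$name} = 1;
        return $self->{args}{$name};
    }

    # Read an NCA without claiming it, so another block can still consume it.
    sub peek ($self, $name) {
        return $self->{args}{$name};
    }

    # Names passed to new() that no initializer ever claimed.
    sub unconsumed ($self) {
        my @left = grep { !$self->{consumed}{$_} } sort keys %{ $self->{args} };
        return @left;
    }
}

my $ncas = NCATracker->new(
    title => "The text", red => 100, green => 30, blue => 0, cyan => 123,
);

my $title  = $ncas->consume('title');
my @rgb    = map { $ncas->consume($_) } qw( red green blue );
my $peeked = $ncas->peek('cyan');    # looking does not count as consuming

my @extra = $ncas->unconsumed;
print "Unrecognised construction parameters - @extra\n" if @extra;
```

Because peek leaves cyan unclaimed, it is still reported as an unexpected argument at the end of construction.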
My thinking for this design was as follows: If all initialization options need to be declarative and automatic, then they all need to be hooked onto the declarations of automated components. Specifically: hooked onto the declarations of fields and ADJUST phasers.
So far, in Corinna, we’ve done that by the clever trick of using the name of a field variable to determine the name of its associated named constructor argument, as well as the names of any autogenerated accessors for that field.
So why not use the same trick in reverse? Why not associate specific named constructor arguments with identically named variables that have been declared as part of initialization blocks and phasers?
In other words: if field initializer blocks and ADJUST phasers had parameters, like subs and methods do, then the names of those parameter variables could tell the initialization process which NCAs to pass into each initialization block.
Better still, those parameter names would also declare (statically, in the source code) that the corresponding named constructor arguments should be marked as having been “consumed” by that initialization block.
The only problem is that, in current Perl syntax, raw blocks and phaser blocks can’t have parameters. :-(
So we simply change that reality...
The BUILDARGS method becomes a phaser...and maybe gets a better, less-Moosey name. @aquanight referred to something similar as CONSTRUCT, but I think that name’s misleading. This phaser doesn’t construct the object; it simply provides the proper key/value list of named arguments that the underlying constructor was expecting to get from new().
So, in this proposal I’ll name it: NEWARGS.
(Though, to be honest, I’m not strongly attached to, or even particularly happy with, that name.)
Multiple classes in a hierarchy can each have a NEWARGS phaser, but only the most-derived available NEWARGS phaser is ever called. And because they’re phasers, there’s no way to redispatch to an ancestral NEWARGS. (This is a feature, not a bug!)
In practice, this means that any call to any NEWARGS must return a key/value list that is suitable for initializing the fields of its own class, and those of all ancestral classes as well.
This makes each class entirely responsible for adapting its chosen new() API(s) to the standard key/value NCA format automatically accepted by all of its ancestors. This maximizes class independence and minimizes undesirable coupling between classes within a hierarchy.
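The "most-derived NEWARGS only, no redispatch" rule can be emulated in classical Perl to see how it behaves. This is only a sketch under stated assumptions: the %NEWARGS registry and the construct helper are hypothetical stand-ins for the proposed phaser, and the string format accepted by Complex is invented for the example.

```perl
use strict;
use warnings;
use feature 'signatures';
no warnings 'experimental::signatures';
use mro;

my %NEWARGS;    # class name => coderef; hypothetical stand-in for a NEWARGS phaser

sub construct ($class, @args) {
    # Pick the first class in MRO order that registered a tweaker: the
    # most-derived one. Ancestral tweakers are never called.
    my ($owner) = grep { exists $NEWARGS{$_} } @{ mro::get_linear_isa($class) };
    my @kv = $owner ? $NEWARGS{$owner}->(@args) : @args;
    die "$class: constructor arguments must be a key/value list\n" if @kv % 2;
    return bless {@kv}, $class;
}

package Complex { sub new ($class, @args) { return main::construct( $class, @args ) } }
package Real    { our @ISA = ('Complex') }

# Complex accepts either key/value pairs or a single string such as "3+4i".
$NEWARGS{Complex} = sub (@args) {
    return @args unless @args == 1;
    my ( $re, $im ) = $args[0] =~ /^(-?\d+)([+-]\d+)i$/
        or die "cannot parse '$args[0]'\n";
    return ( re => $re, im => $im );
};

# Real accepts a single number or key/value pairs, and refuses 'im'.
$NEWARGS{Real} = sub (@args) {
    my %kv = @args == 1 ? ( re => $args[0] ) : @args;
    die "Real does not accept 'im'\n" if exists $kv{im};
    return ( %kv, im => 0 );
};

my $z   = Complex->new("3+4i");
my $r   = Real->new(5);
my $bad = eval { Real->new( re => 4, im => 3 ); 1 } ? "accepted" : "rejected";
print "$bad\n";
```

Here Real's tweaker alone decides what Real->new accepts and hands a complete key/value list to the shared initialization step, so Complex's string-parsing API never leaks into Real.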
Unlike other Perl phasers (so far!), the NEWARGS phaser must be defined with a signature/parameter list, which is placed between the keyword and the block. This parameter list uses the same syntax and provides the same features as modern Perl subroutine signatures. For example:
For example:
NEWARGS
($x, $y, $z, @etc)
{ tweak_those_args_here() }
The automatically provided new() constructor can now accept any list of arguments, but it immediately calls the most-derived NEWARGS (if any), binding new()’s argument list to the parameters of that NEWARGS.
Or, to look at it another way, the signature of a class's NEWARGS constrains the signature of the class's new() method.
After the NEWARGS
block executes, the key/value list
it produces is then passed on to the initialization process.
If there was no available NEWARGS
phaser to invoke,
the original constructor argument list is passed along
unchanged.
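The lookup rule described above can be sketched in plain Perl. This is only an emulation under stated assumptions: the %NEWARGS_FOR registry and the prepare_args() helper are hypothetical stand-ins for real phasers, and the Complex/Real hierarchy is set up by hand via @ISA.

```perl
use strict;
use warnings;
use mro;

# Hypothetical registry standing in for each class's NEWARGS phaser
my %NEWARGS_FOR;

# Work out the key/value list that the initialization process would receive
sub prepare_args {
    my ($class, @args) = @_;
    # Walk from the most-derived class towards its ancestors...
    for my $candidate (@{ mro::get_linear_isa($class) }) {
        my $newargs = $NEWARGS_FOR{$candidate} or next;
        return $newargs->(@args);    # ...calling only the first NEWARGS found
    }
    return @args;                    # no NEWARGS available: pass args unchanged
}

# Illustrative hierarchy: Real is derived from Complex
@Real::ISA = ('Complex');

$NEWARGS_FOR{Complex} = sub { my ($re, $im) = @_; (re => $re, im => $im) };
$NEWARGS_FOR{Real}    = sub { my ($re) = @_;      (re => $re, im => 0)   };

my %real  = prepare_args('Real', 5);             # uses Real's NEWARGS only
my %plain = prepare_args('Plain', size => 3);    # no NEWARGS: list is unchanged
```

Note that Complex's entry is never consulted when constructing a Real, which mirrors the "no redispatch" rule: the derived class alone is responsible for producing a list its ancestors can accept.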
In order to produce its key/value list, NEWARGS must arrange for that list to be the final evaluated expression in its block. This is because, despite having a parameter list, a NEWARGS phaser is still a block, not a subroutine, and you cannot use a return in it. In fact, a return in a NEWARGS block should be flagged as a fatal compile-time error.
Runtime errors within a NEWARGS (for example: missing arguments, or a failure to extract a suitable key/value list) can be signaled by throwing an exception within the phaser block.
Note that, because they cannot use return statements, NEWARGS phasers are inevitably going to be structured as tedious if...elsif...else cascades, as can be seen in the first few examples below.
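The "final evaluated expression" style can be tried out today with a do block, which likewise yields the value of the last expression evaluated in whichever branch runs. This is only an analogy, not the proposed syntax: the complex_args() function is a hypothetical name, and the argument conventions are borrowed from the Real example later in this post.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a NEWARGS body, written as a do block
sub complex_args {
    my @args = @_;
    my @key_values = do {
        # Each branch ends in the list it wants to produce; no 'return' is used
        if    (@args == 1)     { (re => $args[0], im => 0) }
        elsif (@args % 2 == 0) { @args }
        else                   { die "Odd number of named arguments\n" }
    };
    return @key_values;
}

my %from_number = complex_args(3);
my %from_named  = complex_args(re => 1, im => 2);
```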
This infelicity could be avoided if classes were allowed to define multiple NEWARGS phasers, with only one of them ever being called for any given construction. Specifically, the initialization process would execute only the first NEWARGS phaser whose signature successfully binds to the actual argument list that was passed to the original call to new.
See Speculative examples (if multiple NEWARGS were supported) below for why this would be a markedly superior approach. (And, yes, this would require a kind of limited multiple dispatch, so I don't actually expect we'll get it, despite its manifest superiority. Breathe, @leonerd, breathe! ;-)
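To make "the first NEWARGS whose signature binds" concrete, here is a rough plain-Perl emulation. Everything in it is an assumption for illustration: the @variants table and dispatch_newargs() are hypothetical, and "binding" is approximated by a simple argument-count check rather than real signature matching.

```perl
use strict;
use warnings;

# Each variant pairs a "does this signature bind?" check with a body
my @variants = (
    [ sub { @_ == 1 },     sub { (re => $_[0], im => 0) } ],   # like NEWARGS ($num)
    [ sub { @_ % 2 == 0 }, sub { @_ }                     ],   # like NEWARGS (%args)
);

# Run only the first variant that accepts the actual argument list
sub dispatch_newargs {
    my (@args) = @_;
    for my $variant (@variants) {
        my ($binds, $body) = @$variant;
        return $body->(@args) if $binds->(@args);
    }
    die "No NEWARGS variant accepts the given arguments\n";
}

my %from_number = dispatch_newargs(7);
my %from_named  = dispatch_newargs(re => 1, im => 2);
```

Because the variants are tried in declaration order, their order matters whenever two checks could both succeed.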
ADJUST phasers
The ADJUST phaser now allows for an optional signature/parameter list, specified between the keyword and the block, with the same syntax and features as Perl subroutine signatures. For example:
ADJUST ($size) { $max_size = $size < 0 ? $DEFAULT_SIZE : $size }
When a parameterized ADJUST is invoked, each of its named scalar parameter variables is bound to the corresponding named constructor argument, and that NCA is then marked as having been “consumed”. So, the above example would mark the 'size' NCA as “consumed”.
If there is no suitable NCA for a parameter, it is a fatal error: Missing 'size' argument in call to new()... unless that parameter was also specified with a default value, which is then used instead. For example:
ADJUST ($size = 10) { $max_size = $size < 0 ? $DEFAULT_SIZE : $size }
If the parameter list for the ADJUST ends in a slurpy array or hash, all other named constructor arguments (i.e. that were not bound to any preceding scalar parameter) are copied into that final slurpy parameter. These extra arguments are not marked as having been “consumed”. This allows an ADJUST to access named constructor arguments that “belong” to (i.e. are formally “consumed” by) other initializers.
Having received the appropriate named constructor arguments via its parameter list, the ADJUST block can then use those parameter values to initialize any field variable currently in scope (or, indeed, to perform any other appropriate action or side-effect).
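The binding rules just described (bind by name, mark as consumed, fall back to a default, die if missing, copy the rest into a slurpy without consuming) can be emulated in ordinary Perl. This is a sketch under assumptions, not an implementation of the proposal: bind_params() and its spec format ([name] or [name, default]) are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: bind named constructor arguments (NCAs) to an
# ADJUST-style parameter list. Each spec is [name] or [name, default].
sub bind_params {
    my ($nca, $consumed, $specs, $want_slurpy) = @_;
    my (%bound, %named);
    for my $spec (@$specs) {
        my ($name, @default) = @$spec;
        $named{$name} = 1;
        if (exists $nca->{$name}) {
            $bound{$name}      = $nca->{$name};
            $consumed->{$name} = 1;              # mark this NCA as "consumed"
        }
        elsif (@default) {
            $bound{$name} = $default[0];         # no NCA: use the default
        }
        else {
            die "Missing '$name' argument in call to new()\n";
        }
    }
    # A trailing slurpy sees the remaining NCAs but does not consume them
    my %rest;
    if ($want_slurpy) {
        %rest = map { $_ => $nca->{$_} } grep { !$named{$_} } keys %$nca;
    }
    return (\%bound, \%rest);
}

# Emulates: ADJUST ($size = 10, %etc) {...}
my %nca = (size => -1, colour => 'red');
my %consumed;
my ($bound, $rest) = bind_params(\%nca, \%consumed, [ ['size', 10] ], 1);
```

After this call only 'size' is recorded in %consumed, while 'colour' is visible in the slurpy but remains available for some other initializer to claim.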
Ideally, a given class could have as many distinct ADJUST phasers as the user wished, with each of them being called in turn and being passed the appropriate subset of constructor arguments, as determined by their individual signatures.
As mentioned earlier, the signature/parameter list on an ADJUST phaser is entirely optional. ADJUST phasers specified without a signature receive no constructor arguments and do not mark any as “consumed”.
Such unparameterized ADJUST phasers are still useful: for example, to verify post-initialization object integrity.
The optional initializer block for a field also needs a way to declare a signature/parameter list, so as to provide it with access to the appropriate named constructor argument(s).
Unfortunately, whilst there is a feasible syntactic slot for adding signatures to the phaser syntax (i.e. between its keyword and its block), there is no feasible syntactic slot for adding a signature to a bare Perl block.
Rather than invent a special parameter syntax just for field initializer blocks, I propose that we add an extra attribute to field declarations, within which the (optional) signature of the field’s initializer block may be declared.
Specifically, we add an :adjust(...) attribute, in whose parens the signature for the field initializer block is placed. For example:
field $title :adjust($label) { ucfirst lc $label }
This would be exactly equivalent to:
field $title;
ADJUST ($label) { $title = ucfirst lc $label }
Just as with a parameterized ADJUST, if a field is declared with an :adjust(...) attribute, when the field’s initializer block is executed the appropriate named constructor argument would be marked as having been “consumed”. Hence, in the above example, the presence of :adjust($label) means that the value of the 'label' constructor argument would be passed into the field’s initializer block and the 'label' constructor argument would also be marked as having been “consumed”.
Apart from the DRY-ness of not having to repeat the field variable within the block, the advantage of using an :adjust($name) attribute, rather than an ADJUST phaser, is that the attribute directly associates the 'label' constructor argument with the $title field, allowing the object initialization process (not to mention humans, IDEs, and other software entities) to easily detect this association.
Just like an ADJUST block, an :adjust(...) can specify two or more scalar parameters, with optional default values, as well as a final slurpy array or hash. All those parameters are then available in the field's initializer block, and all the corresponding named constructor arguments are marked as having been used (except, of course, for any that end up in the trailing slurpy).
If a field specifies both a :param and an :adjust(...), then the direct assignment of the identically named constructor argument (as implicitly specified by the :param) is attempted first, and the :adjust(...) and initializer block are ignored, unless there is no suitable named constructor argument for the :param. Only those named constructor arguments that are actually used (i.e. either by the :param or else by the :adjust(...)) are marked as “consumed”.
If a field specifies an initializer block but no :adjust(...) attribute, the initializer block is passed no constructor args, and does not mark any as “consumed”.
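The precedence between :param and :adjust(...) can also be emulated in plain Perl. Again, this is a hedged sketch rather than the proposed mechanism: init_field() is a hypothetical helper, the initializer is an ordinary coderef, and a joined string stands in for a Colour object.

```perl
use strict;
use warnings;

# Hypothetical helper emulating a field with both :param and :adjust(...)
sub init_field {
    my ($nca, $consumed, $param, $adjust_names, $initializer) = @_;
    # The :param is tried first; if it succeeds, the :adjust(...) is ignored
    if (exists $nca->{$param}) {
        $consumed->{$param} = 1;
        return $nca->{$param};
    }
    # Otherwise bind (and consume) the :adjust parameters by name
    my @values;
    for my $name (@$adjust_names) {
        die "Missing '$name' argument in call to new()\n"
            unless exists $nca->{$name};
        $consumed->{$name} = 1;
        push @values, $nca->{$name};
    }
    return $initializer->(@values);
}

# Emulates: field $colour :param :adjust($r, $g, $b) {...}
my %nca = (colour => 'red', r => 0, g => 0, b => 0);
my %consumed;
my $colour = init_field(\%nca, \%consumed, 'colour', [qw(r g b)], sub { join ',', @_ });

# NCAs that nothing consumed would be reported as unexpected
my @unexpected = grep { !$consumed{$_} } sort keys %nca;
```

Here the 'colour' argument pre-empts the :adjust parameters, so 'r', 'g', and 'b' are left unconsumed and end up in @unexpected.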
Phew! That’s an awful lot to assimilate.
Let’s see if a few examples can make these ideas clearer...
class Widget {
NEWARGS (@args) {
# Allow Widget->new( $title, $rgb )...
if (@args == 2) {
(title=>$args[0], red=>$args[1], green=>$args[1], blue=>$args[1]);
}
# Allow Widget->new( $title, $r, $g, $b )...
elsif (@args == 4) {
(title=>$args[0], red=>$args[1], green=>$args[2], blue=>$args[3]);
}
# Otherwise expect Widget->new( %named_args )...
else {
@args;
}
}
# Automatic initialization from 'title' constructor arg...
field $title :param;
# This field is NOT automatically initialized....
field $colour;
# Pull out the 'red', 'green', and 'blue' named constructor args
# and initialize the $colour field with a Colour object...
ADJUST ($red, $green, $blue) {
$colour = Colour->new($red, $green, $blue);
}
# Pull out the 'label' named constructor arg to initialize this field,
# or else use the value of the earlier $title field, if no 'label' arg provided...
field $ID :adjust($label = $title) { $label =~ s/\W/_/gr }
}
# Exercise the class...
$widget1 = Widget->new( title => "The text", red => 100, green => 30, blue => 0 );
$widget2 = Widget->new( "The text", 100, 30, 0 );
$widget3 = Widget->new( "The text", 0 );
$widget4 = Widget->new( "The text" ); # ...DIES
class Widget {
field $title :param;
# The :param will first try to initialize $colour via a 'colour' named constructor arg.
# Otherwise, the :adjust(...) will try to initialize via 'r'/'g'/'b' NCAs...
field $colour :param :adjust($r, $g, $b) { Colour->new($r, $g, $b) };
}
# Specify individual colour components directly...
my $widget1 = Widget->new( title => "The text", r => 100, g => 30, b => 0 );
# Specify colour as a pre-built Colour object...
my $widget2 = Widget->new( title => "The text", colour => Colour->new(100, 30, 0) );
# Error because no NCA to initialize $colour field...
# ("Missing arguments 'colour' or 'r', 'g', and 'b' in call to new()...")
my $widget3 = Widget->new( title => "The text" );
# Error because 'colour' NCA will pre-empt 'r'/'g'/'b', leaving them "unconsumed"
# ("Unexpected arguments 'r', 'g', and 'b' in call to new()...")
my $widget4 = Widget->new( title => "The text",
colour => Colour->new(100, 30, 0),
r=>0, g=>0, b=>0);
class Complex {
# Immutable object state (or this would be a crime against Liskov!)...
field $re :param :reader;
field $im :param :reader;
NEWARGS ($N, @etc) {
@etc > 0 ? ($N, @etc) # ->new( re=>1, im=>2 )
: $N =~ /^(?<re>.*)[+-](?<im>.*)i$/i ? (%+) # ->new( '1+2i' )
: die "Unexpected arg: $N"; # ->new( 'snorkel' )
}
}
class Real :isa(Complex) {
NEWARGS (@args) {
if (@args == 1) {
(re => $args[0], im => 0);
}
else {
my %args = @args;
die "Can't specify imaginary component for real number"
if exists $args{im};
(%args, im=>0);
}
}
}
class Widget {
NEWARGS ($title, $rgb) { title=>$title, red=>$rgb, green=>$rgb, blue=>$rgb }
NEWARGS ($title, $r, $g, $b) { title=>$title, red=>$r, green=>$g, blue=>$b }
NEWARGS (%named_args) { %named_args }
field $title :param;
field $colour;
ADJUST ($red, $green, $blue) { $colour = Colour->new($red, $green, $blue) }
field $ID :adjust($label = $title) { $label =~ s/\W/_/gr }
}
class Complex {
field $re :param :reader;
field $im :param :reader;
NEWARGS ($num) { $num =~ /^(?<re>.*)[+-](?<im>.*)i$/i ? %+ : die }
NEWARGS (%args) { %args }
}
class Real :isa(Complex) {
NEWARGS ($num) { re => $num, im => 0 }
NEWARGS (%args) { die if exists $args{im}; (%args, im=>0); }
}
Well, that’s what I’d like to see, anyway.
It’s certainly not perfect, but I think it cleanly balances power and safety, and I believe that it covers most of @leonerd’s issues and desiderata, without mangling Perl syntax too badly.
I am, of course, as always, more than happy to answer any questions, clarify any mysteries, fill in any omissions, debate any controversies, or brainstorm any possible improvements to this proposal.
Thanks for coming to my TED talk. :-)
The current design allows us to fake an argument-list pre-tweaking behaviour via something like:
method new_other :common (@args) { return $class->new( tweak_those(@args) ); }
This is not a great solution, however, as nothing marks these methods as being constructor-ish. Except perhaps the naming convention...which is inherently fragile, and not helpful to automated code analysis tools.
Agreed, but this could be done:
method create :new (@args) {
return $class->new( tweak_those(@args) );
}
That would automatically be a class method and would be expected to return an instance of something. This could be useful for factory classes where it's clear that what is returned is not an instance of $class
.
That being said, I'm not a fan of a proliferation of different ways of naming constructors. The multi-method approach you describe seems warranted.
I'm also not a fan of the name NEWARGS
, though I can't think of anything better.
That being said, I'm not a fan of a proliferation of different ways of naming constructors.
Yes, I think there's much to be said for avoiding the potential Babel of:
$obj = MyClass->new(%args);
$obj = YourClass->new(red=>$r, green=>$g, blue=>$b);
$obj = YourClass->new_from_rgb($r, $g, $b);
$obj = YourClass->new_from_colour($colourobj);
$obj = TheirClass->obj(@args);
$obj = TheirClass->make(@args);
$obj = TheirClass->reify(@args);
$obj = TheirClass->create(@args);
$obj = TheirClass->realize(@args);
$obj = TheirClass->manifest(@args);
$obj = TheirClass->actualize(@args);
$obj = TheirClass->instantiate(@args);
$obj = TheirClass->corporealize(@args);
Or, to put it another way, a :new
attribute would certainly help the compiler, the IDE, and other tools
to understand the method's intended purpose, but might actually make things worse for humans.
I'm also not a fan of the name NEWARGS, though I can't think of anything better.
BUILDARGS
is better, or perhaps even INITARGS
, because this phaser
builds the required arguments for the initialization process. I just worry that
BUILDARGS
is already saddled with too many Moose-y associations/preconceptions.
And that INITARGS
is ungainly and a little obscure.
Arguably, this phaser implements a user-redefinable interface for the signature of new(), so it could possibly be named: NEWSIG
But none of these really “sparks” for me, so I left it at NEWARGS, which is sufficiently infelicitous as to encourage us all to strive for something better. ;-)
Wow, much good thought here I see. I'll respond to a few quick bits to get them out of my head but I will write up a longer response later too.
All of @thoughtstream's "Preliminary Thoughts" look good. Nothing I disagree with there.
Of the design itself: It's far from a "no", but I am slightly cautious about trying to add too much meaning to the names of signature parameters. Not so much because it's a bad idea here, but simply that if we do it here we should consider wider Perl language overall and what it might mean to do similar things elsewhere. It may well be something we decide we want to do elsewhere too, but if we do we should aim to make something consistent across the whole language.
I could easily live with multi NEWARGS(...)
support in S:K:MultiSub. I think it would be quite neat and powerful as an extension of the other multi-sub thoughts actually.
@leonerd wrote:
I am slightly cautious about trying to add too much meaning to the names of signature parameters.
Caution is always a wise starting position. :-)
Not so much because it's a bad idea here, but simply that if we do it here we should consider wider Perl language overall and what it might mean to do similar things elsewhere. It may well be something we decide we want to do elsewhere too, but if we do we should aim to make something consistent across the whole language.
The consistent thing (that we’re having to work around here, because Perl doesn’t have it) would be universal support for parameters being bound by name rather than by position.
In Raku we can say:
sub foo (:$r, :$g, :$b) {...} # Named parameters...
foo(b=>1, r=>2, g=>3); # ...are bound to named arguments
...and each value ends up bound to the correct parameter, despite the args not being passed in the same order as their appropriate params.
That’s effectively what I’m suggesting for ADJUST (...) and :adjust(...), except that I’m not proposing a special parameter syntax to specify “by name” rather than “by position” binding. Instead, I'm suggesting that these constructs just have that behaviour as a special (magic) case.
Of course, it would be vastly better if this didn’t need to be a special case. But that implies a rather significant addition to the Perl signature syntax and semantics. Which I was extremely reluctant to suggest, given how hard it’s been so far to get general buy-in and approval for adding any subset of Corinna to Perl.
If Perl already had named parameter binding, then the solutions I would have proposed might have been something like:
ADJUST (red=>$r, green=>$g, blue=>$b) { # Hypothetical named param syntax
$colour = Colour->new($r, $g, $b);
}
field $ID :adjust(label => $l = $title) { $l =~ s/\W/_/gr }
In which case those “match the named constructor arg to the identically named adjuster parameter” behaviours would no longer be special magic, but just the regular everyday semantics of “bind by name” parameters.
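For comparison, in today's Perl the nearest hand-written equivalent of by-name binding is to unpack a hash slice. A minimal sketch (the describe_colour() function is purely illustrative):

```perl
use strict;
use warnings;

# Illustrative function: accepts r/g/b as named arguments in any order
sub describe_colour {
    my %arg = @_;
    my ($red, $green, $blue) = @arg{qw(r g b)};   # bind by name via a hash slice
    return "r=$red g=$green b=$blue";
}

my $description = describe_colour(b => 1, r => 2, g => 3);
```

Nothing here checks for missing or unexpected keys, which is exactly the bookkeeping that the “consumed” tracking in this proposal is meant to automate.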
I could easily live with multi NEWARGS(...) support in S:K:MultiSub. I think it would be quite neat and powerful as an extension of the other multi-sub thoughts actually.
That, of course, is the other thing I was trying to work around in my proposal: the complete lack of multiple dispatch in standard Perl.
If Perl already supported multimethods, we wouldn’t need NEWARGS
at all.
We could simply specify that the automatically supplied default new()
method
is always implemented as a multi
.
Then, if class designers wanted to support alternative constructor signatures,
they would simply add extra variants of new()
, each of which would simply
redispatch to the default new
:
class Widget {
multi method new ($title, $rgb) {
$class->new( title=>$title, red=>$rgb, green=>$rgb, blue=>$rgb);
}
multi method new ($title, $r, $g, $b) {
$class->new( title=>$title, red=>$r, green=>$g, blue=>$b );
}
...
}
class Complex {
field $re :param :reader;
field $im :param :reader;
multi method new ($num) {
die if $num !~ /^(?<re>.*)[+-](?<im>.*)i$/i;
$class->new(%+);
}
}
class Real :isa(Complex) {
multi method new ($num) { $class->new(re => $num, im => 0) }
}
(Though, to be honest, the NEWARGS phaser mechanism I proposed is still far more declarative and automatic than the above alternative.)
Alas, Perl has neither “bound-by-name” parameters, nor multimethods. So I had to colour within the lines of existing Perl constructs and syntax. Which led directly to those pesky special cases and behaviours in the proposal I offered. That’s kinda what I meant when I said: “It’s not perfect...”
But given the improbability of convincing the Powers That Be to add either named parameter binding or multiple dispatch to Perl, I feared that in this case Perfect might well be the enemy of Good (and, worse still, the fatal nemesis of Approved ).
Alas, Perl has neither “bound-by-name” parameters, nor multimethods. So I had to colour within the lines of existing Perl constructs and syntax.
At the risk of being controversial by not being cautious, @leonerd has written Syntax::Keyword::MultiSub which gives us multiple dispatch. There are a couple of bugs which need to be resolved, but those don't look insurmountable.
I would, however, prefer sticking to the KIM syntax because multi
is an adjective, not a noun.
method new :multi ($num) { ... }
Evolving Perl to stick with this syntax will make it more consistent and predictable over time. I suspect this would also make Paul's proposal for a metaprogramming API a touch easier.
(Though, to be honest, the NEWARGS phaser mechanism I proposed is still far more declarative and automatic than the above alternative.)
True, but multi-method constructors seem (to me) to be easier to understand because they omit some of the complexity and having :multi
is extremely useful outside of this one context.
@ovid observed:
At the risk of being controversial by not being cautious, @leonerd has written Syntax::Keyword::MultiSub which gives us multiple dispatch. There are a couple of bugs which need to be resolved, but those don't look insurmountable.
My understanding is that Syntax::Keyword::MultiSub currently only provides multiple dispatch for sub declarations. In this context, I believe we would need method declarations instead.
BTW, ever since a much earlier private discussion with @ovid on the need for
multimethods in Corinna, I too have been working on a multi
implementation
(a module called Multi::Dispatch
).
In contrast to @leonerd’s wisely cautious and elegantly minimalist approach, my version is wildly ambitious, absurdly overpowered, and includes every conceivable MD feature culled from half a dozen other high-level languages, plus at least three additional kitchen sinks of my own devising. Including, as it happens, multimethods.
It is a pure Perl implementation, so it runs appreciably slower than @leonerd’s XS-driven version. Though, at just under 1 million dispatches-per-second on my M1 Powerbook, it’s certainly no slouch and definitely Real World usable.
(Not entirely) coincidentally, Multi::Dispatch
also solves the named parameter
binding problem, because one of its kitchen sinks is full strict destructuring
of keyed values within hashlike parameters.
I had been planning to unleash this module on an unsuspecting Perl world some time in the next month or two, but would be happy to offer @ovid and @leonerd a sneak preview in the next few weeks, if either of them were interested.
However, despite all that, I am not really convinced that either Syntax::Keyword::MultiSub or Multi::Dispatch is the right solution here.
See below.
I would, however, prefer sticking to the KIM syntax because multi is an adjective, not a noun.
method new :multi ($num) { ... }
Evolving Perl to stick with this syntax will make it more consistent and predictable over time. I suspect this would also make Paul's proposal for a metaprogramming API a touch easier.
I can’t speak to the meta-API issue, and despite my previous championing of KIM syntax, I actually think KIM would be wrong here...and, more especially, wrong for a hypothetical generalized Perl-wide multiple-dispatch mechanism.
Multisubs and multimethods are fundamentally different entities from subs and methods, with fundamentally different declaration constraints and fundamentally different dispatch behaviours. They even have (or, at least, should have) fundamentally different internal control constructs (specifically for redispatch); constructs that ought to be illegal in regular subs and methods.
In short, they are sufficiently distinct that I believe they need a separate keyword.
Not a prefix modifier (like Syntax::Keyword::MultiSub
provides), but actual distinct
declarators. In Multi::Dispatch
after much soul-searching, and many trials-and-errors,
I eventually chose multi
(for MD subs) and multimethod
(for MD methods).
But, once again, I fear we are straying from the main issue here. When @ovid begins his campaign to add MD to Perl (breathe, @ovid, breathe! ;-) I will be more than happy to argue KIM-vs-keyword with him at much greater length.
But multi-method constructors seem (to me) to be easier to understand because they omit some of the complexity
Agreed. Though they also thereby omit some of the safety and robustness and convenience and syntactic mellifluousness of an automated declarative mechanism.
and having :multi is extremely useful outside of this one context.
This to me is the clincher. Multiple dispatch is indeed tremendously useful in other contexts (including other in-class contexts) and it would be ideal if Perl were to support it.
However, I fear that designing, marketing, selling, and then implementing a full MD mechanism for Perl is going to be even more controversial, more challenging, and more “illegitimi carborundum” than shepherding through this wonderful new OO mechanism has proved to be. :-(
Hence, I suspect it will be some considerable time, if ever, before we can rely on a built-in MD solution to solve this problem.
And, with the most genuinely profound respect to @leonerd (and somewhat less respect to myself), I do not believe that we can – or should – rely on CPAN solutions for issues like this that we discover within the current design of Corinna.
Happily, in this particular case, multimethods are not actually required.
If we decide that the NEWARGS
approach is simultaneously both too complex
and too limited in scope, then we should just shelve the entire idea of
pre-tweaking named constructor arguments...at least for the time being.
Instead, we can mandate that the constructor argument API for Corinna
is fixed and immutable (new
must always be passed a key/value list)
and then users who feel they need some other API can just supply their own.
Either via a singly dispatched new
method with an internal if
cascade:
class Widget {
method new :common (@args) {
if (@args == 2) {
my ($title, $rgb) = @args;
$class->next::method(title=>$title, red=>$rgb, green=>$rgb, blue=>$rgb);
}
elsif (@args == 4) {
my ($title, $r, $g, $b) = @args;
$class->next::method(title=>$title, red=>$r, green=>$g, blue=>$b);
}
else {
$class->next::method(@args);
}
}
...
}
...or else via some third-party MD solution (such as Multi::Dispatch
):
class Widget {
use Multi::Dispatch;
multimethod new :common ($title, $rgb) {
$class->next::method(title=>$title, red=>$rgb, green=>$rgb, blue=>$rgb);
}
multimethod new :common ($title, $r, $g, $b) {
$class->next::method(title=>$title, red=>$r, green=>$g, blue=>$b);
}
multimethod new :common (%args) {
$class->next::method(%args);
}
...
}
If we decide that the NEWARGS approach is simultaneously both too complex and too limited in scope, then we should just shelve the entire idea of pre-tweaking named constructor arguments...at least for the time being.
I'm actually OK with that and saying it's out-of-scope for the MVP. I don't particularly see a problem with saying, "we have a consistent syntax and you have to use it for now." I'd rather we see how simple we can make things for the MVP and verify that the overall thing works. Once we release, we can see how it plays out in the real world. (though this means people are going to write method create :common (...) {...}
to achieve this).
I suspect that this might be a minority opinion.
+1 from me Better to have no API (for now) than one we want/need to deprecate later.
The NEWARGS
approach enables a case I have occasionally: A subclass with a restricted constructor. I'll take @thoughtstream's solution to my Real
example, but drop the multi dispatch feature:
class Complex {
field $re :param :reader;
field $im :param :reader;
}
class Real :isa(Complex) {
NEWARGS (%args) { die if exists $args{im}; (%args, im=>0); }
}
This is nice, and the only downside is that there's no declaration to show that im
is not available as a constructor attribute.
A "loss" of constructor param declaration in the %arg
soup also happens if the fields in question are provided by a base class or role. I'm modifying the widget example to add a role:
role Appearance {
field $colour :param;
}
class Widget :does(Appearance) {
field $title :param;
NEWARGS (%args) {
(title => $args{title},
colour => Colour->new(@args{qw(red green blue)}),
)
}
}
my $widget = Widget->new(title => "The text", red => 100, green => 30, blue => 0 );
This works, but as in @leonerd's initial example:
Most notably, the words red, green and blue only appear in the behavioural code
...only this time it is in NEWARGS, not in ADJUST.
The obvious solution (Appearance could accept red, green and blue) might not always be available.
@HaraldJoerg observed:
the only downside is that there's no declaration to show that im is not available as a constructor attribute.
Setting aside the crimes against Liskov Substitutability (;-), a sufficiently advanced multiple dispatch technology would allow this constraint on the derived class constructor to be expressed declaratively. For example:
use Multi::Dispatch;
class Complex {
field $re :param :reader;
field $im :param :reader;
}
class Real :isa(Complex) {
multimethod new (%{ re => $re }) { # Named args, but only allowed to have 're' key
$class->next::method(re=>$re, im=>0);
}
}
A "loss" of constructor param declaration in the %arg soup also happens if the fields in question are provided by a base class or role.
Again, an S.A.M.D.T. can restrict named argument lists to specific pre-declared keys:
role Appearance {
field $colour :param;
}
class Widget :does(Appearance) {
field $title :param;
multimethod new (%{ title=>$t, red=>$r, green=>$g, blue=>$b}) {
$class->next::method( title => $t, colour => Colour->new($r, $g, $b) );
}
multimethod new (%{ title=>$t, color=>$c }) {
$class->next::method( title => $t, colour => $c );
}
}
my $widget = Widget->new(title => "The text", red => 100, green => 30, blue => 0 );
As an aside, I see an issue with this:
class Widget {
multimethod new (%{ title=>$t, red=>$r, green=>$g, blue=>$b}) { ... }
multimethod new (%{ title=>$t, color=>$c }) { ... }
}
In particular, maintainability. There's nothing to stop someone from writing this:
class Widget {
multimethod new (%{ title=>$t, red=>$r, green=>$g, blue=>$b}) { ... }
# thousands of lines of code
multimethod new (%{ title=>$t, color=>$c }) { ... }
}
And then the poor maintenance programmer (often me, damn it), is unaware that there's something else they have to think about when maintaining the code.
There's an old joke about doctors where the patient complains that "it hurts when I do X" and the doctor replies "then stop doing X."
The Perl developer response to my concern is typically "then don't do X." There's a certain merit to this. class Person :isa(Invoice)
is a big bucket of "no," but the computer can't know the meaning of what I'm writing, so we're told "don't do X." But we have unclear specs. We have organically growing code. We have deadlines. We make mistakes. I tend to get hired to work on large systems and they're always a mess—always—even when it's clear that the original code was solid.
When the software can protect us, it should. In this case, we have things that are semantically related, but that's not expressed in the code. So instead, I've long thought I'd prefer something like this (abominable) syntax:
class Widget {
multimethod new {
args (title=>$t, red=>$r, green=>$g, blue=>$b) { ... }
args (title=>$t, color=>$c) { ... }
args ($title) { ... }
}
}
With the above, developers wouldn't be grouping multis by happenstance. They'd be doing so because the syntax forces them to be more organized, making the code easier for others to develop.
class Person :isa(Invoice)
is a big bucket of "no," but the computer can't know the meaning of what I'm writing
Hmmmmmmmmmmmmm.
Given the astonishing advances we’re currently seeing in machine learning (e.g. plausibly relevant synthetic images generated from pure text descriptions), I wonder how hard it would actually be to create a module that points out these types of code malapropisms??? One might imagine:
use reasonable 'semantics';
class Car :isa(Wheel) {...}
# Compile-time error: Category mistake (a car is not a wheel)
class Real :isa(Complex) {...}
class Square :isa(Rectangle) {...}
# Compile-time error: Liskov violations
class Utils :does(Maths) :does(Stats) :does(Transforms) {...}
# Compile-time warning: Possible "God object" detected
Hmmmmmmmmmmmm.
When the software can protect us, it should. ... So instead, I've long thought I'd prefer something like this (abominable) syntax:
That’s extremely interesting. I don’t recall having seen that particular approach to multiple dispatch in any programming language I’ve studied.
However, although I certainly agree this kind of restriction would be much kinder to hard-pressed maintenance personnel, it would also undermine one of the great advantages of multiple dispatch: the clean mechanism MD provides for extending the behaviour of an existing multi to cover new cases as they are added.
For example, imagine a multi-based implementation of some Data::Dumper-like facility:
multi dd (\@a) { '['.join(',', map {dd($_)} @a) . ']' }
multi dd (\%h) { '{'.join(',', map {dd($_).' => '.dd($h{$_})} keys %h).'}'}
multi dd (\$s) { '\(' . dd($$s) . ')' }
multi dd (Num $n) { $n }
multi dd (Str $s) { '"' . quotemeta($s) . '"' }
When someone creates a new class, and would like the dd multi to subsequently handle objects of that class as well, they can simply write (in their own module):
class MyClass {
method serialize () {...}
multi dd :export (MyClass $obj) { $obj->serialize }
}
And now the client code’s dd understands those kinds of objects as well.
More relevant, however, is the fact that, just like regular methods, multimethods are inherited within class hierarchies, and composed via roles. So you inevitably get fragmentation of your multimethod anyway:
class MyBase {
multimethod do_something ($x, $y) {...}
}
# thousands of lines of code
# or in a completely separate source file
role Something {
multimethod do_something ($z) {...}
}
# thousands more lines of code
# or in another completely separate source file
class MyDer :isa(MyBase) :does(Something) {
multimethod do_something ($x, $y, $z) {...}
}
This has to be supported, no matter what, so if you attempted to impose contiguity on variant declarations via syntax, I suspect developers would simply take to splitting their scattered multimethod declarations out into scattered roles (just to spite you ;-)
Nevertheless, I can see the desirability of encouraging developers to “cuddle” their multimethod variants. So (in Multi::Dispatch at least) I now plan to offer a loading option that complains at compile-time when it finds non-contiguous variant declarations within a single namespace.
Thank-you for the idea.
[Obligatory "We're straying from the topic again" reminder...ironically, by the main offender]
In that case, it seems we're down to three options (someone correct me if I'm wrong)
ADJUST/NEWARGS
I don't think we're getting multidispatch soon, so scratch that. The "do nothing" option raises the question: does this create an unsolvable problem? I'm not convinced that it does; and where workarounds are needed, how likely are they to come up in practice?
That being said, "do nothing", while sounding tempting, doesn't seem like the best approach to me because at the very least, we need ADJUST and if we implement just that, we might find it not forward-compatible with our needs. However, this means adding arguments to phasers, along with adding an :adjust attribute to fields. @leonerd: thoughts?
I have been musing for some time now whether it is possible to merge the :param and :adjust attributes into one.
Rationale: Both attributes describe part of how a field gets its initial value. :param states that a value can be provided by the caller of the constructor, and we already have :param(NAME) if we want different names for the field and parameter. So maybe we could extend :param to carry the declaration of the parameters which are passed to the initializer block?
Here are some examples:
field $name :param;
field $name :param(name); # same behavior as previous line
The caller must pass a value for name => $scalar to the constructor (unchanged from the current RFC). The key "name" is marked as used by the constructor.
field $name :param(label);
The caller must pass a value for label => $scalar to the constructor (also unchanged). The key "label" is marked as used by the constructor.
Now let's add initializer blocks:
field $name :param { $name // 'N.N.' } # default is 'N.N.'
field $name :param { $name } # defaults to undef
field $name :param(label) { $label } # label is optional, defaults to undef
This is a change to the current spec (and to the behavior of Object::Pad): The caller may pass a value name => $scalar in the parameter list. The parameter name is available as a scalar $name in the initializer block. The initializer block, if present, is always executed (unlike what Object::Pad does today); the value of its last statement is the initial value of the field.
So one example we had before could be written like this:
field $title :param(title,label) { $title // ucfirst lc $label }
Both title and label are marked as used by the constructor.
The widget example:
class Widget {
field $title :param;
field $colour :param(red,green,blue) { Colour->new( $red, $green, $blue ) };
}
my $widget = Widget->new( title => "The text", red => 100, green => 30, blue => 0 );
Like in @leonerd's initial example, it is not possible to specify a Colour object directly. You need NEWARGS wizardry if you want to allow both types of invocation.
The fact that initializer blocks are always run has some interesting consequences. I guess I myself will make that error sooner or later:
class Breakfast {
has $drink :param { Coffee->new; }
}
This is no longer specifying a default. Instead, it ignores whatever parameter has been given to the constructor.
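To make the difference concrete, here is a minimal sketch in plain Perl (deliberately not Corinna or Object::Pad syntax) that models the two semantics. The helpers init_if_missing and init_always are hypothetical names invented for this illustration, and strings stand in for the Coffee and Tea objects.

```perl
use strict;
use warnings;

# Hypothetical helpers, invented for illustration only.

# Default-only semantics: the block runs only when the caller
# did not pass the parameter at all.
sub init_if_missing {
    my ($params, $name, $block) = @_;
    return exists $params->{$name} ? $params->{$name} : $block->(undef);
}

# Proposed semantics: the block always runs and receives the parameter
# value (undef when absent); its result becomes the field's value.
sub init_always {
    my ($params, $name, $block) = @_;
    return $block->($params->{$name});
}

my %args = (drink => 'Tea');

# A block that ignores its parameter, like { Coffee->new; }
my $ignores_param = sub { 'Coffee' };
# A block that consults its parameter, like { $drink // Coffee->new }
my $uses_param = sub { $_[0] // 'Coffee' };

my $old      = init_if_missing(\%args, 'drink', $ignores_param);  # caller's value wins
my $new_bad  = init_always(\%args, 'drink', $ignores_param);      # caller's value is lost
my $new_good = init_always(\%args, 'drink', $uses_param);         # caller's value wins

print "$old $new_bad $new_good\n";   # prints "Tea Coffee Tea"
```

Under the always-run model, the block has to mention the parameter explicitly (the $uses_param form) to keep the caller's value.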
As another example, array fields can be initialized from constructor parameters like this:
class Polygon {
field @points :param { @$points }
}
Also, initializer blocks can now be used to validate the input, which is kinda nice because it happens early in the construction process and in many cases there will be a 1:1 relationship between a param and a field.
A construction parameter can be used to build more than one field:
class Prism {
field $height :param { $height // 1.0 }
field $size :param { $size // 1.0 }
field @vertices :param(sides,size,height) { ...; }
field @faces :param(sides) { ...; }
}
my $p = Prism->new(sides => 6); # create a hexagonal prism
There's still a need for ADJUST phasers in case the fields cannot be initialized one-by-one, or when constructors should have side-effects (e.g. logging), or when validating "the whole thing" is necessary.
I think @HaraldJoerg’s “initializer-blocks-always-run” proposal has some good points, but I also think it has some issues (even apart from his own has $drink :param { Coffee->new; } counterexample).
Most importantly, the current behaviour of initializer blocks (even without my proposed :adjust($init_param) addition) provides a critical feature: they specify a default initialization behaviour when the corresponding named constructor arg (“NCA”) is missing.
For example:
field $colour :param :adjust($r=0.5, $g=0.5, $b=0.5) { Colour->new(r=>$r, g=>$g, b=>$b) }
...allows users to either pass a colour => $colourobj argument to the constructor, or else pass r=>$red, g=>$green, b=>$blue instead, or else pass nothing and get grey as a default.
I can’t really see how to achieve that under @HaraldJoerg’s proposal.
And here's an even simpler example:
# Initialize from the 'name' NCA, or else from the 'label' NCA...
field $name :param :adjust($label) { $label }
Moreover, this example tells the compiler to mark the 'name' NCA as consumed if it was present, in which case the additional presence of a 'label' NCA is automatically reported as an error. And if the 'name' NCA is missing, then that's a (different) error, unless the 'label' NCA is present. And if they're both missing, then that can be reported accurately too ("...was expecting 'name' or 'label' argument..."). I can’t see how to achieve all that under @HaraldJoerg’s proposed unified semantics.
The issue here is that :param means: “expect an NCA with the specified name”, whereas :adjust means “if you didn’t get the expected NCA, do this instead with this other specified NCA”. Those two behaviours are intrinsically sequential, and the second is contingent on the first. Hence they are not really unifiable.
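The sequential, contingent behaviour described here can be modelled in plain Perl. This is only a sketch of the intended semantics, not how any implementation works: resolve_field is a hypothetical helper, the fallback list mimics the :adjust($r=0.5, ...) notation, and a joined string stands in for a Colour object.

```perl
use strict;
use warnings;

# Hypothetical helper: resolve one field from its primary NCA, and only
# if that is absent, from fallback NCAs (each with an optional default).
sub resolve_field {
    my ($args, $primary, $fallback, $build) = @_;   # $fallback: [ [name, default?], ... ]
    my @names = map { $_->[0] } @$fallback;

    # Step 1 (:param): the primary NCA wins, and excludes the fallbacks.
    if (exists $args->{$primary}) {
        my @clash = grep { exists $args->{$_} } @names;
        die "Can't pass '$primary' together with: @clash\n" if @clash;
        return delete $args->{$primary};
    }

    # Step 2 (:adjust): contingent on step 1 having found nothing.
    my @values;
    for my $spec (@$fallback) {
        my ($name, @default) = @$spec;
        if    (exists $args->{$name}) { push @values, delete $args->{$name} }
        elsif (@default)              { push @values, $default[0] }
        else  { die "Was expecting '$primary' or '$name' argument\n" }
    }
    return $build->(@values);
}

my $build = sub { join ',', @_ };                 # stand-in for Colour->new(...)
my @rgb   = ([r => 0.5], [g => 0.5], [b => 0.5]);

my $grey  = resolve_field({},                 'colour', \@rgb, $build);
my $given = resolve_field({ colour => 'RED' }, 'colour', \@rgb, $build);
my $mixed = resolve_field({ r => 1 },          'colour', \@rgb, $build);
my $err   = eval { resolve_field({ colour => 'RED', r => 1 }, 'colour', \@rgb, $build); 1 }
          ? '' : $@;

print "$grey | $given | $mixed | $err";
```

Because both steps are declared as data rather than written inside the block, the clash and missing-argument errors can be produced without any user code.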
I would also point out that my previous proposal already allows for the kinds of “always executed” initializer blocks @HaraldJoerg is proposing. You just leave out the :param specifier, in which case the initializer block is always executed. Here are all his examples, under my previous proposal:
field $name :param { 'N.N.' } # default is 'N.N.'
field $name :param { undef } # defaults to undef
field $name :param(label) { undef } # label is optional, defaults to undef
# If no 'title' NCA, use 'label' NCA instead...
field $title :param :adjust($label) { ucfirst lc $label }
class Widget {
field $title :param;
# Must pass 'red', 'green', and 'blue' NCAs; can't pass 'colour' NCA...
field $colour :adjust(red,green,blue) { Colour->new($red,$green,$blue) };
}
class Breakfast {
has $drink :param { Coffee->new; } # Now not an error (Coffee is the default)
# or...
has $drink { Coffee->new; } # Also not an error (Coffee is mandatory...and obviously so)
}
class Polygon {
field @points :adjust($points) { @$points }
}
class Prism {
field $height :param { 1.0 }
field $size :param { 1.0 }
field @vertices :param :adjust(sides) { ...; }
field @faces :param :adjust(sides) { ...; }
}
my $p = Prism->new(sides => 6); # create a hexagonal prism
In summary: I completely understand (and approve of) the desire to reduce new constructs wherever possible, especially by combining them when they turn out to actually be just two “views” of a single underlying concept. And I'm very glad that smart and dedicated people like @HaraldJoerg are thinking in those terms about this design.
However, I don't believe that explicit initialization (:param) and fallback default initialization (:adjust) are merely two aspects of one thing. They are two distinct and contingent steps in the overall initialization process...and I believe they need two distinct and contingent constructs to specify them.
I specifically chose the name :adjust to imply that the second phase happens after normal initialization (just like with ADJUST phasers), but perhaps another name would make it clearer that such adjustments only occur when :param initialization is absent...or fails. Perhaps :default? (I'm less keen on :fallback, but I guess that would work too.)
@thoughtstream says:
field $colour :param :adjust($r=0.5, $g=0.5, $b=0.5) { Colour->new(r=>$r, g=>$g, b=>$b) }
...allows users to either pass a colour => $colourobj argument to the constructor, or else pass r=>$red, g=>$green, b=>$blue instead, or else pass nothing and get grey as a default.
I can’t really see how to achieve that under @HaraldJoerg’s proposal.
I omitted that intentionally (mumbling about NEWARGS). Actually, this is possible in an initialiser block:
field $colour :param(colour,r=0.5,g=0.5,b=0.5) { $colour // Colour->new(r=>$r, g=>$g, b=>$b) }
# or (as attributes are a pain to parse)
field $colour :param(colour,r,g,b) { $colour // Colour->new(r=>$r//0.5, g=>$g//0.5, b=>$b//0.5) }
I can narrow down the difference in interpretation with @thoughtstream's quote:
The issue here is that :param means: “expect an NCA with the specified name”,
In my interpretation, :param means "The initial value for this field is to be derived from the NCAs given in parens". It does not need a distinction between NCAs specified in :param and NCAs specified in :adjust. After all, :adjust also means "expect an NCA with the specified name" in this example:
field @points :adjust($points) { @$points }
field $colour :param :adjust($r=0.5, $g=0.5, $b=0.5) { Colour->new(r=>$r, g=>$g, b=>$b) }
I can’t really see how to achieve that under @HaraldJoerg’s proposal.
field $colour :param(colour,r=0.5,g=0.5,b=0.5) { $colour // Colour->new(r=>$r, g=>$g, b=>$b) }
But this doesn’t fully achieve the same effect. For a start, unlike my example, it doesn’t specify that the 'colour' NCA and the 'r'/'g'/'b' NCAs are mutually exclusive, so the compiler (or an IDE, or a static analyser, or a linter, or a refactorer, etc.) would have no way to automatically detect or report the error in: Widget->new(colour=>$c, r=>$red, g=>$green, b=>$blue)
So then we’d start seeing:
field $colour :param(colour,r=0.5,g=0.5,b=0.5) {
die "Can't use 'colour' and 'r'/'g'/'b' at the same time"
if defined $colour && grep { defined } $r, $g, $b;
$colour // Colour->new(r=>$r, g=>$g, b=>$b)
}
More generally, the behaviour of the :adjust version:
field $colour :param :adjust($r=0.5, $g=0.5, $b=0.5) { Colour->new(r=>$r, g=>$g, b=>$b) }
is maximally declarative. The only code required is the code that provides the actual default/fallback value. Whereas, the behaviour of:
field $colour :param(colour,r=0.5,g=0.5,b=0.5) { $colour // Colour->new(r=>$r, g=>$g, b=>$b) }
is emergent. The default/fallback only occurs because of the // in the code.
Which means it’s much easier to enbug that version:
field $colour :param(colour,r=0.5,g=0.5,b=0.5) { $colour || Colour->new(r=>$r, g=>$g, b=>$b) }
The same issues apply even more to the alternative formulation @HaraldJoerg suggested:
field $colour :param(colour,r,g,b) { $colour // Colour->new(r=>$r//0.5, g=>$g//0.5, b=>$b//0.5) }
This has all the same non-declarative disadvantages as the previous version but, in addition, now the compiler (or IDE, or static analyser, or linter, or refactorer, or a human reader of the code) can’t even determine syntactically that the 'r'/'g'/'b' NCAs are optional or what their individual fallback values are.
And the possibility for unintended effects is even greater:
field $colour :param(colour,r,g,b) { $colour || Colour->new(r=>$r//0.5, g=>$g//0,5, b=>$b/0.5) }
In my interpretation, :param means "The initial value for this field is to be derived from the NCAs given in parens". It does not need a distinction between NCAs specified in :param and NCAs specified in :adjust.
I’m arguing that we do need that distinction. In order to generate correct error messages. And, more importantly, in order to avoid the kind of emergent (mis)behaviour that this declarative OO mechanism is supposed to be supplanting.
I'm starting to think we're looking at putting WAY too much into a :param attribute, especially for something that needs an MVP implementation. What started as a simple "put this NCA into a field for me" is looking like it's being slowly turned into a field-by-field subroutine signature.
I'm going to once more reiterate my own proposal for this solution space.
1. field marked with :param (or :param(NAME) to have the NCA named different from the field). No two fields at any point in construction may examine the same NCA (adjudicated per hash-key lookup).
2. :param fields with an initializer block produce an optional NCA. If the NCA is present, the initializer block is skipped. If a :param field appears and the NCA is not present, a fatal error is produced at that point and no further initializer or ADJUST blocks run.
3. $self->Base::new(...) or $self->Role::new(...) to initialize base classes and roles.
4. @_, not via %_ or some magic lexical, none of that. If you want to look at NCAs, you look at previously declared fields marked with :param.
5. field can be marked :param with no name permitted. If this is done, the hash receives and consumes all NCAs not yet consumed by any prior field. Since processing happens in lexical order, no field declared after this hash field could usefully use :param. If no hash field is marked :param, any unused NCA will produce a fatal error no later than after the last field with :param is processed or the last ADJUST block runs.
6. field not used in any method will be freed (and thus any reference it contains released) before new returns to its caller. This may require multiple passes to prune out fields used in private methods that are themselves only used during construction.
The idea here is to achieve declarative NCAs yet keep things simple enough to be possible for an MVP.
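As a rough illustration of the consumption rules above (NCAs consumed in lexical order, initializer skipped when the NCA is present, an unnamed hash field slurping the rest, and a fatal error for leftovers), here is a plain-Perl model. The construct function and its spec format are hypothetical, made up for this sketch; '*' is an invented marker for the slurping hash field.

```perl
use strict;
use warnings;

# Hypothetical model. Each spec is [ field name, NCA name, optional initializer ].
# An NCA name of '*' stands for a hash field marked :param with no name.
sub construct {
    my ($specs, %nca) = @_;
    my %fields;
    for my $spec (@$specs) {                          # lexical (declaration) order
        my ($field, $name, $init) = @$spec;
        if ($name eq '*') {
            $fields{$field} = { %nca };               # consume everything still unclaimed
            %nca = ();
        }
        elsif (exists $nca{$name}) {
            $fields{$field} = delete $nca{$name};     # consumed; initializer skipped
        }
        elsif ($init) {
            $fields{$field} = $init->(\%fields);      # optional NCA: run the initializer
        }
        else {
            die "Missing required argument '$name'\n";
        }
    }
    die "Unrecognised arguments: @{[ sort keys %nca ]}\n" if %nca;
    return \%fields;
}

my @spec = (
    [ r => 'r' ], [ g => 'g' ], [ b => 'b' ],
    [ colour => 'colour', sub { my $f = shift; "rgb($f->{r},$f->{g},$f->{b})" } ],
);

my $obj     = construct(\@spec, r => 1, g => 2, b => 3);
my $extra   = eval { construct(\@spec, r => 1, g => 2, b => 3, alpha => 0); 1 } ? '' : $@;
my $missing = eval { construct(\@spec, r => 1, g => 2); 1 } ? '' : $@;
my $loose   = construct([ @spec, [ rest => '*' ] ], r => 1, g => 2, b => 3, alpha => 0);

print "$obj->{colour}\n$extra$missing";
```

The model leaves out base-class construction and the freeing of construction-only fields, which cannot be shown meaningfully outside a real implementation.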
Since our colour example keeps coming around, that results in:
field $r :param;
field $g :param;
field $b :param;
field $colour :param { Colour->new(r => $r, g => $g, b => $b) };
Item 3 has the goal of allowing derived classes to have absolute control over the initialization of their superclasses. This was also the motivation behind item 5 (to allow the derived class to delegate unknown NCAs to base class). If a base class or role isn't given a ->new() call by the subclass, the overall construction logic should automatically construct the base and roles as if they had no parameters.
(Just to reiterate something: the "fields not used beyond construction can be freed" would extend to ANY field, :param or not. Among other things, this means something like this is not as wasteful as it looks:
field $with_rgb { 1 };
field $r :param { $with_rgb = undef; };
field $g :param { $with_rgb = undef; };
field $b :param { $with_rgb = undef; };
field $with_colour { 1 };
field $colour :param { $with_colour = undef; $with_rgb or die; Colour->new(r => $r, g => $g, b => $b); };
ADJUST { !$with_rgb != !$with_colour or die; }
)
I agree with @aquanight that my :adjust proposal and @HaraldJoerg's extended :param proposal are both inappropriate in the MVP (let alone in the MMVP).
I'd be fine if the first release of Corinna has only simple :param and unparameterized initializer blocks.
For the MMVP, it doesn't matter that we can't (yet) specify alternative initialization NCAs for a field.
But this discussion is also about what Corinna should provide long-term, as it evolves. It's critical to work through such matters now, so that we don't make decisions in the MMVP that paint us into a suboptimal corner further down the line.
That said, if you truly believe that:
field $with_rgb { 1 };
field $r :param { $with_rgb = undef; };
field $g :param { $with_rgb = undef; };
field $b :param { $with_rgb = undef; };
field $with_colour { 1 };
field $colour :param { $with_colour = undef; $with_rgb or die; Colour->new(r => $r, g => $g, b => $b); };
ADJUST { !$with_rgb != !$with_colour or die; }
...is an adequate long-term alternative to something like:
field $colour :param :adjust($r, $g, $b) { Colour->new(r => $r, g => $g, b => $b) };
...or perhaps:
NEWARGS ($r, $g, $b) { colour => Colour->new(r => $r, g => $g, b => $b) };
field $colour :param;
...then I suspect our criteria for programming language design are just too fundamentally different for us ever to agree.
Mind you, I'm not saying which of us is right; just that one design approach must die at the hand of the other...for neither can live while the other survives. ;-)
Current state in Object::Pad
First, a little background on the current state of Object::Pad. There are ADJUST blocks, and there are field-initialiser blocks. They behave in very different ways.
The ADJUST keyword creates an entire (anonymous) function which behaves as such when invoked. It is passed a hashref to the constructor params as its first argument. It can access that with $_[0] or shift or optionally by providing a signatured parameter name:
(Of course, these examples are stupid in that they're doing things that :param would be far better suited to, but I didn't want to cloud up the examples with more complex real-world examples.)
By comparison, field-initialiser blocks are simple blocks, being parsed by no small amount of parser trickery into believing they're all just sequential blocks of the same function. That function is a single (anonymous) function stored as part of the class, used to initialise all the fields. These initialiser blocks do not currently have any access to the constructor parameters; which is somewhat limiting.
Both ADJUST and field-initialiser blocks get access to a $self lexical, the same as other methods.
Current state in feature-class branch
The current feature-class branch provides ADJUST blocks that allow either of the first two forms, but not the third form (with a signature parameter). The branch does not currently provide field-initialiser blocks at all.
The Question
And now we get on to my question. I would like to unify these two things together and at the same time make them better. In particular, I would like both ADJUST and field-initialiser to be simple blocks within one (implied) function rather than having one entersub overhead for every block. Additionally, I would like all of them to have the same access to the constructor parameters.
In particular, it would be great if we could say that a field-initialiser block is really just the same as an ADJUST block that assigns the field from its result, give or take some code that inspects the parameters hash to see if the caller already provided a value.
The trouble though is how to provide this params hash(ref). It can't just come as the first value in @_ because any block would too easily be able to break access to it for all the later ones by just doing shift, or something equally dumb. It needs to be provided by something guaranteed to be visible to all the blocks, that no earlier block can get in the way of. So far I can only think of two ideas, neither of them are great:
Special %params lexical
In the same way that $self is special, make %params special in these blocks:
It's a simple idea and easily understandable, but it does mean that no class is permitted a field called %params itself, because then there'd be no way for an ADJUST block to see it. Of course we're already in that situation at the moment with $self, so it doesn't make the problem much worse. But perhaps enough classes might themselves want a field %params, that this would become awkward.
Use the %_ superglobal
By considering analogy to the @_ superglobal already in perl, we can consider using the %_ hash as a storage of these name/value pairs:
I think there's a certain neatness to this, and a certain symmetry with using $_[0], $_[1], etc... However, some folks might object to it on grounds that "newbies to perl might be confused by lots of symbol syntax". Personally I'm not very swayed by this particular argument, but it seems important to some. Another potential argument against doing this is that it seems weird to introduce a new use for %_ while at the same time trying to get rid of @_ in favour of function signatures.
Other Designs
Another way to look at it entirely, is to observe that the main reason for wanting the hash(ref) of params available in these blocks in the first place, was to do things with constructor parameters that aren't just simply "copy the value into a field". This was originally discussed for Object::Pad in RT137209. I say "discussed" - I thought out loud on a few ideas but nobody else has commented so far.
In many ways the current solution of just passing a hash(ref) around isn't very nice. It means that the actual parameter names used by the construction process overall are not manifestly expressed anywhere, and only become apparent in side-effects of the actual process of constructing an object. You can only find out those names by running it and seeing if it complains about missing or extra ones. It would be nice to find a nicer overall design for this sort of pattern, but so far one has not emerged.