A proposed API to interact with fields inside instances

leonerd commented 2 years ago

While respecting that by default, fields of classes are in some sense "private" to that particular class, it is nevertheless inevitable that special-purpose pieces of perl code will want to "gut-wrench", and reach inside of instances to inspect or manipulate the data in the fields contained therein. In particular, see #62.

The moment we add some sort of MOP API (e.g. see the API shape suggested by Object::Pad::MOP::Class et al.) it becomes possible to write functions to do just about all of this. Therefore it makes sense to think about a standard set to be provided up front.

Here is my suggestion:

A symmetric pair of functions: one that explodes a given object instance into some lower-level representation of its fields, and one that reconstructs a new object from that representation:

@repr = builtin::explode_object($obj);

$obj = builtin::make_object(@repr);

Here, $obj contains an object instance, and the @repr list contains some representation of the fields and the values they contain. I don't yet have a firm feel for exactly what shape that should be, but it should primarily be composed of plain strings, and plain scalars directly taken from fields. It might additionally contain extra structure in terms of hash or array references.

I have various thoughts on how that ought to look, but I'll expand on them in later messages. Point being: it should be possible to recurse down into that structure to find simpler things.

leonerd commented 2 years ago

Thinking ahead to having multiple component classes, roles, etc., I suspect a two-level structure for @repr makes sense. It wants to be a key/value list, pairing up names of component classes with a structure (hash? array?) containing the names/values of the fields of that component. For example, in a simple flat case:

class Point {
  field $x :param;
  field $y :param;
}

explode_object(Point->new( x => 10, y => 20 ))

might yield the values:

"Point" => { '$x' => 10, '$y' => 20 }

A more complex nested one might be more like:

role Displayable {
  field $display; ...
}
class Point3D :isa(Point) :does(Displayable) {
  field $z;
}

explode_object(Point3D->new( ... ))

"Point3D" => { '$z' => 30 }, "Displayable" => { '$display' => ... }, "Point" => { '$x' => 10, '$y' => 20 }

This puts the actual class as the first item, with its roles following it, then the parent class (recursively) at the end. Putting the actual class first in the list makes it easy for make_object() to find it when reconstructing the object.

The fields of each component class / role are separated by being in sub-hashes, so that duplicated names between them don't matter: it's fine for roles to duplicate field names of the classes they're mixed into, and for classes to duplicate field names of their parents.
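To make the ordering concrete, here is a small plain-Perl sketch that consumes such a list. The representation is written out by hand (the `'red'` value for `$display` is invented), since no `explode_object` exists yet to produce it; it only shows that the real class is the first key and that each component's fields stay in their own sub-hash.

```perl
use strict;
use warnings;

# Hand-written stand-in for what explode_object(Point3D->new(...)) might
# return under the two-level layout above. Values are illustrative only.
my @repr = (
    Point3D     => { '$z' => 30 },
    Displayable => { '$display' => 'red' },
    Point       => { '$x' => 10, '$y' => 20 },
);

# The object's actual class is simply the first key in the list.
my $class = $repr[0];
print "Rebuild as: $class\n";

# Walk the component/fields pairs in order. Each component keeps its own
# sub-hash, so a field name repeated in two components cannot collide.
my @pairs = @repr;
while ( my ( $component, $fields ) = splice @pairs, 0, 2 ) {
    for my $name ( sort keys %$fields ) {
        print "  $component $name = $fields->{$name}\n";
    }
}
```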

In the case of non-scalar fields, the value stored in the hash would be a reference:

class List {
  field @items;
}

"List" => { '@items' => [1, 2, 3, 4, ...] }

Alternatives

One possible alternative would be to keep all the fields in one large flat list, with some sort of prefix giving the name of the component class to account for duplicates:

'Point3D/$z' => 30, 'Displayable/$display' => ..., 'Point/$x' => 10, 'Point/$y' => 20

This might be nicer to work with, because it's just a flat kvlist suitable for assigning into a single hash, but it does then need some thought about which separator character to use (/ in this example).
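Converting between the two layouts is mechanical. A minimal sketch, assuming `/` as the separator as in the example above (neither class names nor field names can contain `/`, so splitting on the first one is unambiguous); the helper names `flatten_repr` and `unflatten_repr` are made up for illustration.

```perl
use strict;
use warnings;

# Flatten (component => { field => value }, ...) into
# ('component/field' => value, ...). The '/' separator is an assumption.
sub flatten_repr {
    my @repr = @_;
    my @flat;
    while ( my ( $component, $fields ) = splice @repr, 0, 2 ) {
        push @flat, map { ( "$component/$_" => $fields->{$_} ) } sort keys %$fields;
    }
    return @flat;
}

# Split each key on its first '/' to rebuild the nested form, keeping
# components in the order they first appear.
sub unflatten_repr {
    my @flat = @_;
    my ( @order, %by_component );
    while ( my ( $key, $value ) = splice @flat, 0, 2 ) {
        my ( $component, $field ) = split m{/}, $key, 2;
        push @order, $component unless exists $by_component{$component};
        $by_component{$component}{$field} = $value;
    }
    return map { ( $_ => $by_component{$_} ) } @order;
}

my @flat = flatten_repr( Point3D => { '$z' => 30 }, Point => { '$x' => 10, '$y' => 20 } );
print join( ', ', @flat ), "\n";
my @nested = unflatten_repr(@flat);
```

Preserving the component order on the way back matters if, as proposed, the actual class must stay first in the list.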

leonerd commented 2 years ago

Further API

The following become possible to think about as well:

getfield

Based on explode_object, it's tempting to consider a function that can retrieve the value of an individual field out of an instance:

$value = getfield($obj, $fieldname);

# e.g.
say "The point is at X coördinate ", getfield($point, '$x');

Here $fieldname is just a string giving the name of the field. For accessing fields of parent classes/roles, though, it might be helpful to pass the class name as well, or to use the joined-up string form given in "Alternatives" (which again needs some thought about the separator character):

getfield($point3d, Point => '$x');
getfield($point3d, 'Point/$x');

What should it do for non-scalar fields, though? Perhaps in list context it should return all the values individually. In scalar context? I'd vote for it to behave like a lexical would in that same scalar context: arrays yield their element count, hashes yield their key count.
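The context rule could look something like this. It is only a sketch: `getfield_sketch` is a hypothetical name, and a plain hash of references stands in for the object's real field storage.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an instance: field name => reference to the
# field's container.
my %fields = (
    '$x'     => \10,
    '@items' => [ 1, 2, 3, 4 ],
    '%opts'  => { colour => 'red', size => 2 },
);

sub getfield_sketch {
    my ( $storage, $name ) = @_;
    my $ref   = $storage->{$name} // die "No such field $name\n";
    my $sigil = substr $name, 0, 1;
    return $$ref if $sigil eq '$';

    # Aggregates mimic a lexical @array / %hash: all the contents in list
    # context, a count in scalar context.
    if ( $sigil eq '@' ) {
        return wantarray ? @$ref : scalar @$ref;
    }
    return wantarray ? %$ref : scalar keys %$ref;
}

my $count = getfield_sketch( \%fields, '@items' );    # scalar context
my @items = getfield_sketch( \%fields, '@items' );    # list context
print "$count items: @items\n";                       # prints "4 items: 1 2 3 4"
```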

setfield

Once you have a get function, it becomes even more tempting to consider a set one, at least for scalar fields.

setfield($obj, $fieldname, $new_value);

(but now how do we handle the class name? Still to be thought up)

This isn't going to be very nice for array/hash fields, but I think something else might:

reffield

A function to obtain a reference to an instance field, as if returned by code like return \%hashfield:

$ref = reffield($obj, $fieldname);

It takes the same arguments as getfield. In fact, getfield and setfield can both be implemented in terms of it, at least for scalar fields:

sub getfield($obj, $classname, $fieldname) { return ${ reffield($obj, $classname, $fieldname) }; }
sub setfield($obj, $classname, $fieldname, $newval) { ${ reffield($obj, $classname, $fieldname) } = $newval; }

It then makes it possible to perform any other operation on array/hash fields:

push @{ reffield( $obj, $classname, $fieldname ) }, @more;

keys %{ reffield( $obj, $classname, $fieldname ) }

# etc...
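The layering can be tried out today with a toy model. In this sketch the object is a plain hash of `component => { field => reference }` (an assumption purely for illustration), `reffield` does the lookup, and `getfield`/`setfield` are built on top of it in the same way as the two one-liners above.

```perl
use strict;
use warnings;

# Toy model of an instance: component class => { field name => reference
# to the field's storage }. Real fields would live inside the object.
my $obj = {
    Point => { '$x' => \( my $x = 10 ), '$y' => \( my $y = 20 ) },
    List  => { '@items' => [ 1, 2, 3 ] },
};

# reffield: return a reference to the named field of the named component.
sub reffield {
    my ( $obj, $classname, $fieldname ) = @_;
    return $obj->{$classname}{$fieldname}
        // die "No field $fieldname in $classname\n";
}

# getfield / setfield for scalar fields, layered on reffield.
sub getfield {
    my ( $obj, $classname, $fieldname ) = @_;
    return ${ reffield( $obj, $classname, $fieldname ) };
}

sub setfield {
    my ( $obj, $classname, $fieldname, $newval ) = @_;
    ${ reffield( $obj, $classname, $fieldname ) } = $newval;
    return;
}

# Usage: scalar access via get/set, aggregate access via the reference.
setfield( $obj, Point => '$x', 15 );
push @{ reffield( $obj, List => '@items' ) }, 4, 5;
print getfield( $obj, Point => '$x' ), "\n";
print scalar @{ reffield( $obj, List => '@items' ) }, "\n";
```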

leonerd commented 2 years ago

It has been suggested that "explode" is probably not a good name here. Aside from the allusions to violence, it also suggests a fully-recursive breaking apart of the object right down to its scalar pieces; which isn't what happens here. It only picks apart one layer - if there are further object refs or other containers within that, those are preserved.

A better naming idea likely exists. Ideally something that can pair nicely with the object reconstruction function, to really hammer home the "symmetric pair" nature of them.

leonerd commented 2 years ago

@repr = deconstruct_object $obj;
$obj = construct_object @repr;

I like that.

Ovid commented 2 years ago

Why would we need construct_object? Is this for freezing/thawing?

Also, get_field and set_field would be part of a MOP, right? If so, and if your proposed functions were needed, they should be using the MOP so we can have a single source of truth for this.

Ovid commented 2 years ago

Also:

Though again, for accessing fields of parent classes/roles, it might be helpful to have the class name as well ...

If builtin::reftype $some_instance returns OBJECT, would it make sense to provide my $class_name = isa $some_instance? (That would make it a prefix operator with the instance on the right-hand side instead of the left, so I admit it feels weird.)

leonerd commented 2 years ago

@Ovid Quick reminder that reftype isn't ref. reftype gives you one of a few standard fixed strings, naming the basic container type of a referent. ref gives you the full blessed class name.

my $hashpoint = bless {}, "Point";
ref $hashpoint eq "Point";
reftype $hashpoint eq "HASH";

my $objpoint = Point->new;
ref $objpoint eq "Point";
reftype $objpoint eq "OBJECT";

The ref function already does what you wanted.

Edit: or blessed, actually.
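A quick runnable check with the core Scalar::Util module, using a classic blessed hash (instances of the newer `class` feature are left out so this runs on any recent perl):

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed reftype);

my $hashpoint = bless {}, 'Point';
my $plain     = {};

# ref and blessed both give the class name for a blessed reference...
print ref($hashpoint),     "\n";   # Point
print blessed($hashpoint), "\n";   # Point
# ...while reftype names the underlying container.
print reftype($hashpoint), "\n";   # HASH

# The difference shows on unblessed references: ref still returns the
# container type, but blessed returns undef.
print ref($plain), "\n";           # HASH
print defined blessed($plain) ? "blessed\n" : "not blessed\n";
```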

leonerd commented 2 years ago

@Ovid

Why would we need construct_object? Is this for freezing/thawing?

Yeah, basically just for weird cases like Storable, Sereal, etc... It probably shouldn't be used much besides those.

Though it is handy to at least have such a function available, so that Data::Dumper can rely on it for printing.

class WithHiddenFields {
   field $x = 123;
   field $y = 456;
}
say Dumper(WithHiddenFields->new());

could print

$VAR1 = builtin::construct_object(WithHiddenFields => {'$x' => 123, '$y' => 456});
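Producing that kind of output from a representation needs little more than Data::Dumper itself. A rough sketch, assuming the `(class => { field => value })` shape; `dump_as_constructor_call` is a hypothetical helper, and `builtin::construct_object` is only the name proposed here, not something perl provides. The class name is also passed through Dumper, so it comes out as a quoted string.

```perl
use strict;
use warnings;
use Data::Dumper;

# Render a class name plus its field hash as Perl source that would call
# the proposed constructor. 'builtin::construct_object' is only the name
# suggested in this thread; no released perl provides it.
sub dump_as_constructor_call {
    my ( $class, $fields ) = @_;
    local $Data::Dumper::Terse    = 1;    # no '$VAR1 = ' prefix per value
    local $Data::Dumper::Indent   = 0;    # keep everything on one line
    local $Data::Dumper::Sortkeys = 1;    # stable field order
    return sprintf 'builtin::construct_object(%s => %s)',
        Dumper($class), Dumper($fields);
}

print '$VAR1 = ',
    dump_as_constructor_call( WithHiddenFields => { '$x' => 123, '$y' => 456 } ),
    ";\n";
```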

Ovid commented 2 years ago

Thanks for the clarifications!

How would deconstruct_object handle this?


my $thing = ... ;
class MyThing {
    use DBI;
    field $dbh { DBI->connect(...) };

    method foo ($bar) {
        if ( $bar > $thing ) { ... }
    }
}

I've seen cases where code has blown up because a frozen scope closed over a variable outside that scope; since the variable isn't declared in the scope, thawing it fails with Global symbol "$thing" requires explicit package name.

I'd rather have the code blow up on deconstruct_object instead of construct_object. If deconstruct_object holds code and doesn't freeze it, it may be safe to do that, but I don't know what the plans are. I do, however, like the direction this is going.
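Storable already takes this fail-early approach for code references: with its default settings, freezing a structure that contains a code reference dies straight away. A small demonstration, with a closure standing in for a field value (the `$check` field is invented for illustration):

```perl
use strict;
use warnings;
use Storable qw(freeze);

my $thing = 42;

# A field value that closes over an outer lexical, like the method above.
my %fields = ( '$check' => sub { $_[0] > $thing } );

# With Storable's defaults, code references are refused at freeze time,
# so the failure surfaces during deconstruction rather than on thaw.
my $frozen = eval { freeze( \%fields ) };
my $error  = $@;
print defined $frozen ? "frozen\n" : "refused: $error";
```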

Also, it might be nice to have a :restricted attribute:

class CreditCard {
    field $name   :param;
    field $type   :param;
    field $number :param;
    field $cvv2   :param :restricted;
}

The above example reflects the fact that PCI forbids us from recording the CVV2 number. If we are caught doing that, we face massive fines and possibly lose the right to process credit cards.

A :restricted field would be high-security: not available via deconstruction, or to Data::Dumper and friends, though it might be available via the MOP. This is to prevent accidental information leaks (like writing passwords to logs). If we like this, we'd have to figure out how it works in the face of a :reader attribute, because when I see information like this, it usually shouldn't be available outside the class, either.
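One way the deconstruction side could honour such an attribute is to filter the representation before handing it out. A minimal sketch; the `%RESTRICTED` table is a hypothetical stand-in for whatever the MOP would report, `redact_repr` is a made-up helper name, and the card data is fake.

```perl
use strict;
use warnings;

# Hypothetical metadata: which fields of which class carry :restricted.
# A real implementation would ask the MOP instead of a global table.
my %RESTRICTED = ( CreditCard => { '$cvv2' => 1 } );

# Drop restricted fields from a (class => { field => value }) list, so
# serialisers and dumpers built on deconstruction never see them.
sub redact_repr {
    my @repr = @_;
    my @out;
    while ( my ( $class, $fields ) = splice @repr, 0, 2 ) {
        my $hidden = $RESTRICTED{$class} || {};
        push @out, $class => {
            map { ( $_ => $fields->{$_} ) } grep { !$hidden->{$_} } keys %$fields
        };
    }
    return @out;
}

my @safe = redact_repr(
    CreditCard => { '$name' => 'A. N. Other', '$type' => 'VISA', '$cvv2' => '123' },
);
print join( ', ', sort keys %{ $safe[1] } ), "\n";
```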

haarg commented 2 years ago

$thing would be a global. It would work just like any other global. The same as any "class field". It would maintain whatever value it had and have no impact on either construct_object or deconstruct_object.

Ovid commented 2 years ago

As mentioned here, I would much prefer if deconstruct_object either mark the data read-only or clone it (or COW) to avoid inevitable bugs when the internal state of the object is changed in an unexpected way.

leonerd commented 2 years ago

@Ovid

As mentioned here, I would much prefer if deconstruct_object either mark the data read-only or clone it (or COW)

A useful-sounding idea, but a bit tricky to arrange in practice. It can't mark the data read-only because that would make the field itself inside the object read-only. We don't (yet) have COW arrays or hashes in Perl.

We could clone the values, but that would make it run somewhat slowly. Perhaps not too bad for human-readable debug printing, but things like fast network/disk access serialisers might get upset by that. Maybe we'd give them another function to say "hey we'll give you a reference to the real data so you'd best promise not to edit it".

Another thought would be to add COW arrays/hashes into Perl. I'm sure this isn't the first situation where this problem has come up; and such a mechanism might be a handy way forward for all of them.

Safest I think would be to make copies at first, and if someone finds it to be too slow for serialisation purposes, we can revisit the idea and either provide another function, or look into COW arrays/hashes.
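The copy-first behaviour is easy to sketch with the core Storable module's `dclone`. Here a plain hash stands in for the object's live fields and `deconstruct_copy` is a hypothetical name; the point is only that edits to the returned structure never reach the original.

```perl
use strict;
use warnings;
use Storable qw(dclone);

# Stand-in for an object's live field storage (a hypothetical layout).
my %live = ( '@items' => [ 1, 2, 3 ], '%opts' => { colour => 'red' } );

# Copy-first deconstruction: return a deep copy so that callers who edit
# the representation cannot change the object's internal state.
sub deconstruct_copy {
    my ( $class, $fields ) = @_;
    return ( $class => dclone($fields) );
}

my ( $class, $repr ) = deconstruct_copy( Example => \%live );
push @{ $repr->{'@items'} }, 4;       # grows only the copy
$repr->{'%opts'}{colour} = 'blue';    # changes only the copy

print scalar( @{ $live{'@items'} } ), ' ', $live{'%opts'}{colour}, "\n";
```

A faster, zero-copy variant would return the live references instead, which is exactly the "promise not to edit it" trade-off described above.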

Ovid commented 2 years ago

Safest I think would be to make copies at first, and if someone finds it to be too slow for serialisation purposes, we can revisit the idea and either provide another function, or look into COW arrays/hashes.

I like that. It's safe. What I think would be good is for someone to write a role that objects can consume that allows the object to dictate how it's to be serialized/deserialized. The object could do this faster and safer.

duncand commented 2 years ago

@repr = deconstruct_object $obj;
$obj = construct_object @repr;

I like that.

I feel it would be better to be more descriptive: the names of these function pairs should say WHAT format they map objects to and from, and there should be some kind of formal specification that names and describes that format. For example, Data::Dumper and other similar tools presumably each have a defined format.

As it happens, with https://github.com/muldis/Muldis_Object_Notation/blob/master/spec/Muldis_Object_Notation.md I am in the process of formally defining a format, "Muldis Object Notation Syntax Perl", which is expressly designed for the use case you are talking about. In the general case, an object of a Perl class corresponds to an "Article" which is formally defined in terms of plain Perl arrays/hashes/scalars/etc. (While I haven't yet fully defined the Perl-specific representation, the above url has a placeholder for it and does define other abstract and concrete representations it would map with.)

duncand commented 2 years ago

As it happens, with https://github.com/muldis/Muldis_Object_Notation/blob/master/spec/Muldis_Object_Notation.md I am in the process of formally defining a format, "Muldis Object Notation Syntax Perl", which is expressly designed for the use case you are talking about. In the general case, an object of a Perl class corresponds to an "Article" which is formally defined in terms of plain Perl arrays/hashes/scalars/etc. (While I haven't yet fully defined the Perl-specific representation, the above url has a placeholder for it and does define other abstract and concrete representations it would map with.)

FYI, since I wrote the above, I have gone and fully defined the Perl-specific representation of Muldis Object Notation, so if you revisit the above URL now, you can see it. I likewise completed the Raku-specific version, so the spec now shows 5 fully fleshed-out counterparts (for Perl, Raku, Java, .NET, plus the canonical plain-text representation, which is loosely like JSON but with stronger typing etc.), and the 5 can be compared side by side with some examples.

If the format of MUON reminds you of the format that modules like SQL::Abstract/DBIx::Abstract/DBIx::Class/etc. use to represent database data or SQL queries as Perl data structures, that is no accident: those were a primary influence.

thoughtstream commented 2 years ago

Just a small linguistic point...

I'd much prefer:

    @rep = deconstruct_object($obj);
    $obj = reconstruct_object(@rep);

Apart from the greater symmetry of "deconstruct..." vs "reconstruct...", I think it would be better not to use a bare "construct...", so as to avoid confusion with the actual construction process automagically provided by Corinna classes.

Specifically, I assume that construct_object() isn't actually constructing an object in the same way Classname->new(...) does. For example, it's presumably not going to run any ADJUST phasers, or any default field-value blocks, on the object it's rebuilding. It's just going to blindly reconstruct the internal state of an object from a previously deconstructed representation, right?

Assuming that's correct, I think reconstruct_object(...) is clearer in intent and effect than construct_object(...).
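The distinction can be modelled with a classic blessed-hash class. In this sketch a counter stands in for an ADJUST phaser (the `Counter` class and `reconstruct_object_sketch` are invented for illustration): normal construction runs it, while reconstruction just restores the saved state.

```perl
use strict;
use warnings;

package Counter {
    my $adjust_runs = 0;

    # Normal construction: apply defaults, then run ADJUST-like setup.
    sub new {
        my ( $class, %args ) = @_;
        my $self = bless { count => $args{count} // 0 }, $class;
        $adjust_runs++;    # stands in for an ADJUST phaser running
        return $self;
    }

    sub adjust_runs { return $adjust_runs }
}

# Reconstruction: bless a copy of previously saved state. No defaults are
# applied and no setup code runs.
sub reconstruct_object_sketch {
    my ( $class, $fields ) = @_;
    return bless { %$fields }, $class;
}

my $fresh    = Counter->new( count => 5 );
my $restored = reconstruct_object_sketch( Counter => { %$fresh } );

my $runs = Counter::adjust_runs();
print "setup ran $runs time(s); restored count is $restored->{count}\n";
```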

Perl-Apollo / Corinna