Perl-Apollo / Corinna

Corinna - Bring Modern OO to the Core of Perl
Artistic License 2.0

Disambiguate Corinna from other objects #32

Open Ovid opened 3 years ago

Ovid commented 3 years ago

Generally speaking, objects should be opaque to the outside world. However, there are times that we need to know what we're working with. For example, the debugger problem means that Data::Dumper and friends would need some way of realizing that "hey, I can get useful state information for this thing".

There's also the question of what ref and Scalar::Util::reftype will return.
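For reference, here is a minimal sketch of what these routines report today for a plain hash reference and for a classic blessed-hash object. The `describe` helper and the `Customer` class name are made up for illustration.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed reftype);

# Hypothetical helper: collect what each routine says about a value
sub describe {
    my ($value) = @_;
    return sprintf 'ref=%s reftype=%s blessed=%s',
        ref($value), reftype($value), blessed($value) // '(none)';
}

my $plain  = { birthday => '1980-02-29' };
my $object = bless { birthday => '1980-02-29' }, 'Customer';

print describe($plain),  "\n";    # ref=HASH reftype=HASH blessed=(none)
print describe($object), "\n";    # ref=Customer reftype=HASH blessed=Customer
```

For a hash-based object, reftype exposes the underlying container, which is exactly the information a dumper uses to walk the object's state.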

maros commented 3 years ago

Data::Dumper has support for Freeze methods (see https://metacpan.org/pod/Data::Dumper#Configuration-Variables-or-Methods ), so does Storable (see https://metacpan.org/pod/Storable#STORABLE_freeze-obj,-cloning ). Cor classes could provide an overridable universal freeze method that would return a deep-cloned hashref that may be used for general purpose serialization.
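As a rough illustration of the Storable side of this, the sketch below gives a small inside-out class a pair of STORABLE_freeze / STORABLE_thaw hooks so that dclone can copy state that lives outside the object. The `Counter` class and its `%count_of` storage are invented for the example; a universal freeze method for Cor classes would presumably look different.

```perl
use strict;
use warnings;

package Counter;
use Scalar::Util qw(refaddr);

my %count_of;    # inside-out storage: the state lives outside the object

sub new {
    my ($class, %args) = @_;
    my $self = bless \(my $anon), $class;
    $count_of{ refaddr($self) } = $args{count} // 0;
    return $self;
}

sub count   { $count_of{ refaddr($_[0]) } }
sub DESTROY { delete $count_of{ refaddr($_[0]) } }

# Storable hook: hand back a serialized form of the hidden state
sub STORABLE_freeze {
    my ($self, $cloning) = @_;
    return $count_of{ refaddr($self) };
}

# Storable hook: repopulate the hidden state of the freshly created object
sub STORABLE_thaw {
    my ($self, $cloning, $serialized) = @_;
    $count_of{ refaddr($self) } = $serialized;
    return;
}

package main;
use Storable qw(dclone);

my $original = Counter->new(count => 3);
my $copy     = dclone($original);
print $copy->count, "\n";
```

Without the hooks, Storable would only see an empty blessed scalar and the copy would have no count at all.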

This mechanism might also be useful to non-Corinna classes, e.g. for inside-out classes, or classes accessing C data structures.

jjn1056 commented 3 years ago

This question might or might not overlap with what happens when Moo or Moose extends a Cor-based class. Will it work at all? Will Moose be able to inflate a meta object for a Cor class? If so, will Cor's methods and lexical slots be inspectable as methods and attributes?

Ovid commented 3 years ago

This question might or might not overlap with what happens when Moo or Moose extends a Cor-based class. Will it work at all? Will Moose be able to inflate a meta object for a Cor class?

That and related questions will be up to the maintainers of Moo and Moose when Corinna is available.

Will Cor's methods and lexical slots be inspectable as methods and attributes?

Internal state needs to be exposed for debugging purposes, but not for programmatic purposes; exposing it programmatically defeats the point of encapsulation. So we've been in discussion about making this human-readable, but not guaranteeing its format or that it can be trivially parsed. From what I understand, however, Paul Evans has talked with Data::Printer maintainers about possibly adding introspection for Object::Pad. In any case, all parts of the public interface should be inspectable.

duncand commented 3 years ago

There's also the question of what ref and Scalar::Util::reftype will return.

The most important thing is that these routines will return some new value that is unique to Corinna classes and has never been returned by those methods before for anything else. I think OBJECT was proposed in the past.
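To make the point concrete, here is a hedged sketch of how a dumper could branch on such a value. It assumes that Corinna instances report a reftype of OBJECT, as floated above; the `is_opaque_instance` and `dump_strategy` helpers are hypothetical names.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed reftype);

# Assumption: Corinna instances would report the new, unique reftype 'OBJECT'
sub is_opaque_instance {
    my ($value) = @_;
    return 0 unless defined blessed($value);
    return reftype($value) eq 'OBJECT' ? 1 : 0;
}

# Hypothetical dispatcher a dumper might use to pick its approach
sub dump_strategy {
    my ($value) = @_;
    return 'plain data'     unless defined blessed($value);
    return 'ask the object' if is_opaque_instance($value);
    return 'walk the ' . lc(reftype($value)) . ' directly';
}

print dump_strategy({ a => 1 }), "\n";
print dump_strategy(bless({}, 'Customer')), "\n";
```

Because the value has never been returned for anything else, existing code that only checks for HASH, ARRAY and friends keeps working and simply treats the new objects as unknown.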

duncand commented 3 years ago

To go along with Corinna, Perl should globally (such as in UNIVERSAL) provide a routine/operator/method with a name like as_preview_string (nee as_debug_string or as_debug_dump), or whatever name gets picked, which generates a string or other transparent structure that spells out the guts of its argument in a way that is useful for debugging purposes. Perl would have a useful built-in implementation for everything in the language, Corinna or not. Developers could optionally override it, or provide an alternative under a non-conflicting name, for their own classes, Corinna or not. As this is specifically named as being for debugging, it is explicitly NOT guaranteed to be deterministic under any circumstances, whether between minor versions of Perl or even between one call and the next in the same program execution. Between its name and that, developers are NOT to count on being able to parse and use it programmatically in any reliable way besides as on-the-spot debugging info. This is likewise explicitly NOT meant to be used in hashing for determining if 2 object instances are "equal" or not; such is the domain of some other not-for-debugging thing fit for that purpose. What do you think?
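A minimal sketch of that idea, written in user land rather than in the core: a default installed into UNIVERSAL, built on Data::Dumper and capped at a small size, plus one class that overrides it. The 60-character default, the `Temperature` and `Playlist` classes, and the use of Data::Dumper are all assumptions made for illustration, and this default is only useful for objects Data::Dumper can already see into.

```perl
use strict;
use warnings;
use Data::Dumper ();

# Hypothetical universal default: a short, lossy, debugging-only preview
sub UNIVERSAL::as_preview_string {
    my ($self, $max_chars) = @_;
    $max_chars //= 60;
    local $Data::Dumper::Terse    = 1;    # no "$VAR1 =" prefix
    local $Data::Dumper::Indent   = 0;    # single line
    local $Data::Dumper::Maxdepth = 2;    # do not descend far
    my $text = Data::Dumper::Dumper($self);
    return length($text) > $max_chars
        ? substr($text, 0, $max_chars) . '...'
        : $text;
}

package Temperature {
    sub new { my ($class, $kelvin) = @_; return bless { kelvin => $kelvin }, $class }

    # A class may override the default with something more meaningful
    sub as_preview_string { my ($self) = @_; return "Temperature($self->{kelvin} K)" }
}

package Playlist {
    sub new { my ($class, @tracks) = @_; return bless { tracks => [@tracks] }, $class }
}

my $temperature = Temperature->new(293);
my $playlist    = Playlist->new(map { "track $_" } 1 .. 1000);

print $temperature->as_preview_string, "\n";    # the override
print $playlist->as_preview_string, "\n";       # the truncated default
```

Nothing here promises a stable format, which is the point: the cap and the depth limit can be tuned later without breaking anyone who used it only for eyeballing.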

duncand commented 3 years ago

On a tangent, with my own Muldis Data Language / Muldis Object Notation project, I have designed it such that implementations explicitly have 3 distinct methods for serializing values as character strings. Implementations can just so happen to make them produce identical output but they don't have to.

  1. The first serialization is intended to guarantee being deterministic and accurate enough to be used for universal identity; there is a 1:1 mapping between when "X = Y" (X is the "same" value as Y) and when their serializations by this method are the "same" character string. Used to produce hash keys for set types and such. I call this as_identity_string or such.

  2. The second serialization is intended to NOT guarantee any determinism and is intended strictly for debugging, such as when you have a GUI debugger and you hover over a value and it gives a preview of what it contains. I call this as_preview_string (nee as_debug_string) or such. My prior comment is talking about the same thing as this.

  3. The third serialization is in the middle and is intended for possibly user configurable pretty-printing of a value for export such as for saving to disk and reading later or in interchange or whatever. The first 2 are more for internal use by the language and the third is mainly what users would see as "the serialization". Values don't have to serialize to the same string deterministically but one should be able to deserialize that string to yield the same value one started with, so it still round-trips.

Bottom line is, in a serious language/serialization project, these are 3 very distinct use cases that should each in principle have their own serialization routines, that are free to but not required to return the same strings.
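Here is a rough sketch of the three use cases side by side for plain Perl data, using JSON::PP from the core distribution as a stand-in serializer. The function names other than as_identity_string and as_preview_string are invented, and JSON is only an approximation of a true identity serialization (for example, it distinguishes the number 1 from the string "1").

```perl
use strict;
use warnings;
use JSON::PP ();

# 1. Identity: canonical key order, no optional whitespace
sub as_identity_string {
    my ($data) = @_;
    return JSON::PP->new->canonical->allow_nonref->encode($data);
}

# 2. Preview: cheap and lossy, no determinism promised
sub as_preview_string {
    my ($data, $max_chars) = @_;
    $max_chars //= 40;
    my $text = JSON::PP->new->allow_nonref->encode($data);
    return length($text) > $max_chars
        ? substr($text, 0, $max_chars) . '...'
        : $text;
}

# 3. Interchange: pretty-printed, layout may vary, but it must round-trip
sub as_interchange_string {
    my ($data) = @_;
    return JSON::PP->new->pretty->allow_nonref->encode($data);
}

sub from_interchange_string {
    my ($text) = @_;
    return JSON::PP->new->allow_nonref->decode($text);
}

my $x = { name => 'Ada', tags => [ 'a', 'b' ] };
my $y = { tags => [ 'a', 'b' ], name => 'Ada' };

print as_identity_string($x), "\n";    # {"name":"Ada","tags":["a","b"]}
print as_preview_string({ rows => [ ('x') x 1000 ] }), "\n";

my $round_trip = from_interchange_string(as_interchange_string($x));
print as_identity_string($round_trip) eq as_identity_string($y)
    ? "same value\n"
    : "different value\n";
```

The three routines are free to share an implementation, as they nearly do here, but callers should only rely on the guarantee that belongs to the one they called.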

abraxxa commented 3 years ago

Sounds like Str and Gist in Raku to me and yes I love those!

duncand commented 3 years ago

Sounds like Str and Gist in Raku to me and yes I love those!

Yes, that's along the lines of what I had in mind. In particular Raku's gist() is like my as_preview_string(), including that the string could actually leave out details and not have to be exhaustive, e.g. it could limit itself to a maximum of 1KB rather than producing 1GB just because the structure is that big. I'm not sure how gist() is implemented, but the examples I've seen suggest it errs on the side of being too terse, e.g. it may only output 50 bytes or the first 3 list items or something. But by making ours non-deterministic, we can tune those things over time and not worry about having to keep it the same once set.

wchristian commented 2 years ago

Between its name and that, developers are NOT to count on being able to parse and use it programmatically in any reliable way

Let me make sure that y'all are aware of the consequences this creates.

A big part of debugging in general, across almost all languages, paradigms, and platforms in existence, is being able to use automatically updating visual representations of data, which can also be used to modify in-memory data in real time. Because that data can be extremely large and hierarchically deep, the norm is to present it as trees that are loaded on demand and just in time, with limits on how deeply they are loaded, to avoid parsing a 2 gigabyte object dump on every step.

This is currently extremely viable in Perl, as demonstrated below (there are more features and interactions, not shown due to space constraints), and is implemented in at least 3 editors to my knowledge, and likely many more.

https://user-images.githubusercontent.com/175467/154935295-d8c8d318-10b5-42a1-8e4d-718eb326d97d.mp4

(i could've recorded the same thing in 10 different languages just with what i've installed on this machine)

In general the difference between debugging and programming is about the same as between scripting and programming, which is to say it is a difference even more vague, malleable and ill-defined than genders. In addition to that, a lot of debugging techniques involve programming on the debuggable during the debugging.

It is one choice to say "i am not interested, i will not use this", but it is another choice to say "i will take steps to ensure nobody else can use this". Especially so when the latter means that somebody who was previously able to fairly quickly analyze the control and data flow in a module they downloaded from CPAN may now be forced to spend considerable additional time because of somebody else's choice, or choose a different module from CPAN.

If you make the choice of "nobody else can use this", please be aware that that is actually the choice you are making, and all it entails.

duncand commented 2 years ago

@wchristian said:

Between its name and that, developers are NOT to count on being able to parse and use it programmatically in any reliable way

Let me make sure that y'all are aware of the consequences this creates. ...

@wchristian About your new response to something I wrote 5 months ago, I feel that you fundamentally misunderstand what I said, as your response is talking about completely different functionality than what I said which you quoted.

My comment was about generic holistic serializations of any typed Perl value TO A SINGLE CHARACTER STRING that would be best provided built-in to Perl itself that are standard for every type. I was naming 3 very distinct use cases for which one may serialize an object, each of which would have its own needs, and each of which the serialization logic could choose to behave differently to optimize for that case.

The 3 use cases are:

  1. A perfect unique identifier to distinguish a value from another, such that the serialized form has a 1:1 correspondence to the Perl value, it could be used for a hash key for example. For this every detail down to the whitespace matters.
  2. A serialization for interchange or storage/retrieval. This needs to preserve everything that is important about the value that can be used to reproduce it later, but exact whitespace details don't have to be the same, so the same value could serialize in, for example, different pretty-printed customizable ways, but they produce the same logical value when read in again. This COULD be identical to number 1 but doesn't have to be.
  3. A gist for quick debugging. While this COULD have all the qualities of either number 1 or 2, it doesn't have to. In particular, if the original value includes gigabytes of data this gist could ignore most of it and just output a sample of 1KB or so. This is meant to perform very fast and have very little resource consumption.

The text of mine that you quoted is about number 3 above, which is why I say one can't count on being able to programmatically parse it meaningfully to reproduce the original value, because that isn't what it's for.

Given some typical GUI debugger when your program is paused and you are inspecting variables, if you hover over some variable and there is pop-up text showing its contents, that pop-up is likely to only show the first few hundred bytes, a preview, and not try to generate a gigabyte of text. This preview is what my number 3 is for.

What YOU seem to be talking about is NOT everything-into-a-string, instead you are examining values as structures and letting one look into the parts, it is NOT turning everything into a string.

So please clarify what the problem is that you seem to have with what I said, or if you no longer have a problem with it now that I've hopefully made more clear what I'm talking about.

wchristian commented 2 years ago

I came here because Ovid closed #30 and pointed at this issue. Your post seemed to be the only one with anything similar to a solution to the larger complex. If that's not intended, fine.

Nevertheless, i made a video. Corinna is steering towards making what it shows impossible, i.e. towards being useless with a debugger. If that is the decision Cor makes, it needs to be an explicit one.

duncand commented 2 years ago

I came here because Ovid closed #30 and pointed at this issue. Your post seemed to be the only one with anything similar to a solution to the larger complex. If that's not intended, fine.

Nevertheless, i made a video. Corinna is steering towards making what it shows impossible, i.e. towards being useless with a debugger. If that is the decision Cor makes, it needs to be an explicit one.

@wchristian For the general case of actual live debugging we would need to have structure traversal which is separate from and in addition to any of the serialization functions I named, partly because of the need to scale. For the general case of at rest debugging, what I said does cover it, specifically the number 2, serialization for interchange or storage/retrieval. Do you agree?
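To illustrate what I mean by structure traversal as opposed to serialization, here is a small sketch for plain Perl data: one routine gives a one-line summary of a node without descending into it, and another returns only the immediate children, so a debugger can expand a tree one level at a time on request. The names `node_summary` and `child_nodes` are made up, and opaque objects would need some sanctioned equivalent of `child_nodes`.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed reftype);

# One-line label for a node; never descends into the value
sub node_summary {
    my ($value) = @_;
    return 'undef'  unless defined $value;
    return "$value" unless ref $value;
    my $type = reftype($value);
    my $size = $type eq 'HASH'  ? scalar(keys %$value)
             : $type eq 'ARRAY' ? scalar(@$value)
             :                    undef;
    my $name = blessed($value) // $type;
    return defined $size ? "$name ($size items)" : $name;
}

# Immediate children only; the caller decides which ones to expand further
sub child_nodes {
    my ($value) = @_;
    my $type = reftype($value) // '';
    return map { +{ label => $_, value => $value->{$_} } } sort keys %$value
        if $type eq 'HASH';
    return map { +{ label => "[$_]", value => $value->[$_] } } 0 .. $#{$value}
        if $type eq 'ARRAY';
    return;
}

my $data = { config => { debug => 1 }, rows => [ 1 .. 10_000 ] };

# Top level of the tree: two cheap summaries, nothing is dumped in full
my @top = child_nodes($data);
print "$_->{label}: ", node_summary($_->{value}), "\n" for @top;

# The user expands "rows"; only then do we look inside, and only as far as shown
my ($rows)        = grep { $_->{label} eq 'rows' } @top;
my @first_visible = (child_nodes($rows->{value}))[ 0 .. 2 ];
print "$_->{label}: ", node_summary($_->{value}), "\n" for @first_visible;
```

This scales with what is on screen rather than with the size of the data, which no everything-into-a-string routine can do.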

wchristian commented 2 years ago

I am not married to any one particular solution*, and am currently only advocating for the premises to be determined, but that sounds reasonable, yes.

* for example one possible option is to disable strict encapsulation when running under the debugger

duncand commented 2 years ago

I am not married to any one particular solution*, and am currently only advocating for the premises to be determined, but that sounds reasonable, yes.

* for example one possible option is to disable strict encapsulation when running under the debugger

The way I see it, to have true complete debugging, there can't be an absolute concept of privacy. It must be possible to write arbitrary user-land code which is able to see into anything, such that all the interesting and non-trivial parts of a debugger can be written in and run in user land and do their work using ordinary system-provided APIs. There would be no truly private anything; privacy would be advisory rather than absolute, but the advisory privacy would come with enough barriers that one would not normally write code that bypasses them.

I don't know if there's any precedent for this, but a few years ago I had an idea for designing a language such that access to privates requires shared secrets. Perl could be run in such a way that the implementation provides a copy of, say, a "debugger key" to the user-written debugger program, which treats it as a piece of data. Whenever the debugger wants to view something private, it uses, say, a "builtin::get_private()" routine, which takes, for example, the thing to inspect, the name of the property one wants to view, and the key as arguments. Or the debugger key could be an object and get_private() a method, so the key is implicitly provided.
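A toy user-land sketch of the shape of this, with an inside-out class standing in for true privacy. Everything here is hypothetical: `Account`, `issue_debugger_key`, and this `get_private` are stand-ins, and in a real design the key would be handed out by the implementation, not by a routine anyone can call.

```perl
use strict;
use warnings;

package Account;
use Scalar::Util qw(refaddr);

# In a real module these lexicals would sit in their own file, out of reach
my %state_of;      # private per-object state
my %issued_key;    # keys handed out so far

sub new {
    my ($class, %args) = @_;
    my $self = bless \(my $anon), $class;
    $state_of{ refaddr($self) } = { balance => $args{balance} // 0 };
    return $self;
}

sub DESTROY { delete $state_of{ refaddr($_[0]) } }

# Stand-in for the implementation handing a key to a trusted debugger
sub issue_debugger_key {
    my $key = bless \(my $token), 'Account::DebuggerKey';
    $issued_key{ refaddr($key) } = $key;    # kept alive so the address is never reused
    return $key;
}

# Stand-in for the proposed get_private(): thing, property name, key
sub get_private {
    my ($object, $field, $key) = @_;
    die "access denied\n"
        unless ref($key) && $issued_key{ refaddr($key) };
    return $state_of{ refaddr($object) }{$field};
}

package main;

my $account = Account->new(balance => 100);
my $key     = Account::issue_debugger_key();

print Account::get_private($account, 'balance', $key), "\n";

# A guessed or forged key is refused
my $forged = bless \(my $fake), 'Account::DebuggerKey';
my $leaked = eval { Account::get_private($account, 'balance', $forged); 1 };
print $leaked ? "leaked\n" : "refused: $@";
```

The key is checked by identity, not by class name or contents, so holding a reference to an issued key is the only way in.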

This idea could also be generalized to not be about debuggers specifically but just a way for various classes to manage shared secrets which include each others' privates and that can be on a per-object basis rather than for all objects of the same class. I mentioned something like this before as an alternative to having class fields.

So we can have a concept of true system-enforced "unbreakable" privacy but with secured back doors for those who are allowed to see them. A real world analogy is like the special keys that firefighters have to access buildings.

@wchristian So I agree that whatever you think Perl should provide so Corinna can be properly debugged, it should be there.

Ovid commented 2 years ago

The intent is to ensure that debugging is supported. Due to the dynamic nature of Perl, it will still be possible to use those same facilities for violating encapsulation in general programming, but the intent is to make that harder to do so that it doesn't become the default $customer->{birthday} = $not_a_birthday;.

HaraldJoerg commented 2 years ago

I suggest we take the current implementation of Object::Pad as an example, since it clearly demonstrates how this could work nicely. For context, let me compare it to the hash-based zoo of OO modules where Moose is a prominent example, and to the inside-out based modules like Dios.

Moose-like objects do not need any special support for debugging or serializing. Blessed hash references have been around for quite some time before Moose appeared. On the downside, they are just that: Blessed hash references. They can be used as hash references from anywhere in the code base, and statements like Ovid's example $customer->{birthday} are very difficult to spot. On the other hand, the inside-out modules do offer complete encapsulation, but neither the debugger nor Data::Dumper and friends can work on those. In my perception, Moose and friends have been a game changer to Perl OO, whereas none of the inside-out modules is widely used. For myself, the convenience of Moose outweighs the privacy of inside-out modules, especially in the early stage of development.
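The trade-off can be seen in a few lines. This is a hand-rolled sketch with two invented classes, `HashCustomer` and `OpaqueCustomer`, and not the output of Moose or of any particular inside-out module.

```perl
use strict;
use warnings;
use Data::Dumper ();

package HashCustomer {
    sub new      { my ($class, %args) = @_; return bless { birthday => $args{birthday} }, $class }
    sub birthday { $_[0]{birthday} }
}

package OpaqueCustomer {
    use Scalar::Util qw(refaddr);
    my %birthday_of;    # inside-out storage, keyed by object address

    sub new {
        my ($class, %args) = @_;
        my $self = bless \(my $anon), $class;
        $birthday_of{ refaddr($self) } = $args{birthday};
        return $self;
    }
    sub birthday { $birthday_of{ refaddr($_[0]) } }
    sub DESTROY  { delete $birthday_of{ refaddr($_[0]) } }
}

my $hash_based = HashCustomer->new(birthday => '1980-02-29');
my $opaque     = OpaqueCustomer->new(birthday => '1980-02-29');

# Hash-based: any code anywhere can bypass the API, silently
$hash_based->{birthday} = 'not a birthday';
print $hash_based->birthday, "\n";

# Inside-out: the same intrusion dies, because there is no hash to reach into
my $intruded = eval { $opaque->{birthday} = 'not a birthday'; 1 };
print $intruded ? "intrusion worked\n" : "intrusion failed\n";

# The flip side: the dumper sees everything in one case and nothing in the other
local $Data::Dumper::Terse = 1;
my $hash_dump   = Data::Dumper::Dumper($hash_based);
my $opaque_dump = Data::Dumper::Dumper($opaque);
print $hash_dump, $opaque_dump;
```

The second dump shows only an empty blessed scalar, which is the debugger problem this issue is about.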

Now look at Object::Pad: Its objects are immune to intrusions like $customer->{birthday} = $not_a_birthday; since they are not hash references. Which also means that the debugger and serializers can't use them as such. But it does have a meta protocol which can be used to inspect the objects. So yes, one can change objects (and break them, perhaps) outside their designed API. But compared to a hash dereference, it takes a lot of effort to do so. And, at least as important: Such manipulations can be easily detected, Object::Pad::MOP::Class stands out wherever it is used. As for this issue's subject, Object::Pad classes are disambiguated from other references by inheriting from Object::Pad::UNIVERSAL.

So, if Corinna follows the path of Object::Pad, we should be all set.