History based IDL subtyping

nomeata commented 4 years ago

These are just some ideas, the result of more thinking about how to add optional fields in records, and the hope of finding a solution that is not as “dynamic” as Andreas’ https://github.com/dfinity-lab/motoko/pull/832.

This is also thinking more about what @matthewhammer might have in mind or gut feeling when keeps repeating that type evolution isn't subtyping.

I do not think this scheme is practically viable! This is more food for further thought.

Core idea

IDL types remember their history. This way the receiver of a message can interpret that value at the common ancestor of the message type and the expected type.

Type evolution

First, inspired by Matthew, I define a relation of “allowed type evolution” t1 ~> t2. For records, the allowed steps are the deprecated of fields, the addition of optional fields, and evolving a type in a nested position:

{ x : t; … } ~> { x : deprecated; … } (with some dedicated deleted type)
{ … } ~> { x : opt t; … }
{ x : t; … } ~> { x : t'; … } if t ~> t'.
t1 ~> t3 if t1 ~> t2 and t2 ~> t3 (we can do multiple steps at once).

Given t1 ~> t2 we can calculate (unique) coercions f: t1 -> t2 and f : t2 -> t1 in the obivous way, and write (f1, f2) : t1 ~> t2 for that.

Type history

Canister methods’s arguments and result types are now not just simple types (arg_t -> ret_t) but type histories (ret_th -> arg_th), where type histories are non-empty sequences of types related by evolution

th = t1,…,tn  where t1 ~> t2, …, t(n-1) ~> tn

Let’s define current(t1,…,tn) = tn.

When passing a value v : current(th) to a method foo : th -> (), the sender sents (th,v), i.e. includes a type description as now.

When the implementation of foo : th2 -> () receives (th1,v), it

Find the longest common prefix of th1 and th2. Trap if none is found. Let t be the last type of that sequence.
Construct coercions (_, f) : t ~> current(th1) and (g, _) : t ~> current(th1)
Calculate g(f(v)) : current(th1) and continue with that.

Type evolution is a tree

The type histories here are sequences, but if you take the set of all the sequences ever encountered, you get the evolution tree of types (in the set-of-paths representation). In particlar, if some type is forked in incompatible ways at some point, this tree branches.

During decoding, the type of the value is located on that tree, and then the coercion is built along a path on that tree.

What if it is a dag?

The formulation above does not allow two type evolutions to merge again. One could allow that (e.g. picking the “best” t hat is both in th1 and th2, but that is somewhat non-deterministic. Maybe “the latest t in th2 that is also in th1 is the best candidate, as that has the most useful information to the receiver.

What are the effects?

The decoder does not have to implement a subtyping check, just a type equality check. With suitable constraints on the type encoding (e.g. minimal type defintions in the IDL header), this can be much simpler to implement. But it still needs an “eager” type check :-(

(One could argue that I am implementing a subtype check of sorts, just with a small set of possible “subtypes” to check, therefore reducing it a finite number of type equalities.)
If one party adds { foo: Int; bar: Int; …} and the other one adds { foo : Int; bar : Text; … }, then the messages will ignore both foo and bar, on the grounds that the overall type is different, and that therefore passing through foo would be accidential.

(But this would be much more convincing if we get the same effect if both parties happen to add just { foo : Int, …}, but independently. But I think this will be impossible in a structurally typed system: Distinguishing between “two parties added the same type by chance” and “two parties add the same type by coordination”.

Conclusion

Because of the need to track the history of types, this is not practical. And I fail to work out an convincing example why this would even be good.

I’ll leave this open for a week or so if people find it insightful, and close it aftwards.

rossberg commented 4 years ago

Thanks for picking this up again, I must admit that I still lack a good idea how to get unstuck on this.

Quick question: from a first read it sounds as if you are basically suggesting to move from structural to nominal subtyping, i.e., your history tree is essentially a nominal subtype hierarchy (a.k.a., interface inheritance hierarchy). Is that right or is there something more subtle happening?

nomeata commented 4 years ago

Sounds like you could be right. It is still structural in the sense the hstory doesn't have identity: If your type evolution and my type evoluion are the same (because, commonly, we both imported the types from the same IDL), then our types are compatible.

matthewhammer commented 4 years ago

FWIW, I endorse this line of thinking enthusiastically.

@nomeata This is a concise, clear exposition of the idea I was trying to form and communicate late last year. Thanks for thinking this through, and writing this up. Much appreciated. : )

BTW, this "structural versus nominal" typing distinction is exactly the idea I have in mind too. IDL should give canisters a nominal typing when we enter the world of evolution (canister types changing over time) so that their identity can remain fixed amidst this evolution, however we characterize it.

matthewhammer commented 4 years ago

I’ll leave this open for a week or so if people find it insightful, and close it aftwards.

Would a simple motivating example extend its life here?

matthewhammer commented 4 years ago

Because of the need to track the history of types, this is not practical.

Unfortunately, I do not see any other way to make IDL typing work amidst multiple versions of the API.

I think the way that this could be made practical is the following two assumptions:

There is a "window" of versions, not extending back forever, and
calls to versions that are "too old" are just rejected by the system.

Aside: This window may make it easier for canisters to "merge" again too; not sure if that's desirable or not. We may want to prevent it in some cases?

nomeata commented 4 years ago

Would a simple motivating example extend its life here?

It would surely be enlightening!

But the bigger problem is keeping track of the history.

I am not worried about keeping a long history. I am worried about he tooling impact of keeping any history at all; it is pretty unusual that the content of your source tree somehow depends on when and which versions you deployed.

Assuming we could solve that, I don't think a long history is a problem, so I don’t see the need to worry about windows etc.

Maybe there is a solution where the history isn’t stored with the source, but rather with the canister. I.e. the source code has (or generates) IDL with a normal type, but upon build and deploy, the the deployment tool fetches the IDL history from the canister, checks that the current type can be evolved to the new type, and includes that full history in the canister.

But all very hairy 💇

rossberg commented 4 years ago

My understanding is that the primary goal of @nomeata's idea is to avoid a structural subtype test at runtime in favour of a cheaper "nominal" test. But if you ignore implementation pragmatics, then from a purely semantic perspective, nominal is strictly less flexible and more bureaucratic than structural. @matthewhammer, you seem to think that is a semantic advantage, but can you give an argument why?

rossberg commented 4 years ago

nomeata:

But all very hairy

Yeah, no matter where you store the history, my gut feeling is that it would induce a global implementation (and usability) cost that is much higher than the local implementation cost it tries to avoid. I guess that's what you mean by impractical?

nomeata commented 4 years ago

There is a bit more than just replacing the runtime subtype test with a simpler one. There is also the idea to “to cast via the common ancestor”, and not directly across. It is unclear to me how much the practical effect is. But sometimes it does kick in, as in this example:

t1 = { x : text }
t2 = { x : text; baz : opt int, new : opt () }
t3 = { x : text; bar : opt int, baz : opt int }

sender history: t1, t3
sender value: { x = "hi"; bar = ?1; baz = ?2 }
receiver history: t1, t2

This situation arises when two parties at some point agree on the interface t1, but then independently extend the types.

The dynamic subtype test would find that t3 <: t2, coerce along this subtype, and use value { x = "hi", baz = ?2, new = null }. The history-based test would find t`` as the common ancestor, then first coerce backwards alongt1 ~> t3to get{ x = "hi" }and then forward alongt1 ~> t2to get{ x = "hi", baz = null, new = null }. NB: The value in thebaz` field is not used, becuase it wasn’t part of the interface that the two parties agreed on.

In this example, this is desired behavor. But it’s too fragile: If two parties independently change the interface the same way, the data still comes through. And the intermediate steps in the type history are significant.

rossberg commented 4 years ago

Presumably, this could be made more solid by stamping versions, i.e., going to a truly nominal semantics (I guess that's what @matthewhammer is getting at?). But it's not clear where these stamps would be coming from, and it looks like requiring new global infrastructure.

nomeata commented 4 years ago

Timestamps? Or, less hacky, UUIDs?

rossberg commented 4 years ago

The latter, probably. But that would require the notion of an "owner" of a type definition, and it's not clear what that means across importing IDLs, or what if you replace an import with a copy.

nomeata commented 4 years ago

But that would require the notion of an "owner" of a type definition,

Would it? Maybe on a level of convention. I assume that whenever someone changes a type, they pick a new UUID. If I copy a type from fromwhere else, and then change it, then I update the UUID, because I just forked the evolution.

and it's not clear what that means across importing IDLs,

If you import, you import with the UUID, because “import” likely means that you want to be compatible with that service.

or what if you replace an import with a copy.

With an unchanged UUID? Then you are still compatible, as expected.

If you copy and change the UUID, then that’s a way of forking, and thus indicating that although the types happen to be the same, you don't intend to be compatible.

(Maybe it shouldn’t be a pure UUID, but rather a hash of a UUID and a canonicalized version of the type, to have the property that while same types can have differnent ids, you can’t have differnent types with the same ids.)

rossberg commented 4 years ago

You mean manually putting the id in the IDL sources? How can you ensure that's done consistently, i.e., nobody reuses the same id for different versions? And how/when would it be generated from Motoko?

nomeata commented 4 years ago

How can you ensure that's done consistently, i.e., nobody reuses the same id for different versions?

Maybe by mixing in a (hash) of the type, as just described?

And how/when would it be generated from Motoko?

Right, the toll on the tooling is just too taxing.

matthewhammer commented 4 years ago

Regarding my motivating examples

I always come back to wanting "free evolution" of variants, or more generally, "free evolution" of algebraic data types from ML.

Nearly every canister I'd ever want to write myself and deploy would expose at least one DSL, and perhaps many. To make things concrete here, let's imagine that I have a spreadsheet DSL as part of my IDL spec, for a canister that does something like Excel, but very simple for now.

For each operation you might think of as a public canister call in a "calculator" canister, I instead want to have a binary constructor in my DSL for this spreadsheet canister. Then, I'll construct formulas as expression trees, as usual. The expressions will be stored in cells, which I'll just ignore for now. Let's focus on the expression tree type, in IDL/Motoko:

type exp = {#add: (exp, exp); #sub: (exp, exp); #cellRef(cellName); #nat(Nat)};

In the context of this IDL evolution discussion, we've been discussing adding and removing public functions or record fields. That's not the issue here, where I really want to add or remove a variant case (e.g., add mul above), or change its type (e.g., by introducing more structure to the exp ADT, like line numbers, etc.). I also want to do those other things too, of course.

My understanding of Andreas's other proposal (the "current front runner" in this design space), is that we cannot evolve ADTs freely. My understanding of Joachim's design above is that we can. (Please correct me if any of that is unclear or seems wrong, of course.)

Regarding "ownership" of a canister's IDL type

If we try to divorce ourselves from our PL design principles (e.g., nominal typing introduces certain problems, at a fundamental, conceptual level) we cannot deny that many of our canister developers (perhaps even most of them?) will think of themselves as authors of the running canisters that they maintain and evolve. By extension, these canisters become "life-like" things whose evolution that they own and control.

In past meetings we've discussed briefly how our bigger story of platform risk seems connected, fundamentally, to questions about this evolutionary control. In my mind, the root question is all about identity of canisters over time. Meaning, if I have a canister named ThePublicSocialGraph on the IC, and I import it that way; but meanwhile, as I evolve my canister it also evolves over time, there are questions about who controls this mutual evolution, and how.

As a user of this canister, I want to use it with a fixed identity that is tied to its purpose, so that I can use for all time, and so that I can enjoy liberation from "platform risk" (the risk of this canister changing in a way that's out of spirit with its original purpose and mission, and why I chose to start using it and storing data there, etc.).

Etc

I am not worried about keeping a long history.

Ah, okay. I'm not either. Thanks for clarifying.

FWIW, I imagined that whatever layer of "the system" that does the IDL checks we want to impose would also be responsible for knowing this history. How it is best stored in between those checks (and updates) is not clear to me.

rossberg commented 4 years ago

@matthewhammer:

My understanding of Andreas's other proposal (the "current front runner" in this design space), is that we cannot evolve ADTs freely. My understanding of Joachim's design above is that we can. (Please correct me if any of that is unclear or seems wrong, of course.)

Yeah, I don't see why that would be the case. Nominal typing allows strictly fewer relations than structural (which sometimes is what you want), but it doesn't magically allow additional relations, because structural is already characterised by allowing all operationally possible relations.

So to make evolution of variants work you'd need additional mechanisms, and those would be unlikely to depend on nominal typing AFAICS.

As a user of this canister, I want to use it with a fixed identity that is tied to its purpose

Hm, it seems to me that you are conflating canisters with canister interfaces. The latter are used in dual roles as both assurances of a given canister as well as assumptions about other canisters, e.g., expected as a parameter or import. In the parameter case there is a many-to-one relationship, so any possible interface naming would generally be independent from canister identities. And even for imports it depends on who provides the IDL. Note @nomeata talking about version trees for interfaces, whereas evolution of an individual canister is linear.

Subtyping is what connects assumptions and assurances. And the challenge here is that both ends of such a relation in a given network can evolve independently. What we need to define, then, is what independent interface evolution is legal and how it is handled in the case of divergence. Version naming only helps in so far that it potentially enables imposing additional restrictions.

matthewhammer commented 4 years ago

@rossberg said above about this proposal:

your history tree is essentially a nominal subtype hierarchy (a.k.a., interface inheritance hierarchy)

Yes, I think I agree with this (still), but to be clear, I think of the lattice elements of the ordering as entire histories, aka, lists of IDL service types. (I believe this is also what Joachim is saying above too, but getting explicit confirmation is always nice.)

Nominal typing allows strictly fewer relations than structural (which sometimes is what you want), but it doesn't magically allow additional relations, because structural is already characterised by allowing all operationally possible relations.

But in what you call the "nominal case" (aka "history-based IDL subtyping", this proposal) we are not choosing names as strings of characters that form identifers. Rather, "names" (the histories) are things with lots of structure, and specifically, their structure is a list of IDL types. In that sense, this "nominal" idea for naming IDL versions is really "structural", at least in some pretty concrete formal ways.

Given this, don't see how to map your argument about nominal-vs-structural onto the proposal that Joachim gives above.

I believe the "rules of evolution" here are more permissive than the subtyping rules in your proposal (wrt adding/removing variants); note that the "subtyping" here is not about the individual rules of evolution here, but rather, about lists of related evolutions (aka, histories).

It feels like your argument is assuming that we have an apples and apples comparison to draw, but we have apples (evolution via subtyping) and "oranges" (evolution via an independent relation, which then is latter used to define subtyping over the histories it permits).

matthewhammer commented 4 years ago

Hm, it seems to me that you are conflating canisters with canister interfaces.

Fair point. I feel myself doing this subconsciously all of the time. However, I do not think that the arguments I am making here are doing that, at least how I'm thinking of them internally. Perhaps the wording and communication of those arguments is doing that, though.

	PR https://github.com/dfinity-lab/motoko/pull/832	The proposal here
	("Subtyping relates two well-formed IDL service type structures")	("Subtyping relates two valid IDL evolves-to histories")
rule of evolution	subtyping rule	evolves-to rule
subtyping relation	same as evolution relation	compute LCA over shared evolution history

This table above is (partially) how I've internalized the key contrasts, and why I think Andreas's arguments about nominal-vs-structural typing may not apply here, to this proposal. (@nomeata Can you verify this matches what you are saying above? I believe it does, but I've been wrong before.)

matthewhammer commented 4 years ago

I believe the "rules of evolution" here are more permissive than the subtyping rules in your proposal (wrt adding/removing variants)

To be fair, it doesn't seem like variants are considered explicitly in the list of evolution rules given above. But, I think you can imagine having analogues to the rules for adding and removing fields of records that correspond to adding and removing constructors of sum types.

matthewhammer commented 4 years ago

An example of how I'd like to change variants, in both arguments and results, without being hamstrung by the ordering imposed by subtyping.

Version A:

type exp = variant { nat: nat; binaryAdd: record { ... } };
type evalLog = record { exp: exp; res: ?nat };
service {
  eval : ( e :  exp ) -> ?nat ;
  getLog : () -> [evalLog];
};

Notably, the variant exp is being used in both argument and result positions within the service, for the methods eval and getLog, respectively.

Version B, evolved from A.

Now, the service evolves by supporting an additional operation, expressed as a new constructor for exp, called binaryDivide:

type exp = variant { nat: nat; binaryAdd: record { ... }; binaryDivide: record { ... } };
type evalLog = ...
service { ... };

The other definitions are not directly affected, but are indirectly affected. I could have repeated the same text for the RHS of evalLog and the body of the service; the only "real" change is the equation defining exp used in each.

Evolution check

I argue that we want to permit service evolution to support this example, which represents a design pattern, really, not a corner case at all (at least to me).

If we agree (do we?), I think we can do so by extending Joachim's proposal accordingly, where we have a notion of "optional constructors" that can be added in the same spirit as optional fields. I do not see how there's an obstacle there, but I'm not 100% confident.

AFAIK, the only reason this wasn't considered before is because the records were already "hard" to support, and it wasn't clear how to generalize that solution, which conflates the IDL notions of valid subtyping and valid evolution. This proposal offers a way to separate them, and unblock us.

rossberg commented 4 years ago

@matthewhammer:

But in what you call the "nominal case" (aka "history-based IDL subtyping", this proposal) we are not choosing names as strings of characters that form identifers. Rather, "names" (the histories) are things with lots of structure, and specifically, their structure is a list of IDL types.

True, but as @nomeata points out above, this naming scheme actually is too weak to really protect against all accidents, while proper generative names would do that. So, I think that particular difference is rather accidental than fundamental.

I believe the "rules of evolution" here are more permissive than the subtyping rules in your proposal (wrt adding/removing variants); note that the "subtyping" here is not about the individual rules of evolution here, but rather, about lists of related evolutions (aka, histories).

I'm afraid you lost me there.

"Subtyping relates two valid IDL evolves-to histories"

I don't think that's what this proposal does in any interesting sense. AFAICS, all it does, effectively, is restricting the subtype lattice to certain edges, i.e., it's a smaller relation than the full structural one. @nomeata, please correct me if I'm misinterpreting it.

I argue that we want to permit service evolution to support this example, which represents a design pattern, really, not a corner case at all (at least to me).

Yes, but then you have to define what that means. And that requires coming up with some rules for mapping unknown cases, which is the non-obvious part. But I don't see why such rules would have any dependency on the suggestion in this issue; I'm pretty sure they would essentially be equivalent to some coercive subtyping rules. At least I don't see what else they would possibly be -- they ultimately mediate individual values between two concrete types, not between hierarchies.

nomeata commented 4 years ago

AFAICS, all it does, effectively, is restricting the subtype lattice to certain edges, i.e., it's a smaller relation than the full structural one. @nomeata, please correct me if I'm misinterpreting it.

Yes and no. It may be true that if th1 <: th2 in this gedankenexperiment, then current(th1) <: current(th2) in #832. But that’s not a very interesting statement, because

832’s rules around opt are very liberal, due to the run-time “is there any hope of using this value” check. For field types of form t opt, #832’s <: is the complete relation! So duh, every other relation we find is a subrelation of that one.
More importantly, when comparing these proposals, it seems crucial to always consider the coerctions that are derived/calculated/whatever. And there, we see differences: #832’s preserves as much data as possible, while this gedankenexperiment tries to preserve only the data that can be preserved by the actually agreed on interface (the common ancestor), and intentionally tries to discard “accidental data”. Unfortunately, that goal is not 100% reached.

rossberg commented 4 years ago

@nomeata, yes, I agree. But to clarify, the context (that I elided above) of the quote was @matthewhammer's hope that this semantics would somehow enable evolution of variants or similar relaxations, which are currently absent from the IDL. I was trying to explain why that's mostly an orthogonal problem and does not follow from your proposal.

nomeata commented 4 years ago

I agree it is orthogonal. In my proposal, the problems around extending variants would be part of the evolves-to releation (~>) and its elaboration, but the problems are the same.

And of course the problem is dual:

We can only add record fields if the content of the field has a default value, so we have to use opt inside.
We can only add variant constructors, if the container of the variant has a default value, so we’d have to wrap it in opt.

So yes, variant evolution is no harder of simpler than record evolution: You “just” have to only allow it inside opt, so that the decoder can map an unknown variant constructor to null. (Or some other specially marked default value, e.g. #unknown.)

And yes, orthogonal to history-based or not.

rossberg commented 4 years ago

Ah, great observation about the duality.

so we’d have to wrap it in opt.

More precisely, it would have to be wrapped in opt already. And that makes it much less useful from a practical perspective, I fear. I had previously thought along the lines of mapping to a "default" variant, but that seems even more ad-hoc, as it would require marking such a variant.

matthewhammer commented 4 years ago

Thanks for all of the discussion and clarifications above.

To step back a bit, my primary concern is that whatever enforcement mechanism we have for evolving the IDL of my canister affords me the ability to add new variant constructors, somehow. (As in my expression tree example above).

I had thought that the "fully structural" approach (https://github.com/dfinity-lab/motoko/pull/832) would somehow never extend to that, because it would somehow be too complex to do so.

After Joachim's duality observation above, I hope I'm interpreting the reaction accurately. The message I'm getting from Andreas and Joachim above is that we can in fact extend #832 to get what I want, and support my expression tree evolution above. Correct?

If so, then great. This history-based approach is interesting, but once separated from my motivating example through this orthogonality argument, it looses most of its interest to me, at least for now. Again, all of the discussion above was really helpful, in any case. Thanks again.

matthewhammer commented 4 years ago

The message I'm getting from Andreas and Joachim above is that we can in fact extend #832 to get what I want, and support my expression tree evolution above. Correct?

I realize it's not the perfect solution, given what Andreas says ("...less useful from a practical perspective"), but it's not totally forbidden, either.

In the case of my expression tree example, I can imagine how the opt(exp) (right?) would be interpreted as "error" when its null, or something like that.

If that all sounds plausible, then I'm happy.

matthewhammer commented 4 years ago

And yes, orthogonal to history-based or not.

👍 👍

matthewhammer commented 4 years ago

Just a (hopefully clarifying) note: Andreas was confused by me saying this above:

I believe the "rules of evolution" here are more permissive than the subtyping rules in your proposal (wrt adding/removing variants); note that the "subtyping" here is not about the individual rules of evolution here, but rather, about lists of related evolutions (aka, histories).

I now see that I was confused. Backing up, I didn't appreciate the orthogonality that Andreas and Joachim have both observed above, that the question of history-based versus history-independent (ordinary, structural) subtyping is separate from supporting the variant evolution in my expression tree example.

Prior to seeing this fact, I had thought that it was important to separate the concerns of subtyping (and coercions, etc) from evolving the types defined in the service. Obviously they cannot be totally separated, of course. I thought that histories were going to provide that indirection. In fact, the concern is orthogonal. I welcome this outcome.

nomeata commented 4 years ago

I think the discussion here has ebbed down, and there isn’t much more to learn from these ideas. Closing to keep the issue list tidy, but nothnig is lost of course. More fresh ideas in #1150.

nomeata commented 4 years ago

Closing, Candid discussion can happen at the Candid repo.

dfinity / motoko