IntersectMBO / plutus

The Plutus language implementation and tools
Apache License 2.0
1.57k stars 479 forks source link

Specify how to validate the next data script #1141

Closed kwxm closed 5 years ago

kwxm commented 5 years ago

We need a proper specification of this, to be included in the EUTXO specification.

This was discussed in the Plutus Call for the 21st of May.

Recording: https://drive.google.com/file/d/1urWw7V2CDXDy934z01zwN69Iwgk3MUot/view?usp=sharing

Slides by Jann: https://docs.google.com/presentation/d/1RCRDajMcBaqPFNjaw2mx7qHaeglHgbrin1x4wjv5qgw/edit#slide=id.g57fbb6ddb5_0_0

GitHub document: https://github.com/input-output-hk/plutus/blob/master/docs/extended-utxo/ValidatingTheNextDataScript.md

GitHub issue https://github.com/input-output-hk/plutus/issues/426

GitHub issue https://github.com/input-output-hk/plutus/issues/488

Write-up by Roman: https://github.com/effectfully/plutus-prototype/blob/master/docs/extended-utxo/ValidatingTheNextDataScript.md. This summarises a lot of the stuff above.


[Added on the 29th of July 2019]

Several Slack conversations:

28th June: https://input-output-rnd.slack.com/archives/C21UF2WVC/p1561726440294400

19th July: https://input-output-rnd.slack.com/archives/C21UF2WVC/p1563569957484600

21st July: https://input-output-rnd.slack.com/archives/C21UF2WVC/p1563725642000900

Email discussion at https://groups.google.com/a/iohk.io/forum/?utm_medium=email&utm_source=footer#!msg/plutus/fchZQdHKrT0/amXvYD5EDAAJ

michaelpj commented 5 years ago

The exact mechanics of what happens hasn't been pinned down yet. Here's what I'd been thinking, so it's down on paper:

The advantages of this scheme are:

cc @mchakravarty @effectfully @j-mueller

j-mueller commented 5 years ago
michaelpj commented 5 years ago

In your proposal the redeemer becomes an arbitrary function, so we can't use lift.

I think this is reasonably okay: conceptually, I think most redeemers will end up looking like \dataScripts -> (dataScriptData, redeemerData). So you will probably end up writing a function that does just the data script handling using the plugin, which can take the redeemer as its first argument. Then you apply them to get the final redeemer value.

e.g.

redeemerFunc :: RedeemerData -> () -> Interesting -> () -> (RedeemerData, Interesting)
redeemerFunc r _ i _ = (r, i)

redeemer :: () -> Interesting -> () -> (RedeemerData, Interesting)
redeemer = compile redeemerFunc `apply` redeemerLifted

I agree this is pretty awkward, though. But then this whole idiom is pretty awkward :man_shrugging:

The bytestring is a hash, correct?

Yeah. The idea being that a "serialized" value is just a value and its hash, where we trust that it was provided by the validating node. We could put it in a datatype too, though.

I think when we talk about "the transaction" as far as Plutus is concerned the term should always refer to the transaction before coin selection, ie. a transaction that is unsigned and potentially unbalanced.

But at the point of validation we have no way of knowing which inputs/outputs are fee outputs, unless we persuade the chain team to add that as a concept.

Here's a simpler solution, that occurred to me: only pass the data scripts for outputs that have them as arguments. Provided the fee inputs/outputs don't have data scripts, they the won't show up and it's fine. So ignore all those silly unit arguments I had earlier.

kwxm commented 5 years ago

I'm not sure that the details of this stuff belong in the EUTXO specification since that's currently agnostic about what goes on at the script level. But it's got to go somewhere, and I don't think there's anywhere else that's suitable. I think that in the main document I'll explain the problem and say that some means has to be provided for validators to access the content of data scripts, and put the Plutus-specific details in an appendix.

[Thanks for the comments, by the way.]

mchakravarty commented 5 years ago

@michaelpj wrote,

The idea is that we need not include both seal and unseal in the set of available builtins when typechecking something.

What is the problem with allowing unseal? I think, only seal needs special treatment.

Re coin selection, coin selection will only ever add pubkey inputs and outputs. They have neither redeemer scripts nor data scripts. So, I don't think that causes any issues.

Re redeemer type, somebody (I forgot who) suggested to maybe just pass a n-tuple of the data scripts to the redeemer. I think, that is a good idea. In PLC, it'll desugar to its curried form, of course, but I think, it is the simpler API on the source level. This hopefully simplifies the situation with lift as well.

mchakravarty commented 5 years ago

@kwxm Sorry, but I disagree. This mechanism is an essential part of EUTxO and we need to describe it.

We actually don't think, we need to describe the problem much. The document is supposed to be a specification of EUTxO — i.e., we want concisely and precisely describe how it works. We can leave motivation and design space exploration to research papers and tutorials.

j-mueller commented 5 years ago

@michaelpj

unless we persuade the chain team to add that as a concept.

I think that would be the cleanest solution but I'm not sure how practical it is. I don't have any objections to the proposal, thanks for the clarifications.

kwxm commented 5 years ago

@kwxm Sorry, but I disagree. This mechanism is an essential part of EUTxO and we need to describe it.

I take your point, but this is a particularly irritating problem. Until we get to this issue the EUTXO model is essentially agnostic about what goes on at the script level, but now it looks as if we have to start talking about specific Plutus types in order to explain how EUTXO should work, which (a) reduces the generality of the model, and (b) might be a bit unpleasant from the point of veiw of formal specification, since the interface between the transaction level and the script level becomes messy. Given the amount of time that's been spent on this though, maybe we just have to do it that way.

michaelpj commented 5 years ago

Until we get to this issue the EUTXO model is essentially agnostic about what goes on at the script level, but now it looks as if we have to start talking about specific Plutus types in order to explain how EUTXO should work

I don't know that that's true. At the moment we have two levels of explanation:

I know I keep banging on about this layering thing but I really do think it's key to making this document make sense.

Re coin selection, coin selection will only ever add pubkey inputs and outputs. They have neither redeemer scripts nor data scripts. So, I don't think that causes any issues.

Provided you ignore outputs without data scripts, which I agree we should do. I was just being silly at first.

What is the problem with allowing unseal? I think, only seal needs special treatment.

Yes indeed, and of course at least the validator will need to use unseal.

Re redeemer type, somebody (I forgot who) suggested to maybe just pass a n-tuple of the data scripts to the redeemer. I think, that is a good idea. In PLC, it'll desugar to its curried form, of course, but I think, it is the simpler API on the source level. This hopefully simplifies the situation with lift as well.

So the idea here is that it makes it easier to e.g. ignore all the data scripts because you always just ignore one argument? It's not quite that simple, because you'll still need to specify the types of all the data scripts so you can give the type of the tuple argument that you're ignoring...

kwxm commented 5 years ago

I know I keep banging on about this layering thing but I really do think it's key to making this document make sense.

I've tried to improve that, so maybe you could have a look at the current version (section 4.1.2). There are still a few places where things aren't fully described, like precisely how validation takes place and how a PendingTx object gets converted to a script. I was considering relegating the detailed descriptions of these to an appendix, or parenthetical subsections.

michaelpj commented 5 years ago

Here's a reason not to pass a tuple to the redeemer: it requires the validating node to know all the types of the data scripts.

If the redeemer has type dt1 -> ... -> dtn -> rt, then the validating node can just do foldr apply dataScripts redeemer to pass them all in.

If the redeemer has type (dt1, ..., dtn) -> rt, then the validating node needs to construct an encoded tuple of the data scripts, given only their serialized forms. This, in general, requires knowing the types of the data scripts, so we'd be requiring more type inference and term construction work from the validating node, whereas otherwise it's just application.

mchakravarty commented 5 years ago

I don't think I understand that. How can you foldr apply redeemer datascripts (I assume the datascripts have to come last, right, it's the lists...), where datascripts is a (I assume from what you are saying) list of serialised scripts. (And if they are not serialised, how can they be in a list?) Sorry, I don't get it at all.

michaelpj commented 5 years ago

We've deserialized all the scripts into Programs. We have an apply function that applies one program to another. Crucially, this doesn't require you to know anything about them: you just make an Apply node and stuff the two programs in either side.

On the other hand, making a tuple requires you to construct a tuple, which requires you to know the types of the components (recall: a scott-encoded tuple looks like /\r . \(k: a1 ->... -> an -> r). k v1 ... vn, so you need the types in order to give the signature for k).

effectfully commented 5 years ago

Add a builtin type sealed : * -> *, a builtin function unseal : forall a . sealed a -> a, and a builtin function seal : forall a . a -> sealed a.

Did this with the "old" builtins machinery: #1226. But we decided not to go with this particular implementation, because it's way too hacky.

mchakravarty commented 5 years ago

We've deserialized all the scripts into Programs. We have an apply function that applies one program to another. Crucially, this doesn't require you to know anything about them: you just make an Apply node and stuff the two programs in either side.

Yes, that is what I thought, but you earlier wrote, "the validating node needs to construct an encoded tuple of the data scripts, given only their serialized forms." We deserialise anyway, so I am not sure why we only have their serialised form in the tuple construction.

So, I think, what you are really saying is that if we only use applications, then we don't need to know the types of the datascript to construct the overall expression to validate, whereas if we need an abstraction (to build a tuple), we need those types. That makes sense. Given that the datascript are deserialised, getting their type isn't particularly hard, but it is an extra step.

An alternative would be to include the types of data scripts (or any serialised program, for that matter) in the serialised form.

michaelpj commented 5 years ago

We deserialise anyway, so I am not sure why we only have their serialised form in the tuple construction.

Sorry yes, that was me being unhelpful. The fact that they're serialized isn't relevant.

Given that the datascript are deserialised, getting their type isn't particularly hard, but it is an extra step.

It means running type inference, which is expensive. Perhaps okay, though.

michaelpj commented 5 years ago

An alternative would be to include the types of data scripts (or any serialised program, for that matter) in the serialised form.

We can't do this: we'd be trusting that we weren't being lied to about the type, so we'd have to check it anyway.

mchakravarty commented 5 years ago

We can't do this: we'd be trusting that we weren't being lied to about the type, so we'd have to check it anyway.

We are checking it anyway when we type check the overall expression that we evaluate for validation. If we don't, I could provide a redeemer whose expected argument doesn't match the type of the datascript that it is applied to (among other things).

michaelpj commented 5 years ago

That's true.

On the other hand, here's a reason to pass a tuple. When we're constructing a redeemer, we want to write something like ignoring @??? actualRedeemer. What goes in the ???? You don't want to have to pass n types, otherwise we'd have to have n different versions of ignoring. If we're passing a tuple, we can just pass the tuple type, i.e. ignoring @(n1, n2, n3....) actualRedeemer.

Could we take the tuple argument but make an uncurried function under the hood? I think this would be tricky, and we'd need TH to do it properly.

So in sum, I'm not sure what to do.

mchakravarty commented 5 years ago

Decision in the Plutus call from 9 July was to go with the curried version as this, after all, provides more flexibility on the Plutus Tx level.

michaelpj commented 5 years ago

After all that discussion it turns out we still need the types of the data scripts. Why? Because the validating node still needs to call seal on each of the data scripts. But seal is polymorphic, so first of all it needs to be instantiated at the argument type, so we need to know the type of the data script we're wrapping.

kwxm commented 5 years ago

There's now a draft of a specification of the current solution to this problem in the EUTXO specification: see https://github.com/input-output-hk/plutus/pull/1426 .

With luck, https://github.com/input-output-hk/plutus/issues/1436 will provide a much simpler solution.

kwxm commented 5 years ago

Closed due to being superseded by #1436.