Specify how to validate the next data script

Add a builtin type sealed : * -> *, a builtin function unseal : forall a . sealed a -> a, and a builtin function seal : forall a . a -> sealed a.
- The idea is that we need not include both seal and unseal in the set of available builtins when typechecking something.
The redeemer is now passed a value of sealed (a, bytestring) for every output in the transaction, in order. So if there are three, where the second has type b, then the signature of the redeemer is sealed ((), bytestring) -> sealed (b, bytestring) -> sealed ((), bytestring) -> redeemerType.
- The author of the redeemer knows what outputs there are, so it's their responsibility to add enough arguments.
  - PROBLEM: what about coin selection?
To ensure that neither the redeemer nor the data scripts create any sealed values, we typecheck them without seal in the builtin environment (the redeemer must at least have sealed in the environment, since it needs to include that in its argument signatures).
- The final application, where the redeemer is passed its arguments, gets typechecked with seal. This means that the validating node can pass arguments like seal 1 and it will typecheck.

The advantages of this scheme are:

We don't need to do anything terribly clever to prevent people from creating sealed values, we just exploit the fact that we can typecheck them in different environments to ensure that some builtins aren't used.
We don't necessarily need to code "sealed" as "serialized". We can just have type Serialized a = Sealed (a, ByteString).

cc @mchakravarty @effectfully @j-mueller

j-mueller commented 5 years ago

Currently we construct the redeemer values at (Haskell) runtime using lift. In your proposal the redeemer becomes an arbitrary function, so we can't use lift. And we should try to avoid requiring the use of the plugin for the redeemer, because we can't rely on having GHC around when we build the redeemer.

I think we would have to do something like this. Similar to the applyScript function that we already have, we add combinators like const, uncurry, flip that we can use to construct a function of the correct type. Then in the contract code, when we know what the transaction* looks like, we use these combinators to construct the redeemer that is expected by the validator.

* modulo coin selection, see below
sealed (a, bytestring) The bytestring is a hash, correct?
PROBLEM: what about coin selection?

I think when we talk about "the transaction" as far as Plutus is concerned the term should always refer to the transaction before coin selection, ie. a transaction that is unsigned and potentially unbalanced. Since for every unbalanced transaction t, if t spends at least one UTXO, there can be at most one proper transaction t' (on the blockchain) that subsumes t, this should be safe.

michaelpj commented 5 years ago

In your proposal the redeemer becomes an arbitrary function, so we can't use lift.

I think this is reasonably okay: conceptually, I think most redeemers will end up looking like \dataScripts -> (dataScriptData, redeemerData). So you will probably end up writing a function that does just the data script handling using the plugin, which can take the redeemer as its first argument. Then you apply them to get the final redeemer value.

e.g.

redeemerFunc :: RedeemerData -> () -> Interesting -> () -> (RedeemerData, Interesting)
redeemerFunc r _ i _ = (r, i)

redeemer :: () -> Interesting -> () -> (RedeemerData, Interesting)
redeemer = compile redeemerFunc `apply` redeemerLifted

I agree this is pretty awkward, though. But then this whole idiom is pretty awkward :man_shrugging:

The bytestring is a hash, correct?

Yeah. The idea being that a "serialized" value is just a value and its hash, where we trust that it was provided by the validating node. We could put it in a datatype too, though.

I think when we talk about "the transaction" as far as Plutus is concerned the term should always refer to the transaction before coin selection, ie. a transaction that is unsigned and potentially unbalanced.

But at the point of validation we have no way of knowing which inputs/outputs are fee outputs, unless we persuade the chain team to add that as a concept.

Here's a simpler solution, that occurred to me: only pass the data scripts for outputs that have them as arguments. Provided the fee inputs/outputs don't have data scripts, they the won't show up and it's fine. So ignore all those silly unit arguments I had earlier.

kwxm commented 5 years ago

I'm not sure that the details of this stuff belong in the EUTXO specification since that's currently agnostic about what goes on at the script level. But it's got to go somewhere, and I don't think there's anywhere else that's suitable. I think that in the main document I'll explain the problem and say that some means has to be provided for validators to access the content of data scripts, and put the Plutus-specific details in an appendix.

[Thanks for the comments, by the way.]

mchakravarty commented 5 years ago

@michaelpj wrote,

The idea is that we need not include both seal and unseal in the set of available builtins when typechecking something.

What is the problem with allowing unseal? I think, only seal needs special treatment.

Re coin selection, coin selection will only ever add pubkey inputs and outputs. They have neither redeemer scripts nor data scripts. So, I don't think that causes any issues.

Re redeemer type, somebody (I forgot who) suggested to maybe just pass a n-tuple of the data scripts to the redeemer. I think, that is a good idea. In PLC, it'll desugar to its curried form, of course, but I think, it is the simpler API on the source level. This hopefully simplifies the situation with lift as well.

mchakravarty commented 5 years ago

@kwxm Sorry, but I disagree. This mechanism is an essential part of EUTxO and we need to describe it.

We actually don't think, we need to describe the problem much. The document is supposed to be a specification of EUTxO — i.e., we want concisely and precisely describe how it works. We can leave motivation and design space exploration to research papers and tutorials.

j-mueller commented 5 years ago

@michaelpj

unless we persuade the chain team to add that as a concept.

I think that would be the cleanest solution but I'm not sure how practical it is. I don't have any objections to the proposal, thanks for the clarifications.

kwxm commented 5 years ago

@kwxm Sorry, but I disagree. This mechanism is an essential part of EUTxO and we need to describe it.

I take your point, but this is a particularly irritating problem. Until we get to this issue the EUTXO model is essentially agnostic about what goes on at the script level, but now it looks as if we have to start talking about specific Plutus types in order to explain how EUTXO should work, which (a) reduces the generality of the model, and (b) might be a bit unpleasant from the point of veiw of formal specification, since the interface between the transaction level and the script level becomes messy. Given the amount of time that's been spent on this though, maybe we just have to do it that way.

michaelpj commented 5 years ago

Until we get to this issue the EUTXO model is essentially agnostic about what goes on at the script level, but now it looks as if we have to start talking about specific Plutus types in order to explain how EUTXO should work

I don't know that that's true. At the moment we have two levels of explanation:

Ledger layer. Script layer is opaque, just provides mystery functions. At this point it's still straightforward: you don't even need to change the signature of the validation function, since you're already passing it the pending transaction which has the data scripts in it.
Script layer. We're already specifying quite a lot of details about how you glue the various scripts together in the validation (for example, we probably need to mention the adaptor that turns booleans into errors if we haven't already). So I think it's okay for this to be complicated.

I know I keep banging on about this layering thing but I really do think it's key to making this document make sense.

Re coin selection, coin selection will only ever add pubkey inputs and outputs. They have neither redeemer scripts nor data scripts. So, I don't think that causes any issues.

Provided you ignore outputs without data scripts, which I agree we should do. I was just being silly at first.

What is the problem with allowing unseal? I think, only seal needs special treatment.

Yes indeed, and of course at least the validator will need to use unseal.

Re redeemer type, somebody (I forgot who) suggested to maybe just pass a n-tuple of the data scripts to the redeemer. I think, that is a good idea. In PLC, it'll desugar to its curried form, of course, but I think, it is the simpler API on the source level. This hopefully simplifies the situation with lift as well.

So the idea here is that it makes it easier to e.g. ignore all the data scripts because you always just ignore one argument? It's not quite that simple, because you'll still need to specify the types of all the data scripts so you can give the type of the tuple argument that you're ignoring...

kwxm commented 5 years ago

I know I keep banging on about this layering thing but I really do think it's key to making this document make sense.

I've tried to improve that, so maybe you could have a look at the current version (section 4.1.2). There are still a few places where things aren't fully described, like precisely how validation takes place and how a PendingTx object gets converted to a script. I was considering relegating the detailed descriptions of these to an appendix, or parenthetical subsections.

michaelpj commented 5 years ago

Here's a reason not to pass a tuple to the redeemer: it requires the validating node to know all the types of the data scripts.

If the redeemer has type dt1 -> ... -> dtn -> rt, then the validating node can just do foldr apply dataScripts redeemer to pass them all in.

If the redeemer has type (dt1, ..., dtn) -> rt, then the validating node needs to construct an encoded tuple of the data scripts, given only their serialized forms. This, in general, requires knowing the types of the data scripts, so we'd be requiring more type inference and term construction work from the validating node, whereas otherwise it's just application.

mchakravarty commented 5 years ago

I don't think I understand that. How can you foldr apply redeemer datascripts (I assume the datascripts have to come last, right, it's the lists...), where datascripts is a (I assume from what you are saying) list of serialised scripts. (And if they are not serialised, how can they be in a list?) Sorry, I don't get it at all.

michaelpj commented 5 years ago

We've deserialized all the scripts into Programs. We have an apply function that applies one program to another. Crucially, this doesn't require you to know anything about them: you just make an Apply node and stuff the two programs in either side.

On the other hand, making a tuple requires you to construct a tuple, which requires you to know the types of the components (recall: a scott-encoded tuple looks like /\r . \(k: a1 ->... -> an -> r). k v1 ... vn, so you need the types in order to give the signature for k).

effectfully commented 5 years ago

Add a builtin type sealed : * -> *, a builtin function unseal : forall a . sealed a -> a, and a builtin function seal : forall a . a -> sealed a.

Did this with the "old" builtins machinery: #1226. But we decided not to go with this particular implementation, because it's way too hacky.

mchakravarty commented 5 years ago

We've deserialized all the scripts into Programs. We have an apply function that applies one program to another. Crucially, this doesn't require you to know anything about them: you just make an Apply node and stuff the two programs in either side.

Yes, that is what I thought, but you earlier wrote, "the validating node needs to construct an encoded tuple of the data scripts, given only their serialized forms." We deserialise anyway, so I am not sure why we only have their serialised form in the tuple construction.

So, I think, what you are really saying is that if we only use applications, then we don't need to know the types of the datascript to construct the overall expression to validate, whereas if we need an abstraction (to build a tuple), we need those types. That makes sense. Given that the datascript are deserialised, getting their type isn't particularly hard, but it is an extra step.

An alternative would be to include the types of data scripts (or any serialised program, for that matter) in the serialised form.

michaelpj commented 5 years ago

We deserialise anyway, so I am not sure why we only have their serialised form in the tuple construction.

Sorry yes, that was me being unhelpful. The fact that they're serialized isn't relevant.

Given that the datascript are deserialised, getting their type isn't particularly hard, but it is an extra step.

It means running type inference, which is expensive. Perhaps okay, though.

michaelpj commented 5 years ago

An alternative would be to include the types of data scripts (or any serialised program, for that matter) in the serialised form.

We can't do this: we'd be trusting that we weren't being lied to about the type, so we'd have to check it anyway.

mchakravarty commented 5 years ago

We can't do this: we'd be trusting that we weren't being lied to about the type, so we'd have to check it anyway.

We are checking it anyway when we type check the overall expression that we evaluate for validation. If we don't, I could provide a redeemer whose expected argument doesn't match the type of the datascript that it is applied to (among other things).

michaelpj commented 5 years ago

That's true.

On the other hand, here's a reason to pass a tuple. When we're constructing a redeemer, we want to write something like ignoring @??? actualRedeemer. What goes in the ???? You don't want to have to pass n types, otherwise we'd have to have n different versions of ignoring. If we're passing a tuple, we can just pass the tuple type, i.e. ignoring @(n1, n2, n3....) actualRedeemer.

Could we take the tuple argument but make an uncurried function under the hood? I think this would be tricky, and we'd need TH to do it properly.

So in sum, I'm not sure what to do.

mchakravarty commented 5 years ago

Decision in the Plutus call from 9 July was to go with the curried version as this, after all, provides more flexibility on the Plutus Tx level.

michaelpj commented 5 years ago

After all that discussion it turns out we still need the types of the data scripts. Why? Because the validating node still needs to call seal on each of the data scripts. But seal is polymorphic, so first of all it needs to be instantiated at the argument type, so we need to know the type of the data script we're wrapping.

kwxm commented 5 years ago

There's now a draft of a specification of the current solution to this problem in the EUTXO specification: see https://github.com/input-output-hk/plutus/pull/1426 .

With luck, https://github.com/input-output-hk/plutus/issues/1436 will provide a much simpler solution.

kwxm commented 5 years ago

Closed due to being superseded by #1436.

IntersectMBO / plutus

Specify how to validate the next data script #1141