IntersectMBO / plutus

The Plutus language implementation and tools
Apache License 2.0

Add serialization builtin to Plutus #488

Closed nau closed 5 years ago

nau commented 5 years ago

Add serialization builtin to Plutus, i.e.

serialize :: a -> ByteString
michaelpj commented 5 years ago

I don't think this is the right approach. The problem is: what are the semantics of this operation? Does it serialize the fully-evaluated argument of the function? If so, then you probably can't use it for comparing hashes with the data script in a transaction, because that is the hash of a serialized AST, which has no obligation to be fully-evaluated. In general it's just quite a different operation.

I had an alternative proposal which I described in https://iohk.myjetbrains.com/youtrack/issue/CGP-448, which is to say: if we want to talk about programs, we should add the facility to talk about programs, i.e. quoting. You wouldn't be able to do much with such a program, but on the plus side, serializing it would make much more sense. But adding metaprogramming facilities is also a big hammer.

Overall, I think it would be much better to figure out a way to get the typed value for the data script into the program, so you can use your own equality test to check whether it is what you want. Apart from anything else, this is much more flexible, and allows you to do other kinds of things than just equality.

mchakravarty commented 5 years ago

@michaelpj wrote,

Overall, I think it would be much better to figure out a way to get the typed value for the data script into the program, so you can use your own equality test to check whether it is what you want. Apart from anything else, this is much more flexible, and allows you to do other kinds of things than just equality.

But can we really get the typed value of an arbitrary data script into an arbitrary program? The data scripts of outputs we want to check cannot be passed as typed arguments (like the data script paired with the validator in its own output). Hence, we need to embed them either as something like Data.Dynamic or as part of something like a heterogeneous list. Both boil down to providing runtime type information and a way to check it (in the spirit of Data.Typeable).

I do agree that this is a smaller hammer than adding meta-programming.

michaelpj commented 5 years ago

Heterogeneous lists don't have to be runtime type information! The main problem with heterogeneous lists is that we don't want the validator to have to uniquely specify the number of transaction outputs and the types of their data scripts.

So what I thought was: what if there was a way for the validator to ignore some of them?

I haven't quite been able to make this typecheck, but the idea is something like:

The main issue is how to make it so the validating node can just do things in a dumb way without having to be clever about performing type instantiations or anything.

mchakravarty commented 5 years ago

@michaelpj What you are saying is that you can have a heterogeneous list if you specify all the types (i.e., the list of the types of all elements as the type of the heterogeneous list), but we usually call this a record, not a heterogeneous list. Also, what you are describing is a kind of extensible-record-like interface.

TBH, I think, having dynamic types is simpler; especially in PLC where we don't have datatypes.

Is there a connection to #581?

michaelpj commented 5 years ago

That's what's usually called a heterogeneous list in the Haskell world, no?

I don't know whether hlists are hard. The simplest version is to encode them like tuples: it's a value forall R . A -> B -> C -> R. The issue is the "forget the tail" behaviour - is it possible to consume that value and get the A and B without knowing whether there are more values afterwards? I'm honestly not sure, but our system is pretty powerful.

Maybe adding dynamic types is simpler, but I don't have a good sense of that.

I think it's unrelated to that issue - as we've discussed before "dynamic" builtins are a bit of a misnomer, they're not dynamic in this way.

effectfully commented 5 years ago

The issue is the "forget the tail" behaviour - is it possible to consume that value and get the A and B without knowing whether there are more values afterwards?

Yep:

open import Data.Product

postulate
  A B C D R : Set
  doSmthWithAtLeastTwoElems : ∀ {E} -> A × (B × E) -> R
  fourElems : A × (B × (C × D))

smth : R
smth = doSmthWithAtLeastTwoElems fourElems

I.e. instead of using an n-ary tuple you use n nested 2-ary tuples.
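For concreteness, the same trick can be written in Haskell, with Int, Bool, Char and String as hypothetical stand-ins for A, B, C, D:

```haskell
-- Haskell analogue of the Agda sketch above. A consumer that needs only
-- the first two elements stays polymorphic in the type of the tail, so
-- it never has to mention how many elements come afterwards.
doSmthWithAtLeastTwoElems :: (Int, (Bool, e)) -> Int
doSmthWithAtLeastTwoElems (a, (b, _)) = if b then a else 0

fourElems :: (Int, (Bool, (Char, String)))
fourElems = (42, (True, ('x', "rest")))

main :: IO ()
main = print (doSmthWithAtLeastTwoElems fourElems)  -- prints 42
```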

michaelpj commented 5 years ago

The problem with this is that suppose that the validator is doSmthWithAtLeastTwoElems and the validating node wants to feed it fourElems. Then the validating node needs to provide an instantiation for E! Which requires it to know how many elements the validator is expecting to consume, and to do some annoying type analysis.

effectfully commented 5 years ago

The problem with this is that suppose that the validator is doSmthWithAtLeastTwoElems and the validating node wants to feed it fourElems. Then the validating node needs to provide an instantiation for E!

I see. Yeah, that's unfortunate.

Do we have a complete description of this validation problem somewhere? I.e. what data scripts are allowed to contain, what validator scripts are required to validate, and possible solutions to the problem.

michaelpj commented 5 years ago

No, I don't believe I've written this up in full anywhere. Jann has brought it up in the past.

effectfully commented 5 years ago

It would be great to have a complete description: the problem seems complicated and important, I do not think I fully understand everything, and it is directly related to our Core language, so we have to be careful, and there are plenty of people who can think about it.

@j-mueller, what do you think?

j-mueller commented 5 years ago

The problem that originally prompted us to consider the serialisation primitive and heterogeneous lists/dynamic types was validating the next data script, but I'll try to write a better description now.

j-mueller commented 5 years ago

See this picture. When validating the green input we don't know the values of the transaction's pay-to-script data scripts, because all we can see is their serialised form (ByteString). But there may be situations where we need to know the values of some of the outputs, for example to verify that they are what we expect them to be.

How would serialise :: a -> ByteString solve the problem? We could serialise the expected value and then compare that ByteString to the one in the transaction. But we couldn't do anything else with the new transaction's data scripts.

Another way to think of this is as a record, like in the vinyl package. In the picture I linked to, the signature of the green output's validator is something like Data -> Redeemer -> { hash1 :: S, hash3 :: T } -> () where the labels hash1 and hash3 refer to the transaction's outputs. So we should be able to run this script on any transaction that produces outputs to the hash1 and hash3 addresses, of types S and T.
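A rough Haskell sketch of that signature, where every type name is a hypothetical placeholder (not the actual ledger types):

```haskell
-- Hypothetical placeholders for the types in the linked picture.
data DataScript = DataScript
data Redeemer   = Redeemer
data S = S  -- type of the data script at address hash1
data T = T  -- type of the data script at address hash3

-- A record of just the outputs this validator cares about, in the
-- spirit of vinyl's extensible records: any transaction producing
-- outputs at hash1 and hash3 of these types could satisfy it.
data Outputs = Outputs { hash1 :: S, hash3 :: T }

validator :: DataScript -> Redeemer -> Outputs -> ()
validator _ _ (Outputs S T) = ()
```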

effectfully commented 5 years ago

Thank you, this is very helpful and I now understand the problem.

I'm not sure if Data.Typeable is feasible. I do not see any problems with it immediately, but who knows.

But maybe we can do something with dynamic builtin types. I'll play with that.

effectfully commented 5 years ago

I've spent some time thinking about this problem.

I'm not sure if Data.Typeable is feasible. I do not see any problems with it immediately, but who knows.

The Data.Typeable approach is going to be hard. The thing is, how do we reflect type binders at the term level? There are two options:

  1. use well-typed contexts and first-order variables. We're not in Agda and this is going to be hard, but maybe feasible
  2. use term-level binders, i.e. lambdas, i.e. a higher-order representation. But then how do we decide equality of type indices using such a representation? I guess only by instantiating lambdas in a first-order way, i.e. by falling back to 1. This is even more complicated than 1, because going from [P]HOAS to FOAS is very complicated in general. I'll play with a higher-order representation just in case

But maybe we can do something with dynamic builtin types.

Our dynamic builtins are not crazy enough at the moment. Maybe in the future we'll have sufficiently crazy builtins that allow us to easily get reflection/meta-programming/something, but this is still an imperfect solution, because it's a huge hammer, as we all agree.

So if we take a step back and think about this whole problem, then what we need is

  1. some kind of "recoverable" existential typing
  2. the ability to apply a well-typed function to an unordered set of tagged untyped elements, some of which may be redundant

Have we seen something like this in the wild? Yes, this is options parsing!

So any kind of first-order data can be serialized inside Plutus to bytestring and then deserialized back inside Plutus. The deserializer is determined by the expected type of the term that the bytestring represents. The expected type is determined by the hash associated with the bytestring. I.e. it's like the name of an option determines how to read the value of the option.
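A toy model of that idea in Haskell, with `String` standing in for serialised bytestrings and `Read` standing in for a real deserializer (both hypothetical simplifications):

```haskell
import qualified Data.Map.Strict as Map
import Text.Read (readMaybe)

-- The hash tells us which entry to look up; the expected type at the
-- call site determines which deserializer runs, just as an option's
-- name determines how the option's value is parsed.
scripts :: Map.Map String String
scripts = Map.fromList [("hash1", "42"), ("hash2", "True")]

lookupTyped :: Read a => String -> Maybe a
lookupTyped h = Map.lookup h scripts >>= readMaybe

main :: IO ()
main = do
  print (lookupTyped "hash1" :: Maybe Int)   -- Just 42
  print (lookupTyped "hash2" :: Maybe Bool)  -- Just True
  print (lookupTyped "hash2" :: Maybe Int)   -- Nothing: wrong expected type
```

The `Read` constraint here is where the lack of type classes would bite: without it, each expected type has to supply its deserializer by hand.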

This is probably going to be annoying without type classes.

And that wouldn't work for higher-order data (like any kind of function), but then we can add the Dynamic primitive type to the language and the following primitive function:

getTyped : all a. dynamic -> maybe a

which calls our actual type checker during evaluation and either returns the value stored inside the dynamic, or nothing if the type checker reports an error.
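For comparison, Haskell's Data.Dynamic exposes essentially this interface, with the Typeable class supplying the runtime type representation (Haskell compares type fingerprints rather than running a type checker):

```haskell
import Data.Dynamic (toDyn, fromDynamic)

-- toDyn       :: Typeable a => a -> Dynamic
-- fromDynamic :: Typeable a => Dynamic -> Maybe a
main :: IO ()
main = do
  let d = toDyn (3 :: Int)
  print (fromDynamic d :: Maybe Int)   -- Just 3
  print (fromDynamic d :: Maybe Bool)  -- Nothing
```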

Unfortunately, at the moment we can't implement that, because we do not allow universal quantification in the types of builtin functions. That machinery was written long ago and for a very different purpose, so it needs a redesign.

@michaelpj, what do you think?

michaelpj commented 5 years ago

Doesn't your last suggestion of adding the dynamic type solve all the problems in one go? The data scripts can just be dynamics, and the validator can getTyped them to the types it expects, and fail if they don't match up.

I'm not totally sure how to implement dynamic, though. Internally I guess it's a Term, and we run the typechecker on it when we pull it out. But how do you create one? Do we have makeDynamic : forall a . a -> dynamic, and does it get passed the AST? We have the substitution problem again: do we discharge the whole environment so it's a closed term?

So we'd need a fair few changes to the builtin machinery, but it seems promising. I hadn't thought about running the typechecker at runtime. That's a nice trick that I guess Haskell can't really use because its runtime representation is different.


I agree that proper Data.Typeable looks hard. Reflecting our types in the language itself is a pain, and our types are in some ways even fancier than Haskell's types.

The serialize/deserialize inside Plutus approach makes sense... but how does the validator even get the serialized bytestring in the first place? Remember that data scripts can have any type, so we can't just insist they're always bytestrings. Then you need a way to work out that a data script has type bytestring... but that's basically just the original problem again!

I also don't think dynamic builtins are the right approach. For one thing, we're expecting them to change more-or-less on a per-chain basis, whereas the types we're talking about here might be different for each transaction. Unless we're willing to assert that all the data scripts on the chain have to have some special builtin type, which I'm not sure we are.

effectfully commented 5 years ago

Doesn't your last suggestion of adding the dynamic type solve all the problems in one go?

Not quite, because we also need to solve

We have the substitution problem again - do we discharge the whole environment so it's a closed term?

The serialize/deserialize inside Plutus approach makes sense... but how does the validator even get the serialized bytestring in the first place? Remember that data scripts can have any type, so we can't just insist they're always bytestrings. Then you need a way to work out that a data script has type bytestring... but that's basically just the original problem again!

Can't we just require a data script to always have type bytestring? I.e. require the contents of a data script to always be serialized. Note that this also solves the environment discharging problem, because you have to actually produce a bytestring, which is a constant and not an AST.

I also don't think dynamic builtins are the right approach. For one thing, we're expecting them to change more-or-less on a per-chain basis, whereas the types we're talking about here might be different for each transaction. Unless we're willing to assert that all the data scripts on the chain have to have some special builtin type, which I'm not sure we are.

No, that's not what I have in mind. With sufficiently crazy builtins we might be able to hack a way to get reflection/meta-programming for types, once and for all. But I do not think this would be a good approach in any case.

michaelpj commented 5 years ago

I mean, I think solving the substitution problem by just substituting everything in would be fine, tbh.

Can't we just require a data script to always have type bytestring?

I don't think so. If we were willing to even require the data scripts in the outputs of a single transaction to all have the same type, then things would be easy, since we'd just have a homogeneous list. But that was rejected as too restrictive (and I agree).

effectfully commented 5 years ago

I don't think so.

I see. I assumed that would be an option. I'll think more.

effectfully commented 5 years ago

I played a bit with PHOAS Typeable. Here is a tiny Agda model (not exactly parametric, but that is easily fixable):

{-# OPTIONS --type-in-type #-}

infixr 5 _⇒_
infixl 7 _∙_

record Name {K : Set} (a : K) : Set where
  constructor name

data Typeable : {K : Set} -> K -> Set₁ where
  Var : {K : Set} {a : K} -> Name a -> Typeable a
  _⇒_ : {a b : Set} -> Typeable a -> Typeable b -> Typeable (a -> b)
  All : {K : Set} {f : K -> Set} -> (∀ {a} -> Name a -> Typeable (f a)) -> Typeable (∀ a -> f a)
  Lam : {J K : Set} {f : J -> K} -> (∀ {a} -> Name a -> Typeable (f a)) -> Typeable (λ a -> f a)
  _∙_ : {J K : Set} {f : J -> K} {a : J} -> Typeable f -> Typeable a -> Typeable (f a)

idTypeable : Typeable ((A : Set) -> A -> A)
idTypeable = All λ A -> Var A ⇒ Var A

fmapTypeable : Typeable ((F : Set -> Set) (A B : Set) -> (A -> B) -> F A -> F B)
fmapTypeable =
  All λ F -> All λ A -> All λ B ->
    (Var A ⇒ Var B) ⇒ Var F ∙ Var A ⇒ Var F ∙ Var B

I do not think this is going anywhere, because

  1. we have quantification over kinds there
  2. we need some kind of builtin ObservationalTypeTheory-like equality in order to be able to use this
michaelpj commented 5 years ago

One other thing that occurred to me re: dynamic. If we want to run the typechecker on the term, then it needs to have all the correct types. But that means we need to actually do type substitutions during evaluation, whereas I believe at the moment we just ignore them.

Also dynamic would preclude us from an implementation via erasure to an untyped calculus.

effectfully commented 5 years ago

But that means we need to actually do type substitutions during evaluation, whereas I believe at the moment we just ignore them.

True, but we can put the types encountered during evaluation in an environment and supply that environment to the type checker when/if needed.

Also dynamic would preclude us from an implementation via erasure to an untyped calculus.

Is it the case that the content of a data script is trustworthy? If so, then in an untyped calculus we can just feed a term directly to a function without running into the trouble of making types match. If that's not the case and we might expect one output to have type A while it's actually of type B, then we can't erase, indeed.

michaelpj commented 5 years ago

Surely we can't implement getTyped : all a. dynamic -> maybe a in an erased calculus? It has to return Nothing if it's the wrong type, so we can't just go ahead regardless, the caller might want to do something in the case where it's the wrong type.

effectfully commented 5 years ago

It has to return Nothing if it's the wrong type, so we can't just go ahead regardless, the caller might want to do something in the case where it's the wrong type.

I was thinking that we can always return Just, if we trust data scripts. And type checking is only needed to make types match, but not to ensure safety/correctness/etc.

So do we trust data scripts? Or what is the context around this situation?

michaelpj commented 5 years ago

I don't think it's a good idea to assume that Just is always the "right" answer. Consider for example how you might use this to implement a backwards-compatible consumer by doing a type test for previous versions of your input type. Then a Nothing answer might be fine - you just move on to the next test. Whereas returning Just would be terrible.
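That use case can be sketched with Haskell's Data.Dynamic; the versioned Params types below are hypothetical:

```haskell
import Data.Dynamic (Dynamic, toDyn, fromDynamic)

-- Hypothetical old and new versions of a consumer's input type.
newtype ParamsV1 = ParamsV1 Int      deriving (Show, Eq)
data    ParamsV2 = ParamsV2 Int Bool deriving (Show, Eq)

-- Try the current version first; on a failed type test, fall back to
-- the previous version and upgrade it, instead of blowing up on the
-- first Nothing.
consume :: Dynamic -> Maybe ParamsV2
consume d = case fromDynamic d of
  Just p2 -> Just p2
  Nothing -> case fromDynamic d of
    Just (ParamsV1 n) -> Just (ParamsV2 n False)
    Nothing           -> Nothing
```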

effectfully commented 5 years ago

Before I learn some blockchain stuff to better understand the context around the problem, what do you think about the following?

Let's say we have this builtin function:

convert : all a b. a -> maybe b

It has this weird behavior: if a and b are definitionally the same, then convert is equal to just; otherwise it's equal to const nothing. We can't, of course, define such a function in Plutus Core, because it checks equality of types at runtime and we do not have a runtime representation of types, so we make convert a magic builtin. (We probably also do not want to give up on erasure, but this can be solved by erasing all types except those that can't be erased; something like a Proxy data type that stores a non-erasable type would probably work. Note that with the other possible solution, where we type check terms at runtime, we can't fully erase types either.)
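In Haskell this function already exists as Data.Typeable.cast, though only because every type carries a Typeable representation at runtime, which is exactly what Plutus Core lacks and why convert has to be a magic builtin there:

```haskell
import Data.Typeable (Typeable, cast)

-- cast :: (Typeable a, Typeable b) => a -> Maybe b
convert :: (Typeable a, Typeable b) => a -> Maybe b
convert = cast

main :: IO ()
main = do
  print (convert (1 :: Int) :: Maybe Int)   -- Just 1
  print (convert (1 :: Int) :: Maybe Bool)  -- Nothing
```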

Having this function, we can make the transaction output be of the following type (I'm ignoring the value restriction problem for now):

list (hash × (all b. maybe b))

(where hash is an appropriate instantiation of bytestring). Here is an example (in pseudosyntax):

txOutput : list (hash × (all b. maybe b))
txOutput =
    [ ("aaa", /\b -> convert {integer 1}    {b} (1!1)  )
    , ("bbb", /\b -> convert {unit}         {b} unitval)
    , ("ccc", /\b -> convert {bool -> bool} {b} not    )
    ]

And then we can do the option-parsing thing that I talked about above. For example, if you need to get the part of the transaction output that has hash "ccc" and type bool -> bool, you just look up "ccc", get

/\ b -> convert {bool -> bool} {b} not

and instantiate b by bool -> bool:

convert {bool -> bool} {bool -> bool} not

which computes to just not of type maybe (bool -> bool).

This way we get homogeneous lists that contain heterogeneous values inside which we can extract by supplying expected types of those values.
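The pseudosyntax above can be modelled in Haskell with Dynamic playing the role of all b. maybe b (the hashes and payloads are the ones from the example):

```haskell
import Data.Dynamic (Dynamic, toDyn, fromDynamic)
import Data.Typeable (Typeable)

-- The transaction output: a homogeneous list of hash-tagged dynamics,
-- each holding a value of a different type.
txOutput :: [(String, Dynamic)]
txOutput =
  [ ("aaa", toDyn (1 :: Integer))
  , ("bbb", toDyn ())
  , ("ccc", toDyn not)
  ]

-- Look up a hash and instantiate at the expected type; Nothing means
-- the stored value has a different type.
getAt :: Typeable b => String -> Maybe b
getAt h = lookup h txOutput >>= fromDynamic

main :: IO ()
main = print (fmap ($ True) (getAt "ccc" :: Maybe (Bool -> Bool)))  -- Just False
```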

It doesn't seem like we can get away without having or faking some runtime representation of types. I believe we can't fully reflect Plutus Core types in Plutus Core itself, so we are left with either magical runtime representations or magical builtins (like the type-checking builtin or the convert builtin), if we want transaction outputs to be heterogeneous. convert seems the simplest, but I do not understand everything properly yet, so I may be missing something or suggesting nonsense.

michaelpj commented 5 years ago

Cute. So we can represent dynamic as all b. maybe b and implement it using convert.

They seem equally powerful: we can implement convert as fromDynamic . toDynamic. But convert seems conceptually simpler.

I also don't think we can get away without some kind of runtime type checking. We already have a runtime representation of types, so we can use that, we just have to keep them around to check them.

mchakravarty commented 5 years ago

I also don't think we can get away without some kind of runtime type checking. We already have a runtime representation of types, so we can use that, we just have to keep them around to check them.

Yes, but the beauty of reifying types as terms is that you only keep around those that you really need.

@effectfully Your convert is Weirich's cast (but she uses exceptions instead of a Maybe to indicate failure): https://www.seas.upenn.edu/~sweirich/papers/cast/cast.pdf

effectfully commented 5 years ago

@michaelpj,

So we can represent dynamic as all b. maybe b and implement it using convert.

They seem equally powerful: we can implement convert as fromDynamic . toDynamic. But convert seems conceptually simpler.

Agreed.

@chak,

@effectfully Your convert is Weirich's cast (but she uses exceptions instead of a Maybe to indicate failure):

I was going to suggest using exceptions instead of Maybe. But then we also need to have catch in Plutus Core, which I do think we want to have, because people sure like throwing errors in libraries which other people then use. Without catch, an inappropriate error can be a deal-breaker. But that's a separate discussion.

https://www.seas.upenn.edu/~sweirich/papers/cast/cast.pdf

Thanks, I'll check that.

mchakravarty commented 5 years ago

There is also Typing Dynamic Typing and A Lightweight Implementation of Generics and Dynamics.

mchakravarty commented 5 years ago

But then we also need to have catch in Plutus Core which I do think we want to have, because people sure like throwing errors in libraries which other people then use. Without catch an inappropriate error can be a deal-breaker.

In our context, raising an error may actually be a good plan. When we use these dynamic types for data script validation, we would have to raise an error anyway if the type conversion fails.

(Nevertheless, I do agree that the primitive should probably return a Maybe, but we probably want a wrapper that raises an error on Nothing.)

michaelpj commented 5 years ago

Big :-1: on throwing an error from me. I can think of at least one nice use case which that would rule out: using type tests to provide backwards compatibility.

That way you can go "Is this the type I expect? If not, is it the last version of the type?" rather than blowing up immediately on the first one.

mchakravarty commented 5 years ago

That way you can go "Is this the type I expect? If not, is it the last version of the type?" rather than blowing up immediately on the first one.

Good point!

effectfully commented 5 years ago

@effectfully Your convert is Weirich's cast (but she uses exceptions instead of a Maybe to indicate failure): https://www.seas.upenn.edu/~sweirich/papers/cast/cast.pdf

There is also Typing Dynamic Typing and A Lightweight Implementation of Generics and Dynamics.

I studied all of those and I do not feel like they propose sufficiently good solutions for our case. First of all, they really only handle monomorphic types. Dealing with binders and polymorphism is much more complicated. But reflecting all types we have, I think, is just not possible, because we need to quantify over kinds and we do not have such a quantifier.

The convert thing that I suggest (let's call it cast indeed, though I think coerce would be more appropriate, but that name is already taken by Data.Coerce) handles all types, regardless of polymorphism or higher kinds. And it's really a pretty minimal thing: just one tiny primitive (which requires a rather intricate "typed builtins" machinery, but we need that for other things as well).

mchakravarty commented 5 years ago

But reflecting all types we have, I think, is just not possible, because we need to quantify over kinds and we do not have such a quantifier.

Do we need to be able to reflect all types? (The Haskell solutions also all have restrictions, but are still useful in practice.)

effectfully commented 5 years ago

Do we need to be able to reflect all types?

This is equal to "do we need to be able to put arbitrary types into data scripts?". I don't know the answer, but if we can easily avoid limitations, why not do that?

michaelpj commented 5 years ago

Yes, I'm also very much in favour of avoiding arbitrary limitations. Limitations in the compilation target lead to weird errors at the surface level, which I'd really rather avoid.

kwxm commented 5 years ago

Closing because we now have monotyped Data: see #1436.