Add serialization builtin to Plutus, i.e.

Closed: nau closed this issue 5 years ago.
I don't think this is the right approach. The problem is: what are the semantics of this operation? Does it serialize the fully-evaluated argument of the function? If so, then you probably can't use it for comparing hashes with the data script in a transaction, because that is the hash of a serialized AST, which has no obligation to be fully-evaluated. In general it's just quite a different operation.
I had an alternative proposal which I described in https://iohk.myjetbrains.com/youtrack/issue/CGP-448, which is to say: if we want to talk about programs, we should add the facility to talk about programs, i.e. quoting. You wouldn't be able to do much with such a program, but on the plus side, serializing it would make much more sense. But adding metaprogramming facilities is also a big hammer.
Overall, I think it would be much better to figure out a way to get the typed value for the data script into the program, so you can use your own equality test to check whether it is what you want. Apart from anything else, this is much more flexible, and allows you to do other kinds of things than just equality.
@michaelpj wrote:

> Overall, I think it would be much better to figure out a way to get the typed value for the data script into the program, so you can use your own equality test to check whether it is what you want. Apart from anything else, this is much more flexible, and allows you to do other kinds of things than just equality.
But can we really get the typed value of an arbitrary data script into an arbitrary program? The data scripts of the outputs we want to check cannot be passed as typed arguments (the way the data script paired with the validator in its own output is). Hence, we need to embed them either as something like `Data.Dynamic` or as part of something like a heterogeneous list. Both boil down to providing runtime type information and a way to check it (in the spirit of `Data.Typeable`).
I do agree that this is a smaller hammer than adding meta-programming.
Heterogeneous lists don't have to mean runtime type information! The main problem with heterogeneous lists is that we don't want the validator to have to uniquely specify the number of transaction outputs and the types of their data scripts.
So what I thought was: what if there was a way for the validator to ignore some of them?
I haven't quite been able to make this typecheck, but the idea is something like: the transaction's data scripts form a heterogeneous list `A :: B :: C :: Nil`, and the validator accepts any list matching the prefix `A :: B :: _`, where `_` is any heterogeneous list (see the sketch below). The main issue is how to make it so the validating node can just do things in a dumb way, without having to be clever about performing type instantiations or anything.
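For what it's worth, here is a rough Haskell sketch of that shape; Haskell has the type-level machinery to express it, and whether Plutus Core does is exactly the open question. All names here are illustrative:

```haskell
{-# LANGUAGE DataKinds, GADTs, TypeOperators #-}

-- A heterogeneous list indexed by the list of its element types.
data HList ts where
  Nil  :: HList '[]
  (:*) :: t -> HList ts -> HList (t ': ts)
infixr 5 :*

-- "A :: B :: _": the consumer fixes a prefix of the element types and
-- ignores however many elements follow.
firstTwo :: HList (a ': b ': rest) -> (a, b)
firstTwo (x :* y :* _) = (x, y)
```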
@michaelpj What you are saying is that you can have a heterogeneous list if you specify all the types (i.e., the list of the types of all the elements is the type of the heterogeneous list), but we usually call this a record, not a heterogeneous list. Also, what you are describing is a kind of extensible-record-like interface.

TBH, I think having dynamic types is simpler, especially in PLC where we don't have datatypes.
Is there a connection to #581?
That's what's usually called a heterogeneous list in the Haskell world, no?
I don't know whether hlists are hard. The simplest version is to encode them like tuples: it's a value of type `forall R. (A -> B -> C -> R) -> R`. The issue is the "forget the tail" behaviour - is it possible to consume that value and get the `A` and `B` without knowing whether there are more values afterwards? I'm honestly not sure, but our system is pretty powerful.
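A minimal Haskell sketch of that encoding, with hypothetical placeholder types; note that even extracting just the first two components forces the consumer to write a continuation of the full arity, which is exactly the "forget the tail" question:

```haskell
{-# LANGUAGE RankNTypes #-}

-- Hypothetical element types.
data A = A
data B = B
data C = C

-- A triple, Church-encoded as a function over its consumer.
type Triple = forall r. (A -> B -> C -> r) -> r

mkTriple :: A -> B -> C -> Triple
mkTriple a b c = \k -> k a b c

-- Extracting A and B still requires a continuation mentioning all
-- three components, i.e. the consumer must know the full arity.
firstTwo :: Triple -> (A, B)
firstTwo t = t (\a b _ -> (a, b))
```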
Maybe adding dynamic types is simpler, but I don't have a good sense of that.
I think it's unrelated to that issue - as we've discussed before, "dynamic" builtins are a bit of a misnomer; they're not dynamic in this way.
> The issue is the "forget the tail" behaviour - is it possible to consume that value and get the `A` and `B` without knowing whether there are more values afterwards?
Yep:
```agda
open import Data.Product

postulate
  A B C D R : Set

  doSmthWithAtLeastTwoElems : ∀ {E} -> A × (B × E) -> R
  fourElems : A × (B × (C × D))

smth : R
smth = doSmthWithAtLeastTwoElems fourElems
```
I.e. instead of using an n-ary tuple you use n nested 2-ary tuples.
The problem with this is: suppose the validator is `doSmthWithAtLeastTwoElems` and the validating node wants to feed it `fourElems`. Then the validating node needs to provide an instantiation for `E`! Which requires it to know how many elements the validator is expecting to consume, and to do some annoying type analysis.
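For comparison, here is the same situation in Haskell, with hypothetical placeholder types. GHC infers the tail type, but Plutus Core has no inference, so the validating node would have to construct the explicit type application itself:

```haskell
{-# LANGUAGE RankNTypes, TypeApplications #-}

-- Hypothetical placeholder types.
data A = A
data B = B
data C = C
data D = D
data R = R

-- The validator only cares about the first two components.
validator :: forall e. (A, (B, e)) -> R
validator _ = R

fourElems :: (A, (B, (C, D)))
fourElems = (A, (B, (C, D)))

applied :: R
applied = validator @(C, D) fourElems  -- the node must supply e = (C, D)
```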
> The problem with this is: suppose the validator is `doSmthWithAtLeastTwoElems` and the validating node wants to feed it `fourElems`. Then the validating node needs to provide an instantiation for `E`!
I see. Yeah, that's unfortunate.
Do we have a complete description of this validation problem written down somewhere? I.e. what data scripts are allowed to contain, what validator scripts are required to validate, and possible solutions to the problem.
No, I don't believe I've written this up in full anywhere. Jann has brought it up in the past.
It would be great to have a complete description, because the problem seems complicated and important, because I do not think I fully understand everything, and because this is directly related to our Core language, so we have to be careful - and there are plenty of people who can think about it.
@j-mueller, what do you think?
The problem that originally prompted us to consider the serialisation primitive and heterogeneous lists/dynamic types was validating the next data script, but I'll try to write a better description now.
See this picture. When validating the green input we don't know the values of the transaction's pay-to-script data scripts, because all we can see is their serialised form (`ByteString`). But there may be situations where we need to know the values of some of the outputs, for example to verify that they are what we expect them to be.
How would `serialise :: a -> ByteString` solve the problem? We could serialise the expected value and then compare that `ByteString` to the one in the transaction. But we couldn't do anything else with the new transaction's data scripts.
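A sketch of that serialise-and-compare pattern, with `serialise` as a hypothetical builtin (stubbed out here, since its semantics are exactly what is under discussion) and a hypothetical helper name:

```haskell
import Data.ByteString (ByteString)

-- Hypothetical builtin: serialise any value to a ByteString.
serialise :: a -> ByteString
serialise = error "hypothetical builtin"

-- All the validator can do is compare bytes: serialise the value it
-- expects and check it against the output's serialised data script.
dataScriptMatches :: a -> ByteString -> Bool
dataScriptMatches expected actual = serialise expected == actual
```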
Another way to think of this is as a record, like in the `vinyl` package. In the picture I linked to, the signature of the green output's validator is something like `Data -> Redeemer -> { hash1 :: S, hash3 :: T } -> ()`, where the labels `hash1` and `hash3` refer to the transaction's outputs. So we should be able to run this script on any transaction that produces outputs to the `hash1` and `hash3` addresses, of types `S` and `T`.
Thank you, this is very helpful and I now understand the problem.
I'm not sure if `Data.Typeable` is feasible. I do not see any problems with it immediately, but who knows.

But maybe we can do something with dynamic builtin types. I'll play with that.
I've spent some time thinking about this problem.
> I'm not sure if `Data.Typeable` is feasible. I do not see any problems with it immediately, but who knows.
The `Data.Typeable` approach is going to be hard. The thing is, how do we reflect type binders at the term level? There are two options:
> But maybe we can do something with dynamic builtin types.
Our dynamic builtins are not crazy enough at the moment. Maybe in the future we'll have sufficiently crazy builtins that allow us to easily get reflection/meta-programming/something, but this is still an imperfect solution, because it's a huge hammer, as we all agree.
So if we take a step back and think about this whole problem, then what we need is:
Have we seen something like this in the wild? Yes, this is options parsing!
So any kind of first-order data can be serialized inside Plutus to a `bytestring` and then deserialized back inside Plutus. The deserializer is determined by the expected type of the term that the `bytestring` represents, and the expected type is determined by the hash associated with the `bytestring`. I.e. it's like how the name of an option determines how to read the value of the option.
This is going to be annoying without type classes probably.
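A rough Haskell sketch of that option-parsing analogy, with all names hypothetical: the hash plays the role of the option name, and the decoder the validator attaches to a hash determines how that output's bytestring is read back.

```haskell
import Data.ByteString (ByteString)

type Hash = ByteString

-- Hypothetical: what the validator expects at a given hash, bundled
-- with a decoder for the type it expects there - much as an option's
-- name determines how the option's value is parsed.
data Expected a = Expected
  { atHash :: Hash
  , decode :: ByteString -> Maybe a  -- Nothing if the bytes don't fit
  }

lookupOutput :: Expected a -> [(Hash, ByteString)] -> Maybe a
lookupOutput e outs = lookup (atHash e) outs >>= decode e
```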
And that wouldn't work for higher-order data (like any kind of function), but then we can add the `dynamic` primitive type to the language, together with the following primitive function:

```
getTyped : all a. dynamic -> maybe a
```

which, when evaluated, calls our actual type checker and either returns the value stored inside the `dynamic` or `nothing` if the type checker reports an error.
Unfortunately, at the moment we can't implement that, because we do not allow universal quantification in the types of builtin functions. That machinery was written long ago and for a very different purpose, so it needs a redesign.
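As a Haskell model of the intended behaviour, with `Data.Dynamic` standing in for the proposed `dynamic` type and `fromDynamic` for `getTyped`:

```haskell
import Data.Dynamic (Dynamic, fromDynamic)

-- A validator expecting an Integer data script: 'fromDynamic' plays
-- the role of 'getTyped', returning Nothing on a type mismatch.
validate :: Dynamic -> Bool
validate d = case fromDynamic d of
  Just n  -> n > (0 :: Integer)  -- the value had the expected type
  Nothing -> False               -- wrong type: validation fails
```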
@michaelpj, what do you think?
Doesn't your last suggestion of adding the `dynamic` type solve all the problems in one go? The data scripts can just be `dynamic`s, and the validator can `getTyped` them to the types it expects, and fail if they don't match up.

I'm not totally sure how to implement `dynamic`, though. Internally I guess it's a `Term`, and we run the typechecker on it when we pull it out. But how do you create one? Do we have `makeDynamic : forall a . a -> dynamic`, and does it get passed the AST? We have the substitution problem again - do we discharge the whole environment so it's a closed term?
So we'd need a fair few changes to the builtin machinery, but it seems promising. I hadn't thought about running the typechecker at runtime. That's a nice trick that I guess Haskell can't really use because its runtime representation is different.
I agree that proper `Data.Typeable` looks hard. Reflecting our types in the language itself is a pain, and our types are in some ways even fancier than Haskell's types.
The serialize/deserialize inside Plutus approach makes sense... but how does the validator even get the serialized bytestring in the first place? Remember that data scripts can have any type, so we can't just insist they're always bytestrings. Then you need a way to work out that a data script has type `bytestring`... but that's basically just the original problem again!
I also don't think dynamic builtins are the right approach. For one thing, we're expecting them to change more-or-less on a per-chain basis, whereas the types we're talking about here might be different for each transaction. Unless we're willing to assert that all the data scripts on the chain have to have some special builtin type, which I'm not sure we are.
> Doesn't your last suggestion of adding the `dynamic` type solve all the problems in one go?
Not quite, because we also need to solve
> We have the substitution problem again - do we discharge the whole environment so it's a closed term?
> The serialize/deserialize inside Plutus approach makes sense... but how does the validator even get the serialized bytestring in the first place? Remember that data scripts can have any type, so we can't just insist they're always bytestrings. Then you need a way to work out that a data script has type `bytestring`... but that's basically just the original problem again!
Can't we just require a data script to always have type `bytestring`? I.e., require the contents of a data script to always be serialized. Note that this also solves the environment-discharging problem, because you have to actually produce a `bytestring`, which is a constant and not an AST.
> I also don't think dynamic builtins are the right approach. For one thing, we're expecting them to change more-or-less on a per-chain basis, whereas the types we're talking about here might be different for each transaction. Unless we're willing to assert that all the data scripts on the chain have to have some special builtin type, which I'm not sure we are.
No, that's not what I have in mind. With sufficiently crazy builtins we might be able to hack a way to get reflection/meta-programming for types, once and for all. But I do not think this would be a good approach in any case.
I mean, I think solving the substitution problem by just substituting everything in would be fine, tbh.
> Can't we just require a data script to always have type `bytestring`?
I don't think so. If we were willing to even require the data scripts in the outputs of a single transaction to all have the same type, then things would be easy, since we'd just have a homogeneous list. But that was rejected as too restrictive (and I agree).
> I don't think so.
I see. I assumed that would be an option. I'll think more.
I played a bit with a PHOAS `Typeable`. Here is a tiny Agda model (not exactly parametric, but that is easily fixable):
```agda
{-# OPTIONS --type-in-type #-}

infixr 5 _⇒_
infixl 7 _∙_

record Name {K : Set} (a : K) : Set where
  constructor name

data Typeable : {K : Set} -> K -> Set₁ where
  Var : {K : Set} {a : K} -> Name a -> Typeable a
  _⇒_ : {a b : Set} -> Typeable a -> Typeable b -> Typeable (a -> b)
  All : {K : Set} {f : K -> Set} -> (∀ {a} -> Name a -> Typeable (f a)) -> Typeable (∀ a -> f a)
  Lam : {J K : Set} {f : J -> K} -> (∀ {a} -> Name a -> Typeable (f a)) -> Typeable (λ a -> f a)
  _∙_ : {J K : Set} {f : J -> K} {a : J} -> Typeable f -> Typeable a -> Typeable (f a)

idTypeable : Typeable ((A : Set) -> A -> A)
idTypeable = All λ A -> Var A ⇒ Var A

fmapTypeable : Typeable ((F : Set -> Set) (A B : Set) -> (A -> B) -> F A -> F B)
fmapTypeable =
  All λ F -> All λ A -> All λ B ->
    (Var A ⇒ Var B) ⇒ Var F ∙ Var A ⇒ Var F ∙ Var B
```
I do not think this is going anywhere, because:
One other thing that occurred to me re: `dynamic`. If we want to run the typechecker on the term, that means it needs to have all the correct types. But that means we need to actually do type substitutions during evaluation, whereas I believe at the moment we just ignore them.

Also, `dynamic` would preclude us from an implementation via erasure to an untyped calculus.
> But that means we need to actually do type substitutions during evaluation, whereas I believe at the moment we just ignore them.
True, but we can put the types encountered during evaluation in an environment and supply that environment to the type checker when/if needed.
> Also, `dynamic` would preclude us from an implementation via erasure to an untyped calculus.
Is it the case that the content of a data script is trustworthy? If so, then in an untyped calculus we can just feed a term directly to a function without running into the trouble of making types match. If that's not the case and we might expect one output to have type `A` while it's actually of type `B`, then we can't erase, indeed.
Surely we can't implement `getTyped : all a. dynamic -> maybe a` in an erased calculus? It has to return `Nothing` if it's the wrong type, so we can't just go ahead regardless; the caller might want to do something in the case where it's the wrong type.
> It has to return `Nothing` if it's the wrong type, so we can't just go ahead regardless; the caller might want to do something in the case where it's the wrong type.
I was thinking that we could always return `Just`, if we trust data scripts, and that type checking is only needed to make types match, not to ensure safety/correctness/etc.

So do we trust data scripts? Or what is the context around this situation?
I don't think it's a good idea to assume that `Just` is always the "right" answer. Consider, for example, how you might use this to implement a backwards-compatible consumer by doing a type test for previous versions of your input type. Then a `Nothing` answer might be fine - you just move on to the next test. Whereas returning `Just` would be terrible.
Before I learn some blockchain stuff to better understand the context around the problem, what do you think about the following?
Let's say we have this builtin function:
```
convert : all a b. a -> maybe b
```
It has this weird behavior: if `a` and `b` are definitionally the same, then `convert` is equal to `just`; otherwise it's equal to `const nothing`. We can't, of course, define such a function in Plutus Core, because it checks equality of types at runtime and we do not have a runtime representation of types, so we make `convert` a magic builtin. (We probably also do not want to give up on erasure, but this can be solved by erasing all types except those that can't be erased; something like a `Proxy` data type which stores a non-erasable type would probably work. Also, we can't fully erase types with the other possible solution, where we type check terms at runtime.)
Having this function, we can make a transaction output be of the following type (I'm ignoring the value restriction problem for now):

```
list (hash × (all b. maybe b))
```

where `hash` is an appropriate instantiation of `bytestring`. Here is an example (in pseudosyntax):
```
txOutput : list (hash × (all b. maybe b))
txOutput =
  [ ("aaa", /\b -> convert {integer 1} {b} (1!1))
  , ("bbb", /\b -> convert {unit} {b} unitval)
  , ("ccc", /\b -> convert {bool -> bool} {b} not)
  ]
```
And then we can do the option-parsing thing that I talked about above. For example, if you need to get the part of the transaction output that has hash "ccc" and type `bool -> bool`, you just look up "ccc", get

```
/\b -> convert {bool -> bool} {b} not
```

and instantiate `b` at `bool -> bool`:

```
convert {bool -> bool} {bool -> bool} not
```

which computes to `just not` of type `maybe (bool -> bool)`.
This way we get homogeneous lists containing heterogeneous values, which we can extract by supplying the expected types of those values.
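A Haskell model of this whole setup, with `Data.Typeable`'s `cast` standing in for the magic `convert` builtin and `String` standing in for the hash type (all names hypothetical):

```haskell
{-# LANGUAGE RankNTypes #-}

import Data.Typeable (Typeable, cast)

type Hash = String  -- stand-in for the bytestring hash

-- The 'all b. maybe b' part of each entry; the Typeable constraint is
-- Haskell's way of keeping the runtime type representation around.
newtype Entry = Entry (forall b. Typeable b => Maybe b)

entry :: Typeable a => a -> Entry
entry x = Entry (cast x)

txOutput :: [(Hash, Entry)]
txOutput =
  [ ("aaa", entry (1 :: Integer))
  , ("bbb", entry ())
  , ("ccc", entry not)
  ]

-- Look up a hash and instantiate the stored value at the expected type.
lookupTyped :: Typeable b => Hash -> [(Hash, Entry)] -> Maybe b
lookupTyped h outs = lookup h outs >>= \(Entry mb) -> mb
```

Here `lookupTyped "ccc" txOutput :: Maybe (Bool -> Bool)` computes to `Just not`, while instantiating at any other type yields `Nothing`.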
It doesn't seem like we can get away without having or faking some runtime representation of types. I believe we can't fully reflect Plutus Core types in Plutus Core itself, so we are left with either magical runtime representations or magical builtins (like the type-checking builtin or the `convert` builtin), if we want transaction outputs to be heterogeneous. `convert` seems the simplest, but I do not understand everything properly yet, so I may be missing something or suggesting nonsense.
Cute. So we can represent `dynamic` as `all b. maybe b` and implement it using `convert`.

They seem equally powerful: we can implement `convert` as `fromDynamic . toDynamic`. But `convert` seems conceptually simpler.
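In Haskell terms the equivalence is literal; a sketch, with `toDyn` being Haskell's name for `toDynamic`:

```haskell
import Data.Dynamic (fromDynamic, toDyn)
import Data.Typeable (Typeable)

-- 'convert' recovered from the dynamic primitives.
convert :: (Typeable a, Typeable b) => a -> Maybe b
convert = fromDynamic . toDyn
```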
I also don't think we can get away without some kind of runtime type checking. We already have a runtime representation of types, so we can use that, we just have to keep them around to check them.
> I also don't think we can get away without some kind of runtime type checking. We already have a runtime representation of types, so we can use that, we just have to keep them around to check them.
Yes, but the beauty of reifying types as terms is that you only keep around the ones you really need.
@effectfully Your `convert` is Weirich's `cast` (but she uses exceptions instead of a `Maybe` to indicate failure): https://www.seas.upenn.edu/~sweirich/papers/cast/cast.pdf
@michaelpj,
> So we can represent `dynamic` as `all b. maybe b` and implement it using `convert`.
>
> They seem equally powerful: we can implement `convert` as `fromDynamic . toDynamic`. But `convert` seems conceptually simpler.
Agreed.
@chak,
> @effectfully Your `convert` is Weirich's `cast` (but she uses exceptions instead of a `Maybe` to indicate failure):
I was going to suggest using exceptions instead of `Maybe`. But then we also need to have `catch` in Plutus Core, which I do think we want to have, because people sure like throwing errors in libraries which other people then use. Without `catch`, an inappropriate `error` can be a deal-breaker. But that's a separate discussion.
Thanks, I'll check that.
There is also Typing Dynamic Typing and A Lightweight Implementation of Generics and Dynamics.
> But then we also need to have `catch` in Plutus Core, which I do think we want to have, because people sure like throwing errors in libraries which other people then use. Without `catch`, an inappropriate `error` can be a deal-breaker.
In our context, raising an error may actually be a good plan. When we use these dynamic types for data script validation, we would have to raise an error anyway if the type conversion fails.
(Nevertheless, I do agree that maybe the primitive should return a `Maybe`, but we probably want a wrapper that raises an error on `Nothing`.)
Big :-1: on throwing an error from me. I can think of at least one nice use case which that would rule out: using type tests to provide backwards compatibility.

That way you can go "Is this the type I expect? If not, is it the last version of the type?" rather than blowing up immediately on the first one.
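A sketch of that use case, with `cast` standing in for the type test and hypothetical versioned types:

```haskell
import Control.Applicative ((<|>))
import Data.Typeable (Typeable, cast)

-- Hypothetical old and new versions of a data script type.
newtype ParamsV1 = ParamsV1 Integer
newtype ParamsV2 = ParamsV2 (Integer, Integer)

upgradeV1 :: ParamsV1 -> ParamsV2
upgradeV1 (ParamsV1 n) = ParamsV2 (n, 0)

-- Test for the current type first; on Nothing, fall back to the old
-- version. A failed test is useful information here, not an error.
acceptParams :: Typeable a => a -> Maybe ParamsV2
acceptParams x = cast x <|> (upgradeV1 <$> cast x)
```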
> That way you can go "Is this the type I expect? If not, is it the last version of the type?" rather than blowing up immediately on the first one.
Good point!
> @effectfully Your `convert` is Weirich's `cast` (but she uses exceptions instead of a `Maybe` to indicate failure): https://www.seas.upenn.edu/~sweirich/papers/cast/cast.pdf
>
> There is also Typing Dynamic Typing and A Lightweight Implementation of Generics and Dynamics.
I studied all of those and I do not feel like they propose sufficiently good solutions in our case. First of all, they really only handle monomorphic types. Dealing with binders and polymorphism is much more complicated. But reflecting all types we have, I think, is just not possible, because we need to quantify over kinds and we do not have such a quantifier.
The `convert` thing that I suggest (let's call it `cast` indeed, though I think `coerce` would be more appropriate, but that name is already reserved by `Data.Coerce`) handles all types, regardless of whether there is polymorphism or higher kinds. And it's really a pretty minimal thing: just one tiny primitive (one that requires a rather intricate "typed builtins" machinery, but we need that for other things as well).
> But reflecting all types we have, I think, is just not possible, because we need to quantify over kinds and we do not have such a quantifier.
Do we need to be able to reflect all types? (The Haskell solutions also all have restrictions, but are still useful in practice.)
> Do we need to be able to reflect all types?
This amounts to "do we need to be able to put arbitrary types into data scripts?". I don't know the answer, but if we can easily avoid limitations, why not do that?
Yes, I'm also very much in favour of avoiding arbitrary limitations. Limitations in the compilation target lead to weird errors at the surface level, which I'd really rather avoid.
Closing because we now have monotyped `Data`: see #1436.