IntersectMBO / plutus

The Plutus language implementation and tools
Apache License 2.0
1.56k stars 477 forks source link

Why DefaultUni is so complicated? #4781

Open phadej opened 2 years ago

phadej commented 2 years ago

Why cannot pair and list be represented as

DefaultUniList :: DefaultUni a -> DefaultUni [a]
DefaultUniPair :: DefaultUni a -> DefaultUni b -> DefaultUni (a, b)

Is DefaultUni (Esc a) used anywhere (where?) at any other kind than Type? In ValueOf it is at type Type, and above simplification would be massive.

EDIT: The problem is that I have no idea how to exhaustively match on DefaultUni. Indeed the plutus-core itself has (somewhat ugly):

    bring _ (f `DefaultUniApply` _ `DefaultUniApply` _ `DefaultUniApply` _) _ =
        noMoreTypeFunctions f
michaelpj commented 2 years ago

cc @effectfully

effectfully commented 2 years ago

Why cannot pair and list be represented as

See this Note for how we arrived at the current representation.

Is DefaultUni (Esc a) used anywhere (where?) at any other kind than Type?

Yes, in the Plutus types. The Note has an abstract example.

EDIT: The problem is that I have no idea how to exhaustively match on DefaultUni. Indeed the plutus-core itself has (somewhat ugly):

Upvote this proposal :)

phadej commented 2 years ago

Ok.

I still think the design is too clever for too little (I don't really believe in any non DefaultUni instantiations), but this is not my call.

effectfully commented 2 years ago

I don't really believe in any non DefaultUni instantiations

We were told that Plutus was going to be used on other blockchains. But even if that never happens, we still need DefaultUni as a separate entity. Consider a function like sum :: [Integer] -> Integer: in order to apply it, you need a way of extracting a [Integer] from the AST. You could have

data Constant
    = ConstantInteger Integer
    | ConstantBool Bool
    | ConstantList [Constant]

in the AST but then extracting a [Integer] from that is an O(n) operation in the length of the list, plus you practically allow heterogeneous lists, which is inappropriate for a reasonably typed language. You could do

data Constant
    = ConstantInteger Integer
    | ConstantBool Bool
    | ConstantListInteger [Integer]
    | ConstantListBool [Bool]

but that's just bad language design imo, especially when there are more types, both monomorphic and polymorphic. Like imagine having ConstantListPairDataData [(Data, Data)] (we use that currently) as one of the many constructors -- pure abomination.

So instead you'd do

data Constant
    = ConstantInteger Integer
    | ConstantBool Bool
    | forall a. ConstantList (DefaultUni (Esc a)) [a]

which is compact and allows for O(1) extraction (in the length of the list, not the depth of the type tag) of [Integer] or [Bool] or [[[Integer]]] or whatever.

So DefaultUni makes sense even without customizable universes.

And when it comes to your version of DefaultUni vs the current one, we've discussed that to death internally. We want polymorphic built-in functions and hence we need a way to apply a built-in type to a target language type variable. Allowing target language type variables to appear in the universe of built-in types is a nightmare to work with as explained in the Note that I referenced earlier, while our current solution only makes you worry about built-in types when you directly do something with them (the only exception is type normalization as far as I remember).

One thing we could do differently is separating the type-level universe from the term-level one. For the type level we care about partial applications and don't care about intrinsic typing, while for the term level it's the opposite and the current alignment just encompasses both the requirements in a single data type. We could separate the data type into two different ones, but we didn't try that and the current alignment works well anyway for us to bother duplicating the universe.

As for the noMoreTypeFunctions nonsense, we have two calls to it in the whole codebase. Plus it only exists, because pattern synonyms are not as good as they could be. Otherwise you would be able to directly and exhaustively match on DefaultUniList and DefaultUniPair without ever needing to reference DefaultUniApply when dealing with term-level stuff.

phadej commented 2 years ago
data Constant
    = ConstantInteger Integer
    | ConstantBool Bool
    | ConstantList [Constant]

no. Use

data Constant where
    MkConstant :: IsConstantType a -> a -> Constant

 data IsConstantType a where
    IsUnit       :: IsConstantType ()
    IsInteger    :: IsConstantType Integer
    IsByteString :: IsConstantType ByteString
    IsText       :: IsConstantType Text
    IsBool       :: IsConstantType Bool
    IsData       :: IsConstantType Data
    IsList       :: IsConstantType a -> IsConstantType [a]
    IsPair       :: IsConstantType a -> IsConstantType b -> IsConstantType (a, b)

we need a way to apply a built-in type to a target language type variable

I'm not sure what you exactly mean by that, by AFAIU, it could be witnessed by IsConstantType a -> IsCosntantType (f a) value for built-in type (constructor) f. Note, f is not really polymorphic over all Haskell types, only IsConstantTypes, so it is not a restriction.

effectfully commented 2 years ago

I wasn't sure if you suggested that we could get rid of DefaultUni or merely hardcode it, so I commented on both. Your IsConstantType approach is of the latter kind and that is addressed by the Note and this part of me comment:

And when it comes to your version of DefaultUni vs the current one, we've discussed that to death internally. We want polymorphic built-in functions and hence we need a way to apply a built-in type to a target language type variable. Allowing target language type variables to appear in the universe of built-in types is a nightmare to work with as explained in the Note that I referenced earlier, while our current solution only makes you worry about built-in types when you directly do something with them (the only exception is type normalization as far as I remember).

One thing we could do differently is separating the type-level universe from the term-level one. For the type level we care about partial applications and don't care about intrinsic typing, while for the term level it's the opposite and the current alignment just encompasses both the requirements in a single data type. We could separate the data type into two different ones, but we didn't try that and the current alignment works well anyway for us to bother duplicating the universe.

So for Untyped Plutus Core your approach would work perfectly, but we have Typed Plutus Core where we need to be able to reference a built-in type in the types AST and we need to be able to apply a built-in type to an arbitrary Plutus type (not Haskell type) to give a type to a built-in Plutus function like

headPlc : all a. list a -> a

Note that here list is a built-in type that gets applied to a Plutus type variable. It's not possible to sneak a Plutus type variable into your IsConstantType, let alone an arbitrary Plutus type (supporting type variables alone would make it impossible to implement type substitution and so would be nonsensical).

We could have your IsConstantType at the term-level and mirror it at the type level in a way that allows us to express what we want there, but duplicating the universe is kinda unfortunate and we never tried it.

Also

data Constant
    = ConstantInteger Integer
    | ConstantBool Bool
    | forall a. ConstantList (DefaultUni (Esc a)) [a]

is better, because you don't want to waste space storing a type tag for Integer, Bool etc and you don't want to waste time checking those for equality (which due to the presence of recursive types is a recursive operation and so is non-trivial to inline and optimize). We've actually tried this representation, but it gave us negligible speedup, so we decided it wasn't worth it. We might try it again now that we have a way better alignment w.r.t. inlining.

Another option would be to have one universe for TPLC and one universe for UPLC (as opposed to one universe for the type level and one universe for the term level). However that would only make things more complicated, because we'd have to keep the current fancy universe for TPLC and also add another simpler one for UPLC.

tl;dr: it's Typed Plutus Core that is making things complicated. If we only had Untyped Plutus Core, we'd use a simpler universe similar to yours.

phadej commented 2 years ago

but we have Typed Plutus Core

TBH, I don't care about typed plutus core. The code put on the blockchain is untyped, and I only care about types in representation of untyped one.

IMO, untyped plutus core should be in separate package to begin with, as typed plutus core as well as PIR are an implementation detail to get UPLC from Haskell, and if these IRs need something complicated, let that complexity be hidden in their implementation and not leak to UPLC.

effectfully commented 2 years ago

I've raised these questions internally, thank you for the input.

My personal view (does not reflect the view of the company) is that it's a matter of priority. I can spend my time solving plentiful tasks whose objectives are clear to me or I can spend my time duplicating universes and writing boilerplate that increases maintenance costs for some 3rd party to write fewer DefaultUniApplys. When a 3rd party creates a PR providing us with useful functionality (like e.g. this one), I'm happy to spend as much time there as needed to properly adopt the code. When it's just "hey, we want you to simplify those internals for us", I don't really care right now, if it's a non-trivial amount of effort and especially if it increases maintenance costs for the whole project.

michaelpj commented 2 years ago

IMO, untyped plutus core should be in separate package to begin with, as typed plutus core as well as PIR are an implementation detail to get UPLC from Haskell, and if these IRs need something complicated, let that complexity be hidden in their implementation and not leak to UPLC.

I would like to move in the direction of separating out UPLC, but it's a large and painful refactoring that is never top priority.

effectfully commented 2 years ago

My previous comment was mostly about changing abstractions being rather expensive for us. Refactoring the module structure is not a big deal and I've been doing that routinely. However:

I would like to move in the direction of separating out UPLC, but it's a large and painful refactoring that is never top priority.

this increases development costs for us. We need all of the builtins machinery for TPLC, including evaluation bits (because we have the CK machine), so it has to stay there (or somewhere upstream). But we need to have access to, say, CekValue to have proper inlining of UPLC builtins evaluation bits and so tweaking builtins and then looking at how that affected the generated Core will require crossing package boundaries, which tends to be painfully slow (and was the specific reason why we ended up moving mkMachineParameters to plutus-core). Given that there are ideas on changing the builtins machinery in major ways, it's probably not the best time to increase the costs of doing that.

michaelpj commented 2 years ago

I think TPLC would depend on UPLC, so most of the common stuff would exist in the UPLC package/component anyway.

effectfully commented 2 years ago

There's now a plausible proposal on how to fix the problem discussed in this thread (tracked in our (private) Jira as PLT-810), so I'm reopening this issue.

effectfully commented 1 year ago

This is being worked on. The action that is needed from the team is figuring out if we want to split the universe into a type-level one and a term-level one in order for DefaultUni to get simplified as proposed in the original message. Here's a branch with experiments.

effectfully commented 1 year ago

A month later, there was a bit of progress on this particular issue, but not much. We're working towards a better builtins machinery in general and the scope of this work is huge, so it may take a lot of time for us to get to actually fixing this issue. It's probably not a huge deal, since the issue is relatively low priority (not a bug, not something a regular user would have to workaround etc), just wanted to give you an update.

effectfully commented 1 year ago

Progress report: we've basically settled on separating the type-level universe from the term-level one and I've already started working on making it a reality, but I'm constantly getting distracted by other stuff and it'll take a while for this issue to get resolved.

effectfully commented 10 months ago

Very little tangible progress has been made on this one in the past several months, unfortunately.