IntersectMBO / plutus

The Plutus language implementation and tools
Apache License 2.0
1.57k stars 476 forks source link

Could untyped plutus core have primitives which don't need initial force #4183

Closed phadej closed 1 year ago

phadej commented 3 years ago

by looking at some contracts' plutus core, there are some of (force mkCons), (force trace), (force ifThenElse) for example. I guess in these forcings are cheap to store and execute, but they aren't entirely free.

EDIT: I think I haven't seen these combinators used unforced.

michaelpj commented 3 years ago

These are somewhat artificial. Type abstractions in typed PLC block evaluation, so we erase abstractions to delay and instantiations to force. Builtins can have type arguments, so the corresponding type instantiations get turned into forces. We thought the simplest and most consistent approach was just to maintain these, although strictly speaking they're redundant.

phadej commented 3 years ago

Checking some more, E.g. sndPair has always two forces (for both of pair types), chooseList has also two, If each force is a byte, there's plenty of bytes we could save.

michaelpj commented 3 years ago

The primitives don't appear that much in the grand scheme of things. If we were concerned about this then the number of force/delays introduced from erasing type abstractions/instantiations throughout the whole program is going to dwarf that. But I don't know of a good way to get rid of them en masse (we do the obvious force/delay cancellation already).

(Also term tags are actually only 4 bits, not a whole byte.)

phadej commented 3 years ago

I run a quick test (rewrite Force b@Builtin{} -> b) on use-cases Uniswap contract:

        expected: (11462,11001)
         but got: (11011,10777)

The counts are (node size, serialized size of term using DeBruijn names), that's 4% reduction in node count and 2% serialized size. I think that is significant in the grand scheme of things.

michaelpj commented 3 years ago

Thanks, that's a useful datapoint!

Let me elaborate a bit more on why this is tricky.

Consider the following program (in pseudo-PIR):

let usesTrace :: (forall a. BuiltinString -> a -> a) -> Integer -> Integer
     usesTrace tracer i = tracer "I saw it" i
     evilTrace :: forall a . BuiltinString -> a -> a
     evilTrace s v = error "I'm evil"

     i1 = usesTrace trace 0
     i2 = usesTrace evilTrace 1
in i1 + i2

The usesTrace function receives as its argument a polymorphic function which it instantiates. Builtin functions are just values, so we can pass a polymorphic builtin function as its argument. Therefore whatever usesTrace does has to work for both "normal" functions and builtins. The first thing that it will do is to instantiate the function, which will correspond to a force when erased.

So at the very least it must be tolerable to force builtin functions. At the moment we require you to force exactly where you would otherwise pass a type. Perhaps we could not require you to do any forcing. The only problematic case would be a nullary builtin which had only type arguments and immediately failed (e.g. error :: forall a . a as a builtin) - it's not clear what that would mean. We could just ban them.

I have a feeling we've been through this discussion before, @kwxm do you remember anything else?

phadej commented 3 years ago

I don't agree, it is not tricky:

You can do rewrite in two steps:

  1. Replace each builtin with a version which is delayed as many times as there are erased type arguments
  2. Apply forceDelay simplification

Let me write (builtin b) for polymorphic builtin, and (builtin! b) for one which is absorbed force arguments (i.e. cannot be forced).

Your example would be rewritten as:

usesTrace (builtin trace) 0 ->          -- replace with delayed versions
usesTrace (delay (builtin! trace)) 0    -- note: builtin!

And indeed, in the compiled PLC there is

[ [ usesTrace (builtin trace) ] (con integer 0) ]

That grows in size, but that doesn't actually happen. GHC rarely passes polymorphic arguments (there should be an explicit rank-N signature, no eta-expansion).

If we rewrite the example as (which is closer to what inferred types would be):

errorExample2 :: Integer
errorExample2 = 
    let usesTrace :: (BuiltinString -> Integer -> Integer) -> Integer -> Integer
        usesTrace tracer i = tracer "I saw it" i

        evilTrace :: forall a . BuiltinString -> a -> a
        evilTrace s v = traceError "I'm evil"

        i1 = usesTrace trace 0
        i2 = usesTrace evilTrace 1
    in i1 + i2

The line above becomes

[ [ usesTrace (force (builtin trace)) ] (con integer 0) ]

which would simplify:

usesTrace (force (builtin trace)) 0 ->           -- replace with delayed versions
usesTrace (force (delay (builtin! trace))) 0 ->  -- forceDelay
usesTrace (builtin! trace) 0                     -- profit!

The polymorphic program has size 54, simple one 53, and latter could become smaller.


EDIT: What does (force (builtin trace)) evaluate to? Or is it stuck until there is enough arguments to execute the builtin?

michaelpj commented 3 years ago

That grows in size, but that doesn't actually happen. GHC rarely passes polymorphic arguments (have to an explicit rank-N signature).

Yes, you're certainly right about that, but even if it's a corner case we thought it was important for builtins to behave exactly the same as polymorphic functions. That way you don't have to worry about the corner cases.

You can do rewrite in two steps...

Nice idea. This assumes that all the type arguments come first, but at the moment our builtins can have interleaved term and type arguments. This is pretty pointless however, and we might as well require the type arguments to come first.

If we did that and banned nullary builtins, I think this could work.

What does (force (builtin trace)) evaluate to? Or is it stuck until there is enough arguments to execute the builtin?

It's stuck.

phadej commented 3 years ago

This assumes that all the type arguments come first, but at the moment our builtins can have interleaved term and type arguments. This is pretty pointless however, and we might as well require the type arguments to come first.

I think that removing the head type-arguments is already an improvement. If it's possible to shuffle the type arguments to the front, that would be even better, but do you have such builtins? (The specification doesn't list their types, nor haddocks, list their types, that would be helpful).

phadej commented 3 years ago

It looks like that types of builtins are generated by toBuiltinMeaning?

I think GHC pushes all foralls to the front in inferred types. That trick is cool but feels dangerous, I'd prefer having types of builtins checked (against explicitly written ones) rather than inferred from their Haskell-meaning.

EDIT: if (U)PLC is a compilation target the types or "call convention" (in case of UPLC) is an important information to have in the spec.

phadej commented 3 years ago

What does (force (builtin trace)) evaluate to? Or is it stuck until there is enough arguments to execute the builtin?

It's stuck.

This is unfortunate. I liked the "builtins have to fully saturated" design more (when there isn't such problem). It's easier to show that semantics are preserved then. (My proposal would been completely trivial then).

michaelpj commented 3 years ago

We had a long argument about saturated vs unsaturated builtins. In the end we opted for unsaturated - perhaps a mistake, but here we are.

kk-hainq commented 3 years ago

You can do rewrite in two steps...

Nice idea. This assumes that all the type arguments come first, but at the moment our builtins can have interleaved term and type arguments. This is pretty pointless however, and we might as well require the type arguments to come first.

If we did that and banned nullary builtins, I think this could work.

We've been able to gather some interest on this specific topic and Oleg's proposal. @michaelpj @phadej Should we add it to #4174 and attempt it in the upcoming weeks?

phadej commented 3 years ago

This is an intrusive proposal as it changes the UPLC. Everything listed in #4174 doesn't seem to require changes, even in typed PLC, as they can happen on PIR-level, which makes them a lot more realistic to get done :)

I opened this issue "early" because I expect it take long to fix, if it will ever.

kk-hainq commented 3 years ago

I know, but it is well-scoped and elegant. We also suspect the improvements would be non-trivial, and our transpilers would take fewer forces every day of the week!

michaelpj commented 3 years ago

Yes, this would be a change to the semantics of the language, and might well require a new language version and changes to the spec (which we're in the process of redoing already). It's a much more complicated change, and I don't think it will be that significant in the grand scheme of things. It's something we can hopefully roll in easily enough when we do do another version.

kk-hainq commented 3 years ago

Okay, I'll just add this to #4174 for tracking for now.

effectfully commented 2 years ago

@phadej

If it's possible to shuffle the type arguments to the front, that would be even better, but do you have such builtins?

We don't.

(The specification doesn't list their types, nor haddocks, list their types, that would be helpful).

See the golden files. Generating a haddock comment using doctest or whatever would be quite helpful, I agree.

It looks like that types of builtins are generated by toBuiltinMeaning?

Within toBuiltinMeaning, yes.

I think GHC pushes all foralls to the front in inferred types. That trick is cool but feels dangerous, I'd prefer having types of builtins checked (against explicitly written ones) rather than inferred from their Haskell-meaning.

We don't rely on GHC pushing the foralls to the front (nor would it be possible at all, I think). Using an inference machinery and not checking what it infers would be very dangerous, I agree, but we have golden tests for that purpose. A golden file is created automatically whenever a new builtin is added, so we do control what PLC types get inferred for builtins.

EDIT: if (U)PLC is a compilation target the types or "call convention" (in case of UPLC) is an important information to have in the spec.

kwxm commented 2 years ago

(The specification doesn't list their types, nor haddocks, list their types, that would be helpful).

Sorry, I just noticed this after it was quoted in a previous comment.

The plc and uplc commands now have a print-builtin-signatures option that will print out the types of all the builtins it knows about in a reasonably readable form:

$ cabal exec uplc print-builtin-signatures
addInteger               : [ integer, integer ] -> integer
subtractInteger          : [ integer, integer ] -> integer
...
encodeUtf8               : [ string ] -> bytestring
decodeUtf8               : [ bytestring ] -> string
ifThenElse               : [ forall a, bool, a, a ] -> a
chooseUnit               : [ forall a, unit, a ] -> a
trace                    : [ forall a, string, a ] -> a
fstPair                  : [ forall a, forall b, pair(a, b) ] -> a
sndPair                  : [ forall a, forall b, pair(a, b) ] -> b
chooseList               : [ forall a, forall b, list(a), b, b ] -> b
mkCons                   : [ forall a, a, list(a) ] -> list(a)
...

That still doesn't quite tell the full story because sometimes forall means "for all built-in types" (as in mkCons) and sometimes it means "for any term at all" (as in ifThenElse). We're working on a new version of the specification which will explain all this in detail: that should be available in a couple of weeks.

phadej commented 2 years ago

@kwxm that's great.

effectfully commented 1 year ago

There's now a CIP for this discussion, hence I'm closing the ticket. Feel free to reopen if you disagree.