IntersectMBO / plutus

The Plutus language implementation and tools
Apache License 2.0
1.56k stars 479 forks source link

Strip out unused constructors in PIR #4148

Open michaelpj opened 2 years ago

michaelpj commented 2 years ago

We can do whole-program analysis, so we can actually remove constructors from datatypes that are unused.


Example (in pseudo-PIR):

let datatype (Maybe :: Type -> Type) [a :: Type] match_Maybe [Just :: a -> Maybe a, Nothing :: Maybe a]
     f = \(x :: Maybe Integer) -> match_Maybe x (\(y :: Integer) -> y) 0
in f (Just 1)

Since the Nothing constructor is never used, we can get rid of it! We then have to get rid of the corresponding arguments to match_Maybe everywhere too:

let datatype (Maybe :: Type -> Type) [a :: Type] match_Maybe [Just :: a -> Maybe a]
     f = \(x :: Maybe Integer) -> match_Maybe x (\(y :: Integer) -> y)
in f (Just 1)

This should mostly work with the dead-code analysis as is. Currently it over-approximates the dependencies between parts of a datatype (we assume they all depend on each other, which isn't true). If we fix that, we should naturally find out which constructors are dead.

Removing them would be more complicated because of the need to also remove the case branches throughout the rest of the program.

michaelpj commented 2 years ago

We can do whole-program analysis

Note that this is only true if we really do have the whole program, which means the user must not apply the optimized program to something else. In particular, they must not use the common pattern of making an parameterized validator by compiling a function and applying it to a separately-constructed argument, e.g.

$$(complile [|| \(x :: Maybe Int) -> case x of {Just x -> x; Nothing -> 0} ||]) `applyCode` Lift.liftCode (Just 1)

will go badly if we strip out the constructors of Maybe in the function!

Probably we can keep this optimization but gate it behind a flag, so users who aren't doing this can opt-in.

michaelpj commented 2 years ago

I have only one idea for making this a bit safer in the future, which is to make applyCode apply pre-optimized PIR programs, not erased PLC programs.

This would mean we'd need to do the compilation to UPLC later, at runtime. This isn't ideal: it makes things slower, and it makes it harder to save the final UPLC. It'll also just complicate CompiledCode quite a bit: at the moment it's really just UPLC with the PIR as an artifact, but instead it would have to do delayed compilation etc, which is quite a lot more complicated.

phadej commented 2 years ago

will go badly if we strip out the constructors of Maybe in the function!

Not true. the UPLC of Lift.liftCode (Just 1) is

[
  [
    [
      (lam
        GHC_Maybe_Just
        (lam
          GHC_Maybe_Nothing
          (lam match_GHC_Maybe_Maybe [ (force GHC_Maybe_Just) (con integer 1) ])
        )
      )
      (delay
        (lam
          arg_0
          (delay
            (lam
              case_GHC_Maybe_Just
              (lam case_GHC_Maybe_Nothing [ case_GHC_Maybe_Just arg_0 ])
            )
          )
        )
      )
    ]
    (delay
      (delay
        (lam case_GHC_Maybe_Just (lam case_GHC_Maybe_Nothing case_GHC_Maybe_Nothing))
      )
    )
  ]
  (delay (lam x x))
]

i.e. constructors and eliminator are all there. That makes parametrised contracts bigger, as the definitions are repeated. But easy to strip off.


I have only one idea for making this a bit safer in the future, which is to make applyCode apply pre-optimized PIR programs, not erased PLC programs.

please don't do that. it would imply that UPLC cannot be used as a compliation target.

michaelpj commented 2 years ago

i.e. constructors and eliminator are all there. That makes parametrised contracts bigger, as the definitions are repeated. But easy to strip off.

The problem is if we apply two program together f a. a will have the constructor for Just, f will not. Because of how datatypes are Scott-encoded, this will fail because the values will have the wrong shape.

michaelpj commented 2 years ago

please don't do that. it would imply that UPLC cannot be used as a compliation target.

I also don't think this is true. applyCode and friends only exist in the Haskell library as conveniences. It doesn't stop people doing other things just fine. This wouldn't affect Cardano or anything.

phadej commented 2 years ago

applyCode and friends only exist in the Haskell library as conveniences.

I see. It's defined in plutus-tx. How nodes apply the datum,redeemer&state to the script to validate transactions?

To answer myself: https://github.com/input-output-hk/plutus/blob/90d76651aea0716e11446c8ac23cd0803c0f0d3a/plutus-ledger-api/src/Plutus/V1/Ledger/Scripts.hs#L182-L186

Good, I'm not worried then.

kk-hainq commented 2 years ago

I have only one idea for making this a bit safer in the future, which is to make applyCode apply pre-optimized PIR programs, not erased PLC programs.

Wouldn't this require storing PIR programs on-chain? I would love to do most optimizations at the PIR level, but this seems to go against the desired goal of smaller on-chain scripts.

In case #458 dies horribly or becomes very hard to navigate, I'll port my comment there here:

I think we will have to go with an "incorrect" dependency to be useful for Cardano validator scripts. A "correct" one would just nuke validator scripts as unused functions. Making deconstructors depend on constructors was very dumb (I now think deconstructors do not need to depend on anything?), but there may still be helpful rules. For example, if a program reduces to a function, all of its type arguments (and their type dependencies?) must stay intact, while other types in the program can still be truncated. Another ambitious effort might be truncating input types/constructors (like ScriptContext) only to keep fields used by the validator script. A node would then have to analyze each script to truncate the input values correspondingly (If that even makes sense...).

michaelpj commented 2 years ago

Wouldn't this require storing PIR programs on-chain? I would love to do most optimizations at the PIR level, but this seems to go against the desired goal of smaller on-chain scripts.

I think you have the same confusion as Oleg. I'm not suggesting that we do this on-chain, applyCode is only in the Haskell library (https://github.com/input-output-hk/plutus/issues/4148#issuecomment-954670257).

kk-hainq commented 2 years ago

@michaelpj Sorry it was yet another late-night drunk dumb comment. I literally worked on lifting and benchmarked applyCodeed example contracts on this task and somehow still misunderstood it...

In retrospect, that idea with your comment on the PR should be enough for me to continue this work. It does make perfect sense after all.

A validator

myValidator :: MyDatum -> MyRedeemer -> ScriptContext -> Bool
myValidator = ...

is actually wrapped up with conversions from Data, so in fact it's something like:

myValidator' :: Data -> Data -> Data -> ()
myValidator' d r c = myValidator (unsafeFromData d) (unsafeFromData r) (unsafeFromData c)

So we do have references to all the pieces of the types in question due to the IsData instances. It's only Data that looks unused - but it's a builtin type.

So I think we could get away with it, although it's still a bit questionable.

I'll do the saner directions first, then hopefully revisit this in the future.

rjmh commented 2 years ago

We can do whole-program analysis

Note that this is only true if we really do have the whole program, which means the user must not apply the optimized program to something else. In particular, they must not use the common pattern of making an parameterized validator by compiling a function and applying it to a separately-constructed argument, e.g.

$$(complile [|| \(x :: Maybe Int) -> case x of {Just x -> x; Nothing -> 0} ||]) `applyCode` Lift.liftCode (Just 1)

will go badly if we strip out the constructors of Maybe in the function!

But this case is easy to recognise: Maybe occurs in the type of the term being compiled, and so its representation cannot be changed without changing the meaning of the compiled code. You should be able to strip unused constructors from other types without a problem.

This would mean there might be dead constructors in the result of applyCode, but I don't think that is really a problem.

michaelpj commented 2 years ago

But this case is easy to recognise: Maybe occurs in the type of the term being compiled, and so its representation cannot be changed without changing the meaning of the compiled code. You should be able to strip unused constructors from other types without a problem.

Interesting idea. Obviously it would have to be all types that are transitively used by the type of the final term, but that would make things better indeed.

kk-hainq commented 2 years ago

But this case is easy to recognise: Maybe occurs in the type of the term being compiled, and so its representation cannot be changed without changing the meaning of the compiled code. You should be able to strip unused constructors from other types without a problem.

To safely modify types for cross-program application the types must be unused, or only used at the type level. In such a case, we can replace then eventually remove it per #4147, instead of just finding unused constructors to remove!

effectfully commented 1 year ago

@michaelpj what's the status of this issue?

Since it's a non-critically-important optimization, for the time being I'm marking it with "Low priority".

michaelpj commented 1 year ago

Yeah, I think that's right. It's also a bit tricky.