IntersectMBO / plutus

The Plutus language implementation and tools
Apache License 2.0
1.57k stars 479 forks source link

"Empty" validator generates 3KB serialized contracts. #3702

Closed KtorZ closed 2 years ago

KtorZ commented 3 years ago

Area

[x] Plutus Foundation Related to the GHC plugin, Haskell-to-Plutus compiler, on-chain code [] Plutus Application Framework Related to the Plutus application backend (PAB), emulator, Plutus libraries [] Marlowe Related to Marlowe [] Other Any other topic (Playgrounds, etc.)

Summary

Empty validators generates serialized scripts of 3KB filled with unused primitives. Perhaps expected at this stage?

Steps to reproduce

Out of curiosity and to better understand where some errors in another contract could com from, I checked the size of an empty validator:

data Initial

instance Scripts.ValidatorTypes Initial where
  type DatumType Initial = ()
  type RedeemerType Initial = ()

validator :: () -> () -> ScriptContext -> Bool
validator _ _ _ = True

compiledValidator :: CompiledCode (ValidatorType Initial)
compiledValidator = $$(compile [||validator||])

The resulting compiled version is ~3KB and looking at the pretty-printed compiled code (see below), it seems mostly filled with primitives and base constructions for the ScriptContext. Even if unused.

see compiled code ``` (program (let (nonrec) (datatypebind (datatype (tyvardecl Unit (type)) Unit_match (vardecl Unit Unit)) ) (datatypebind (datatype (tyvardecl Credential (type)) Credential_match (vardecl PubKeyCredential (fun (con bytestring) Credential)) (vardecl ScriptCredential (fun (con bytestring) Credential)) ) ) (datatypebind (datatype (tyvardecl StakingCredential (type)) StakingCredential_match (vardecl StakingHash (fun Credential StakingCredential)) (vardecl StakingPtr (fun (con integer) (fun (con integer) (fun (con integer) StakingCredential))) ) ) ) (datatypebind (datatype (tyvardecl DCert (type)) DCert_match (vardecl DCertDelegDeRegKey (fun StakingCredential DCert)) (vardecl DCertDelegDelegate (fun StakingCredential (fun (con bytestring) DCert)) ) (vardecl DCertDelegRegKey (fun StakingCredential DCert)) (vardecl DCertGenesis DCert) (vardecl DCertMir DCert) (vardecl DCertPoolRegister (fun (con bytestring) (fun (con bytestring) DCert)) ) (vardecl DCertPoolRetire (fun (con bytestring) (fun (con integer) DCert)) ) ) ) (datatypebind (datatype (tyvardecl TxOutRef (type)) TxOutRef_match (vardecl TxOutRef (fun (con bytestring) (fun (con integer) TxOutRef))) ) ) (datatypebind (datatype (tyvardecl ScriptPurpose (type)) ScriptPurpose_match (vardecl Certifying (fun DCert ScriptPurpose)) (vardecl Minting (fun (con bytestring) ScriptPurpose)) (vardecl Rewarding (fun StakingCredential ScriptPurpose)) (vardecl Spending (fun TxOutRef ScriptPurpose)) ) ) (datatypebind (datatype (tyvardecl Bool (type)) Bool_match (vardecl True Bool) (vardecl False Bool) ) ) (datatypebind (datatype (tyvardecl Extended (fun (type) (type))) (tyvardecl a (type)) Extended_match (vardecl Finite (fun a [Extended a])) (vardecl NegInf [Extended a]) (vardecl PosInf [Extended a]) ) ) (datatypebind (datatype (tyvardecl LowerBound (fun (type) (type))) (tyvardecl a (type)) LowerBound_match (vardecl LowerBound (fun [Extended a] (fun Bool [LowerBound a]))) ) ) (datatypebind (datatype (tyvardecl UpperBound (fun (type) (type))) (tyvardecl a (type)) UpperBound_match (vardecl UpperBound (fun [Extended a] (fun Bool [UpperBound a]))) ) ) (datatypebind (datatype (tyvardecl Interval (fun (type) (type))) (tyvardecl a (type)) Interval_match (vardecl Interval (fun [LowerBound a] (fun [UpperBound a] [Interval a])) ) ) ) (datatypebind (datatype (tyvardecl Maybe (fun (type) (type))) (tyvardecl a (type)) Maybe_match (vardecl Just (fun a [Maybe a])) (vardecl Nothing [Maybe a]) ) ) (datatypebind (datatype (tyvardecl Address (type)) Address_match (vardecl Address (fun Credential (fun [Maybe StakingCredential] Address)) ) ) ) (datatypebind (datatype (tyvardecl Tuple2 (fun (type) (fun (type) (type)))) (tyvardecl a (type)) (tyvardecl b (type)) Tuple2_match (vardecl Tuple2 (fun a (fun b [[Tuple2 a] b]))) ) ) (let (rec) (datatypebind (datatype (tyvardecl List (fun (type) (type))) (tyvardecl a (type)) Nil_match (vardecl Nil [List a]) (vardecl Cons (fun a (fun [List a] [List a]))) ) ) (let (nonrec) (datatypebind (datatype (tyvardecl TxOut (type)) TxOut_match (vardecl TxOut (fun Address (fun [[(lam k (type) (lam v (type) [List [[Tuple2 k] v]])) (con bytestring)] [[(lam k (type) (lam v (type) [List [[Tuple2 k] v]])) (con bytestring)] (con integer)]] (fun [Maybe (con bytestring)] TxOut))) ) ) ) (datatypebind (datatype (tyvardecl TxInInfo (type)) TxInInfo_match (vardecl TxInInfo (fun TxOutRef (fun TxOut TxInInfo))) ) ) (datatypebind (datatype (tyvardecl TxInfo (type)) TxInfo_match (vardecl TxInfo (fun [List TxInInfo] (fun [List TxOut] (fun [[(lam k (type) (lam v (type) [List [[Tuple2 k] v]])) (con bytestring)] [[(lam k (type) (lam v (type) [List [[Tuple2 k] v]])) (con bytestring)] (con integer)]] (fun [[(lam k (type) (lam v (type) [List [[Tuple2 k] v]])) (con bytestring)] [[(lam k (type) (lam v (type) [List [[Tuple2 k] v]])) (con bytestring)] (con integer)]] (fun [List DCert] (fun [List [[Tuple2 StakingCredential] (con integer)]] (fun [Interval (con integer)] (fun [List (con bytestring)] (fun [List [[Tuple2 (con bytestring)] (con data)]] (fun (con bytestring) TxInfo)))))))))) ) ) ) (datatypebind (datatype (tyvardecl ScriptContext (type)) ScriptContext_match (vardecl ScriptContext (fun TxInfo (fun ScriptPurpose ScriptContext)) ) ) ) (termbind (strict) (vardecl validator (fun Unit (fun Unit (fun ScriptContext Bool)))) (lam ds Unit (lam ds Unit (lam ds ScriptContext True))) ) validator ) ) ) ) ```

As a side note, what I find also interesting is that the pretty-printed code is ~5KB, which isn't that much bigger than the serialized code. Intuitively, I'd expect the serialized code to be at least an order of magnitude smaller :thinking:. Plus, doing 'dummy' transformations on the pretty-printed code like replacing a few of the keywords by some shorter identifier (e.g. vardecl -> v, datatypebind -> dtb, fun -> f ...) rapidly brings down the size of the pretty-printed code to 3KB and less, which suggests that something is wrong with the serialized code?

Expected behavior

System info (please complete the following information):

Screenshots and attachments

Additional context

Add any other context about the problem here.

KtorZ commented 3 years ago

Small addition, here's the serialised contract (base16-encoded):

590c89010000332332232323332223332223333333322222222332233333222223333222233322233223322332233322233223322333222332233223322332232323232323232323232323232323232323232323232323232335500104e04e112222223007333004300600330050023008001233353031001205123504f3530503357606e612f5928768b2d8ed720b4586e720b3586d36cb2d900a29302811919191919191999ab95337126aae7cc0144800520002300212001230071200105523301f30021200130031200123007357446006240024602c6ae88d5d198010900091aaba03002120012375200646a09c931191919191919191919191919191919191919191919191999ab95337126aae7cc0544800520002300212001230171200106423333333333030300212001300312001300412001300512001300612001300712001300812001300912001300a12001300b12001233502301735744602624002466a04402e6ae88c044480048cd4084c06448004d5d1180789000919a810180c090009aba2300d12001233501f02335744601624002466a03c66aa03e046eb0d5d1180489000919a80c3ac35744600e24002466a038eb4d5d1180289000919a80d99aa80e3ad01d35744600624002460446ae88d5d198010900091aba330021200123574660042400246ae8cc008480048d5d198010900091aba330021200123574660042400246ae8cc008480048d5d198010900091aaba03002120012375200646a09a9311919191919191999ab95337126aae7cc0144800520002300212001230071200105323302130021200130031200123019357446006240024600c6ae88d5d198010900091aaba03002120012375200646a09893119191919191919191999ab95337126aae7cc01c4800520002300212001230091200105423330243002120013003120013004120012300935744600a24002466a0246014240026ae88c00c480048cd4021d69aba23574660042400246ae8cc008480048d55d018010900091ba900323504b4988c8c8c8c8c8c8cccd5ca99b8935573e600a240029000118010900091803890008289198141801090009801890009180b1aba23003120012335006014357446ae8cc008480048d55d018010900091ba900323504a498488c8c8c8c8c8c8cccd5ca99b89300412001480008c008480048c010480041448d40acc008480048c01cd5d11aaba030041200123333572a66e24c00848005200225029230051200104f235573e60042400246ea400c8d4129262335500a75a600424002466aa004eb5d60891119a80499aa8050018010008911919191919191999ab95337126aae7cc0144800520002300212001230071200104d2335029300212001300312001233500800735744600624002466a01000c6ae88d5d198010900091aaba03002120012375200646a08c930911919191919191999ab95337126aae7cc0144800520002300212001230071200104c233502c300212001300312001233500900735744600624002460126ae88d5d198010900091aaba03002120012375200646a08a930911919191919191999ab95337126aae7cc0144800520002300212001230071200104b2335029300212001300312001233500800735744600624002460106ae88d5d198010900091aaba03002120012375200646a08893091191919191919191999ab9533712600a24002900212818918010900082591999ab9533712600a2400290011180189000918028900082591a818980109000918039aba235574060082400246666ae54cdc4980109000a40004a05e4600a2400209046aae7cc008480048dd480191a821a4c46464646666ae54cdc4980109000a40044052460042400208a46666ae54cdc4980109000a40004050460082400208a46aae7cdd480191a820a4c244646460026eac00cd4004800448c004d5411088cccd55da9280c919a80c18031aba200230033574600400208a224446464646464646666ae54cdc49aab9f300512001480008c008480048c01c480041208cd54074c00848004c00c480048c020d5d1180189000918031aba23574660042400246aae80c008480048dd480191a820a4c4002464646464646464646464646464646666ae54cdc4980509000a400c46004240024600824002098460706004240024601e6ae88d55d018060900091999ab953371260102400290021180189000918028900082511819980109000918069aba235574060122400246666ae54cdc4980289000a400446006240024600a2400208e4605e600424002460186ae88d55d018030900091999ab953371260042400290001180209000918038900082211aab9f3004120012302d3002120012375a6ae88d55d018010900091ba900323503d4988c8c8c8c8c8c8c8c8c8c8c8c8c8c8c8c8c8c8c8c8c8c8c8c8c8cccd5ca99b893015120014803081248c008480041588cccd5ca99b893015120014802881288c00c480041588cccd5ca99b89301412001480208c00c480048c01c480041548cc110c00848004c00c480048dd69aba2300312001237586ae88d5d198010900091aaba030131200123333572a66e24c03c480052006230031200123007120010502330403002120013003120012375a6ae88c00c480048dd69aba23574660042400246aae80c038480048cccd5ca99b89300a12001480108c00c480048c01c4800412c8cc0fcc00848004c00c480048c038d5d118018900091bad357446ae8cc008480048d55d018048900091999ab9533712600a240029001118018900091802890008231181d980109000918049aba2355740600c2400246666ae54cdc4980109000a400046008240024600e2400208646aae7cc010480048c0d4c008480048c014d5d11aaba03002120012375200646a0789311919191919191919191919191999ab9533712601024002900111801090009180409000824119982018010900098018900098020900091bac35744600a2400246eb0d5d118018900091bac357446ae8cc008480048d5d198010900091aaba030061200123333572a66e24c00848005200023004120012300712001042235573e600824002460726004240024600a6ae88d55d018010900091ba900323503b4988c8c8c8c8c8c8c8c8cccd5ca99b89300412001480088c008480048c0104800410c8c0f8c008480048dd69aba2355740600c2400246666ae54cdc4980109000a400046008240024600e2400208246aae7cc010480048c0ecc008480048dd69aba235574060042400246ea400c8d40e92623232323232323333572a66e24d55cf980289000a400046004240024600e2400208046605c6004240026006240024600e6ae88c00c480048dd61aba23574660042400246aae80c008480048dd480191a81ca4c46464646666ae54cdc49aab9f300212001480008c008480048c010480040f08dd69aba235574060042400246ea400c8d40e1262212330010030022001222222222212333333333300100b00a0090080070060050040030022001221233001003002200122212333001004003002200111220021221223300100400312001112212330010030021120012212330010030022001121223002003112200112001122123300100300212001122123300100300212001122123300100300212001122002122001200112122230030041122200211222001120012122223004005212222300300521222230020052122223001005200122123300100300220012122222223007008221222222233006009008212222222300500812222222004122222220032212222222330020090082212222222330010090082001212230020032221223330010050040032001212230020032122300100320012323333572a66e24d55cf9ba90024800080188c008480040148d400d261261200120011123230010012233003323001001002001332233322233322233333333222222223322333332222233322233332222332233223322333222332233223332223322332233223322320012220222212330010030022001222222222212333333333300100b00a00900800700600500400300220012212330010030022001222123330010040030022001112200212212233001004003120011122123300100300211200122123300100300220011212230020031122001120011221233001003002120011221233001003002120011221233001003002120011212223003004112220021122200112001122002122001200121222230040052122223003005212222300200521222230010052001221233001003002200121222222230070082212222222330060090082122222223005008122222220041222222200322122222223300200900822122222223300100900820012122300200322212233300100500400320012122300200321223001003200112001200101

What I find interesting is the repetition of some sequences of bytes like:

etc.. which I really have a hard time correlating with the pretty-printed output. I first thought it was maybe something odd with the DeBruijn indices encoding but even if removed (encoding a program with ann = () instead), I still obtain a program of about 2KB with some rather long and cryptic bytes sequences that seem pretty far off the input data-structure :thinking:

michaelpj commented 3 years ago

What you've posted is the Plutus IR, which isn't what goes on chain. The untyped Plutus Core won't have all the type stuff, which isn't "dead" in PIR.

The resulting compiled version is ~3KB

What specifically are you referring to? Did you serialize the PIR?

michaelpj commented 3 years ago

That said, we probably do retain datatype matchers for datatypes which are used only at the type level. That's a problem, but a slightly unusual one. I made https://jira.iohk.io/browse/SCP-2638.

KtorZ commented 3 years ago

What specifically are you referring to? Did you serialize the PIR?

I am referring to (the serialization of) this https://github.com/input-output-hk/plutus/blob/fa95d9e85d26341addbb34c62556ea86ed4febae/plutus-ledger-api/src/Plutus/V1/Ledger/Scripts.hs#L88-L90 which is said to be "A script on the chain" and I was told this is what ends up on-chain.

If this isn't what ends up on-chain, then what is :relaxed: ?

michaelpj commented 3 years ago

Okay, just checking!

michaelpj commented 3 years ago

I'm also surprised it's that large. We know there's some redundancy somewhere, given that compression still works, but it's not obvious where.

KtorZ commented 3 years ago

I know you're probably looking into this (or at least, in this area), so I didn't spend too much time diving into the Flat encoder itself. But from a quick look, it looked okay. Yet, I've got quite accustomed to work with serialized binary data over the past years (although mostly CBOR) and, intuitively the output looks really strange and not something I'd expect from the Flat encoders as they are declared.

Hence why I am reporting this. If this isn't something you are currently looking into, I could give in a bit more and maybe provide a better analysis of why is it this big?

michaelpj commented 3 years ago

I'm not looking at this in detail right now, so go ahead! The main reason we're using Flat is that it should give us compact encodings of small integers, of which we have a lot. I'd be very interested if you spot any easy means of improving things!

KtorZ commented 3 years ago

I'll have a look. When you say small, what's the range you have in mind? Because CBOR is actually pretty good at encoding small numbers :sweat_smile:

michaelpj commented 3 years ago

0-7 (constructor tags). You can see the results of our previous investigation here: https://github.com/input-output-hk/plutus/blob/master/notes/plutus-core/Serialisation-flat.md

KtorZ commented 3 years ago

So, I've been really diving into the untyped-plutus-core lands, setting up some new tests to see how the serialization behaves and, I've only really tested a few cases like constants, declaring variables or function and while the Name or NamedDeBruijn programs are quite large due to the serialization of the identifier, the DeBruijn programs are rather compact and it looks fine.

Still, it does not explain the large size of that empty program and the many repetitions in the output. The PIR looks quite neat, and now that I have familiarized a bit more with the base internal constructs (Var, LamAbs, Apply, Constant), I have an intuition of how it could be translated into a Program. Yet, printing out the actual program leads to... a lot of delay and force wrapped in lambdas (more than 8000+ lines, filled with nested lam > force > delay). At this point, I am not sure of what they are (there's some mention in the package's doc, but nothing that really got me any understanding -- I'll see the specifications perhaps?).

I've heard some people mentioning that writing contracts "by hand" led to much smaller contracts; I believe they meant, not using the compiler but directly going for the Term ADT and playing around with that. I have right now the same sentiment. It feels like the Haskell -> Plutus-Core compiler is adding a LOT of extra fluff around?

michaelpj commented 3 years ago

Well, as I said, we retain all the datatype definitions for the types that get used, even if they're only used at the type level. What does it mean to retain a datatype definition? It means that we get some type abstractions (which get erased to force/delay), and some lambda abstractions for the constructors and the pattern matcher. That would all be useful if you actually did something with your ScriptContext, it's just unfortunate that we keep it around when you don't.

KtorZ commented 3 years ago

Are those information any useful for the execution of the script? I'd imagine that any information about types do indeed disappear after compilation for they're only useful for static analysis. Once done, one can certainly produce an untyped program with no type annotations whatsoever, right?

michaelpj commented 3 years ago

I'd imagine that any information about types do indeed disappear after compilation for they're only useful for static analysis.

A datatype has some type-level stuff, but also some term-level stuff: constructors and a pattern matcher. That's the stuff that's needlessly retained. As I said, it's something we could in principle optimize away. It's "just" a matter of writing a smarter compiler.

kk-hainq commented 3 years ago

@michaelpj Is there anything we can help with to reduce the output script size? We are always ready to tackle these problems.

michaelpj commented 2 years ago

Note that there are lots of issues affecting script size, but a lot of it is pretty difficult to improve on.

This particular issue is only about totally trivial scripts that don't even use the transaction information. As such, I'm not sure it's worth that much attention.

But if anyone is interested, here's what I wrote on two internal tickets about improving this.

Avoid retaining datatypes which are used only at the type level We retain datatypes which are used only at the type level, including their constructors, matchers, and datatypes which they depend on.

This is true even if the datatype is mentioned only as the type of an ignored argument, for example. It would be nice not to do that: as a crude measure, we could replace them with any made up type with the same kind.

Strip out unused constructors in PIR We can do whole-program analysis, so we can actually remove constructors from datatypes that are unused.

Steps:

  1. The dependency analysis currently makes all the components of a datatype depend on each other: stop doing that.
  2. Eliminate dead constructors and the branches of matching functions which correspond to them.

Both of these would require doing some relatively complex work on the dead code analysis, but if someone's really keen we could talk more about it.

kk-hainq commented 2 years ago

@michaelpj Yes, I meant script outputs in general. I'd love to work on both of those directions. We have a few experienced compiler people in-house too. I think all we need is a short description for each task to start. The above is sufficient for now, but anything more will be appreciated.

michaelpj commented 2 years ago

Okay, let me make some public github tickets with a bit more detail.

michaelpj commented 2 years ago

https://github.com/input-output-hk/plutus/issues/4147 https://github.com/input-output-hk/plutus/issues/4148

kk-hainq commented 2 years ago

@michaelpj Awesome, thank you!