Compiler optimizations for smaller script outputs

kk-hainq commented 2 years ago

Describe the feature you'd like

Many dApp developers want smaller output scripts so they fit within the blockchain's size limit. The short-term goal is to shrink complex "raw" scripts and basic-but-non-trivial state machines enough to pass the limit. The long-term goal is continuous improvement, since smaller scripts are generally a net gain for the whole network.

Here are a few proposals to tackle this problem.

Strip out unused constructors in PIR. This was proposed by @michaelpj in #4148. #4158 is an attempt, with useful discussion for context, but it has largely stalled. The main challenge is making it useful for the goals above, since real-life scripts are applied to their arguments at run time: a "perfect" dependency analysis, seeing only the unapplied script, would treat everything that depends on those run-time inputs as unused and nuke it. Persisting all input types, constructors, and their type dependencies might require too much effort for tiny gains. Hopefully, we can find a balance that lets us proceed in this direction.
Avoid retaining datatypes in PIR that are used only at the type level. This was proposed by @michaelpj in #4147. We have not attempted it yet, but this direction might face the same complications as #4148. The solution may follow naturally once #4148 is solved.
Better liveness analysis. This is a general item for improving the liveness analysis, covering the two proposals above; there may be more. For context, the analysis currently over-approximates the dependencies among the parts of a datatype.
Minimize or remove trace messages. We should add a flag to remove trace messages during compilation. dApp developers could then build easy-to-debug scripts during development and size-optimized ones for on-chain deployment from the same code.
More and better simplifiers. We currently have a few simplifiers: UnwrapCancel, Beta, and Inline. I wonder how much further we could push in this direction.
Write more documentation on code styles that yield smaller scripts. These can be minor choices like using if then else instead of calling || or &&. Others include matching on input redeemers to simplify validation rules (#3844), requiring more data in redeemers instead of recalculating it from script contexts, avoiding reading and converting datums from script contexts, avoiding converting data on-chain (related to #4209), and more.
Explore and integrate new optimization techniques. We are going to break this item down into more concrete proposals. In general, there are still many dead-code elimination techniques to try. Others include truncating input types at compile time (and their input values accordingly at run time), using more dynamic analysis, superoptimization, peephole optimization, and more.
Document more case studies. Personally, I believe that studying real-life scripts yields valuable insight and inspiration towards practical solutions. At the current stage, we should favour effectiveness over elegance. We have to document real scripts for security research at Hachi anyway, and will port any relevant findings and ideas here.
We should write much more documentation, tests, and benchmarks.
Support forced builtins as documented in #4183. We can then eliminate the "manual" forces at each builtin usage.
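The constructor-stripping idea, and the reason it is hard, can be illustrated with a minimal sketch over a hypothetical toy language (not PIR, and not the approach taken in #4158). A constructor is kept if the script builds it, or if it belongs to a type reachable from the script's run-time inputs; everything else, including case branches on dead constructors, is dropped. All names here (`Program`, `inputTypes`, `liveCtors`, `stripUnused`) are illustrative assumptions.

```haskell
-- A toy model of unused-constructor removal (hypothetical; not PIR).
import Data.List (nub)

type TypeName = String
type CtorName = String

-- A constructor together with the types of its fields.
data Ctor = Ctor { ctorName :: CtorName, ctorFields :: [TypeName] }
  deriving (Eq, Show)

-- A datatype declaration.
data DataDecl = DataDecl { declName :: TypeName, declCtors :: [Ctor] }
  deriving (Eq, Show)

-- Terms: variables, lambdas, applications, constructor applications,
-- and case expressions with one branch per constructor name.
data Term
  = Var String
  | Lam String Term
  | App Term Term
  | Con CtorName [Term]
  | Case Term [(CtorName, Term)]
  deriving (Eq, Show)

data Program = Program
  { decls      :: [DataDecl]
  , inputTypes :: [TypeName]  -- types of values supplied at run time
  , body       :: Term
  } deriving (Eq, Show)

-- Constructors the script builds itself.
constructed :: Term -> [CtorName]
constructed (Var _)     = []
constructed (Lam _ t)   = constructed t
constructed (App f a)   = constructed f ++ constructed a
constructed (Con c ts)  = c : concatMap constructed ts
constructed (Case s bs) = constructed s ++ concatMap (constructed . snd) bs

-- Types reachable from the run-time inputs through constructor fields.
reachableTypes :: [DataDecl] -> [TypeName] -> [TypeName]
reachableTypes ds = go []
  where
    go seen [] = seen
    go seen (t : ts)
      | t `elem` seen = go seen ts
      | otherwise     = go (t : seen) (fieldTypes t ++ ts)
    fieldTypes t =
      concat [ concatMap ctorFields (declCtors d) | d <- ds, declName d == t ]

-- Live constructors: those built in the script, plus every constructor of
-- a type the script may receive at run time (it could arrive in any form).
liveCtors :: Program -> [CtorName]
liveCtors p = nub (constructed (body p) ++ fromInputs)
  where
    inputs     = reachableTypes (decls p) (inputTypes p)
    fromInputs = [ ctorName c | d <- decls p, declName d `elem` inputs, c <- declCtors d ]

-- Drop dead constructors from declarations and dead branches from cases.
stripUnused :: Program -> Program
stripUnused p = p { decls = map pruneDecl (decls p), body = pruneTerm (body p) }
  where
    live = liveCtors p
    pruneDecl d = d { declCtors = filter ((`elem` live) . ctorName) (declCtors d) }
    pruneTerm (Var x)     = Var x
    pruneTerm (Lam x t)   = Lam x (pruneTerm t)
    pruneTerm (App f a)   = App (pruneTerm f) (pruneTerm a)
    pruneTerm (Con c ts)  = Con c (map pruneTerm ts)
    pruneTerm (Case s bs) = Case (pruneTerm s) [ (c, pruneTerm t) | (c, t) <- bs, c `elem` live ]

-- Action plays the role of the redeemer type; Unused is never built.
example :: Program
example = Program
  { decls =
      [ DataDecl "Action" [Ctor "Claim" [], Ctor "Cancel" []]
      , DataDecl "Result" [Ctor "Ok" [], Ctor "Unused" []]
      ]
  , inputTypes = ["Action"]
  , body = Lam "r" (Case (Var "r") [("Claim", Con "Ok" []), ("Cancel", Con "Ok" [])])
  }

main :: IO ()
main = do
  print (liveCtors example)
  print (decls (stripUnused example))
```

With `inputTypes = ["Action"]`, the redeemer constructors `Claim` and `Cancel` survive and only `Unused` is dropped. Running `stripUnused` on the same program with `inputTypes = []` removes both case branches, which is the failure mode described in the first proposal: without knowing what arrives at run time, the analysis deletes the validator's logic.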

Describe alternatives you've considered

Different projects have been experimenting with different solutions to this problem. Many have had to simplify their application logic, reducing the functionality a dApp can offer. A few have had to write code in unnatural ways, trading readability and risking bugs for a deployable script. Others have gone as far as writing their own optimizers, or even compilers from other source languages.

The first two routes are both unfortunate and not scalable. The last one is exciting but would take too long to become practical. We believe that helping improve the current compilation pipeline makes the most sense.

Additional context / screenshots

We have several people who are willing to help with all these proposals. We will likely add more proposals, or expand on existing ones, over time. We can also write more documentation so that interested people can join the work.

Relevant issues:

#3582
#3702
https://github.com/input-output-hk/plutus-apps/issues/11

michaelpj commented 2 years ago

Better liveness analysis

I think this is pretty good already. The only issue is the datatype issue you referenced.

Minimize or remove trace messages.

We thought about this. It's a bit tricky. It's likely that people will want to re-run the script that actually failed on the chain if something does fail, and it would be quite annoying to not have the trace information in that case. And you can't necessarily just swap in an alternative version if things care about hashes...

So it's complicated, which is why we did the stupid thing of just leaving them in.

Explore and integrate new optimization techniques.

There's lots of optimization we can do, although it's very unclear how much will actually help. At some point you just have to include the code the user asked for, which can be a lot!

There are also much more drastic things that we are considering internally. I'll mention a few of them here.

1. Compress scripts. Compressing scripts gets us about a 40% saving, even given our reasonably compact binary encoding. Currently the idea is to implement this in the ledger, and I hope it will be in the next HF.
2. Script references. Sketchy at the moment, and relies on several non-implemented ledger extensions, but we'd like to have a way to post scripts to the chain and then reference them afterwards, rather than having to submit them each time.
3. Partial script references. Even more sketchy, but it would be nice to just be able to reference a large chunk of code (e.g. the data decoder for ScriptContext) somehow, rather than having to submit it every time.
4. Pass structured data into scripts differently. Too soon to say much, but we waste a lot of time and space on fromData, it would be a big win to get rid of it.

All of these require ledger changes, so there's a bunch of design work that needs to go on etc.
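Why fromData is costly can be seen in a simplified sketch. The `Data` type below mirrors the shape of Plutus's builtin data type (constructor tags with fields, maps, lists, integers, bytestrings), with `String` standing in for bytestrings; it is a self-contained stand-in, not the real PlutusTx API. The `Redeemer` type and both functions are hypothetical. A full decoder must check the tag, arity, and shape of every field, and every script that decodes a type carries that logic; for a type as large as ScriptContext, this adds up quickly.

```haskell
-- A simplified stand-in for Plutus's builtin Data type; String replaces
-- ByteString so the example needs nothing beyond base.
data Data
  = Constr Integer [Data]
  | Map [(Data, Data)]
  | List [Data]
  | I Integer
  | B String
  deriving (Eq, Show)

-- A hypothetical redeemer type for some validator.
data Redeemer
  = Deposit Integer
  | Withdraw Integer String
  deriving (Eq, Show)

-- A full decoder: check the constructor tag, the number of fields, and the
-- shape of each field. A script that decodes its redeemer carries this code.
fromDataRedeemer :: Data -> Maybe Redeemer
fromDataRedeemer (Constr 0 [I amount])          = Just (Deposit amount)
fromDataRedeemer (Constr 1 [I amount, B owner]) = Just (Withdraw amount owner)
fromDataRedeemer _                              = Nothing

-- A cheaper alternative when a check only needs the constructor tag:
-- look at the encoded value directly instead of decoding every field.
constrTag :: Data -> Maybe Integer
constrTag (Constr tag _) = Just tag
constrTag _              = Nothing

main :: IO ()
main = do
  print (fromDataRedeemer (Constr 1 [I 5, B "alice"]))
  print (fromDataRedeemer (Constr 1 [I 5]))
  print (constrTag (Constr 1 [I 5, B "alice"]))
```

`constrTag` hints at the "don't convert data on-chain" style: when a check only needs one piece of information, reading it directly from the encoded value can avoid carrying a full decoder.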

kk-hainq commented 2 years ago

We thought about this. It's a bit tricky. It's likely that people will want to re-run the script that actually failed on the chain if something does fail, and it would be quite annoying to not have the trace information in that case. And you can't necessarily just swap in an alternative version if things care about hashes...

So it's complicated, which is why we did the stupid thing of just leaving them in.

Does the proposal of adding a flag to remove traces make sense to you then? I know many developers would want that in these early days of tight limits. I guess in the long run we can map error codes off-chain or something.
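A minimal sketch of the error-code idea, under assumptions: the on-chain script would trace only short codes, and an off-chain table maps them back to full messages when a failure is investigated. The codes, messages, and function names are hypothetical.

```haskell
-- Hypothetical off-chain table from short error codes to full messages.
-- Only the codes would be compiled into the on-chain script.
errorMessages :: [(String, String)]
errorMessages =
  [ ("E1", "signature missing")
  , ("E2", "deadline passed")
  , ("E3", "insufficient payment")
  ]

-- Expand the trace messages of a failed evaluation into readable text,
-- leaving anything that is not a known code unchanged.
explain :: [String] -> [String]
explain = map expand
  where
    expand code = maybe code (\msg -> code ++ ": " ++ msg) (lookup code errorMessages)

main :: IO ()
main = mapM_ putStrLn (explain ["E2", "X9"])
```

Because the deployed script would be the same one that is re-run when debugging, this approach could also sidestep the hash concern raised above.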

There's lots of optimization we can do, although it's very unclear how much will actually help. At some point you just have to include the code the user asked for, which can be a lot!

That's why we want to help, so we can write more logic without polluting our shared blockchain!

There are also much more drastic things that we are considering internally. I'll mention a few of them here.

1. Compress scripts. Compressing scripts gets us about a 40% saving, even given our reasonably compact binary encoding. Currently the idea is to implement this in the ledger, and I hope it will be in the next HF.

2. Script references. Sketchy at the moment, and relies on several non-implemented ledger extensions, but we'd like to have a way to post scripts to the chain and then reference them afterwards, rather than having to submit them each time.

3. Partial script references. Even more sketchy, but it would be nice to just be able to reference a large chunk of code (e.g. the data decoder for ScriptContext) somehow, rather than having to submit it every time.

4. Pass structured data into scripts differently. Too soon to say much, but we waste a lot of time and space on fromData, it would be a big win to get rid of it.

All of these require ledger changes, so there's a bunch of design work that needs to go on etc.

We hadn't thought of 1 before; 40% would be very, very nice. 2 does make sense. 3 would be a beauty. 4 would indeed be very practical; we realized that too and tell each other to refrain from converting data on-chain. I'll continue working on dependency analysis and removing unused datatypes for now, following our earlier suggestions. Just tell me anytime if you need anything more!

michaelpj commented 2 years ago

Does the proposal of adding a flag to remove traces make sense to you then?

Sure. It would be very simple: a pass that replaces all string literals with the empty one! With a plugin option to enable it. I'd take a PR for this. I'm somewhat unsure that it's a good idea, but having the option doesn't seem too bad.

kk-hainq commented 2 years ago

Sure. It would be very simple: a pass that replaces all string literals with the empty one! With a plugin option to enable it. I'd take a PR for this. I'm somewhat unsure that it's a good idea, but having the option doesn't seem too bad.

I think people would love an option to remove all traces for good too. I'll get a few things up by the end of the week!

michaelpj commented 2 years ago

I think people would love an option to remove all traces for good too. I'll get a few things up by the end of the week!

Right, so you could both

Transform all string literals into the empty string
Replace all occurrences of trace str a with a

Perhaps the latter would be sufficient.
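Both transformations are short recursive passes. The sketch below uses a hypothetical toy term language, with a dedicated `Trace` node standing in for an application of the trace builtin; it is not the real PIR or Plutus Core AST, and the function names are illustrative.

```haskell
-- A hypothetical toy term language; Trace msg body stands in for an
-- application of the trace builtin (trace msg body evaluates to body).
data Term
  = Var String
  | StrLit String
  | IntLit Integer
  | Lam String Term
  | App Term Term
  | Trace Term Term
  deriving (Eq, Show)

-- Option 1: replace every string literal with the empty string.
-- The trace calls stay, so the log still records each trace, just empty.
emptyStrings :: Term -> Term
emptyStrings (StrLit _)  = StrLit ""
emptyStrings (Lam x t)   = Lam x (emptyStrings t)
emptyStrings (App f a)   = App (emptyStrings f) (emptyStrings a)
emptyStrings (Trace m t) = Trace (emptyStrings m) (emptyStrings t)
emptyStrings t           = t  -- Var and IntLit are unchanged

-- Option 2: rewrite every `trace str a` to `a`, dropping the message.
-- Assumes evaluating the message cannot fail or have other effects.
dropTraces :: Term -> Term
dropTraces (Trace _ t) = dropTraces t
dropTraces (Lam x t)   = Lam x (dropTraces t)
dropTraces (App f a)   = App (dropTraces f) (dropTraces a)
dropTraces t           = t  -- Var, StrLit, and IntLit are unchanged

example :: Term
example = Lam "x" (Trace (StrLit "checking x") (App (Var "check") (Var "x")))

main :: IO ()
main = do
  print (emptyStrings example)
  print (dropTraces example)
```

Two caveats in this sketch: `emptyStrings` also empties string literals that are not trace messages, which changes behavior if a script compares strings, and `dropTraces` discards the message expression entirely, which is only safe if evaluating it cannot fail (true for literals).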

effectfully commented 1 year ago

We've already done a lot to reduce the size of compiled scripts, but we do recognize that sizes are still far from ideal. Further reducing script sizes is one of our objectives, hence I'm adding the status: objective label.
