IntersectMBO / plutus

The Plutus language implementation and tools
Apache License 2.0
1.57k stars 479 forks source link

PIR case-of-case is exponential and causes OOMs #6183

Open koslambrou opened 5 months ago

koslambrou commented 5 months ago

Summary

After updating to the latest plutus-tx, we noticed some significant regression in the instances generated by PlutusTx.makeIsDataIndexed ''C [...] .

Compilation times skyrocketed and memory usage is between 60-100GB of RAM (I have a lot of memory).

The issue seems to be the simplification process of the script compilation. For now, we reduced as much as possible the simplifier iterations with PlutusTx.Plugin:max-simplifier-iterations-pir=1. After that change, memory usage is back to normal.

I don't know exactly the issue, but I started by writing my own UnsafeFromData instance for C which resulted in the same issue. Then, I noticed that the amount of used memory seemed to be proportional to the number of constructors of C.

Steps to reproduce the behavior

For a minimal reproductible example:

import PlutusTx qualified
import PlutusTx.Prelude hiding (Semigroup (..), unless)
import Prelude qualified as H

data C = A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R

PlutusTx.makeIsDataIndexed ''C [ ('A, 0) , ('B, 1) , ('C, 2) , ('D, 3) , ('E, 4) , ('F, 5) , ('G, 6) , ('H, 7) , ('I, 8) , ('J, 9) , ('K, 10) , ('L, 11) , ('M, 12) , ('N, 13) , ('O, 14) , ('P, 15) , ('Q, 16) , ('R, 17) ]

{-# INLINABLE validator #-}
validator ::
  BuiltinData
  -> BuiltinData
  -> BuiltinData
  -> Bool
validator d _ _ =
  let !_ = PlutusTx.unsafeFromBuiltinData @C d
   in True

unappliedValidator ::
  PlutusTx.CompiledCode (PlutusTx.BuiltinData -> PlutusTx.BuiltinData -> PlutusTx.BuiltinData -> BuiltinUnit)
unappliedValidator =
   $$(PlutusTx.compile [|| \d r c -> check $ validator d r c ||])

Actual Result

No response

Expected Result

No response

Describe the approach you would take to fix this

No response

System info

Plutus: 88ad493ad94ecdb34196f7a9aef558613bdee8b2

effectfully commented 5 months ago

There's no comment or any other indication on how this issue was triaged and why and I haven't seen any internal discussion on when we're going to address the issue and who's responsible for making sure we will, so I'm "untriaging" the issue. Sorry!

effectfully commented 5 months ago

Thank you for creating the issue and the PR with a reproducer! This is a perfect bug report.

I just took a look and the bug is indeed in the case-of-case transformation as I originally guessed. Defining caseOfCase like this:

caseOfCase binfo conservative newAnn t = pure t

makes exponential memory consumption vanish.

The culprit is most likely a known bug (see this thread) in the implementation of case-of-case.

effectfully commented 5 months ago

Oh and sorry it took so long for us to look into the issue! And thank you again for the amazing report.

effectfully commented 5 months ago

I thought I might be able to sketch a quick solution, but it didn't work out

The culprit is most likely a known bug (see this thread) in the implementation of case-of-case.

I've tried making caseOfCase do caseReduce to remove the exponential blow up in this case, like this:

-    liftQuote $ transformMOf termSubterms (processTerm binfo vinfo conservative newAnn) t
+        procTerm = fmap CaseReduce.caseReduce . processTerm binfo vinfo conservative newAnn
+    liftQuote $ transformMOf termSubterms procTerm t

It didn't help, so the problem must be elsewhere.

I've also tried making

-        , _posCaseOfCaseConservative = False
+        , _posCaseOfCaseConservative = True

as non-conservative case-of-case can also be exponential. Didn't help either.

Note [Case-of-case and duplicating code] described another source of exponentiality:

If the inner branch bodies are conapps (see Note [Case-of-case and conapps]) then
that will reduce but not necessarily eliminate the duplication. Consider

case (case s of { C1 a -> True; C2 b -> True; C3 c -> False }) of
  True -> v
  False -> w

--> (case-of-case)

case s of
  C1 a -> case True of { True -> v; False -> w; }
  C2 b -> case True of { True -> v; False -> w; }
  C3 c -> case False of { True -> v; False -> w; }

--> (case-of-known-constructor)

case s of
  C1 a -> v
  C2 b -> v
  C3 c -> w

i.e. we still duplicate v!

Maybe it's that, maybe something else. Fixing this bug requires taking a deep dive and looking at what actually happens during optimization.

In the meantime I think we should simply turn of case-of-case with all its many a source of exponentiality.

effectfully commented 5 months ago

I've now removed case-of-case entirely from the optimization pipeline in #6248.

Please check if this addresses your problem.

I'll keep the issue open until we figure out what went wrong with case-of-case and reintroduce it into the optimization pipeline.

effectfully commented 4 months ago

It seems turning case-of-case off solved performance issues for a lot of folks. We should fix the issues with the case-of-case pass and enable it back but in meantime I'm going to claim that it's a low priority task and move it back to the backlog.