IntersectMBO / plutus

The Plutus language implementation and tools
Apache License 2.0
1.57k stars 479 forks source link

[Epic] PIR generators #6531

Open effectfully opened 1 month ago

effectfully commented 1 month ago

This is an umbrella issue for all things related to PIR generators.

effectfully commented 1 month ago

QuviQ had a trouble using our PIR type checker with their PIR terms generator: they had to do introduce unrenaming of escaped data type variables, we couldn't figure out a way of avoiding that and so we now have to maintain that disgusting unrenaming in there. Can we do any better? We've had quite a lot of discussion on the Plutus mailing list about the topic (thread for those who have access), but nothing really came out of it. Perhaps we're stuck with this nonsense forever.

effectfully commented 1 month ago

Currently generation of a single fresh name requires processing literally every single bit of the current context, which is absurdly inefficient. This operation should be O(1), not O(size_of_the_context).

Originally I thought we could just change the code to use Quote, but some experimentation showed that this isn’t gonna cut it:

names are generated all over the place and put into the current context, inside and outside of the GenTm monad. It’s really hard to move all of that to run in Quote simply because it’s trivial to miss a call here and there

the current fresh name generation is pretty fancy and doesn’t just always pick the next fresh name. It can pick the next 10th fresh name, then go back, then again forward etc – all of that between different calls to the function generating fresh names. We want to keep this logic as it allows us to produce more diverse terms

This ticket is for sorting everything out. How beneficial is it to make generation of fresh names properly fast? Is it worth the trouble of having another field in GenEnv and keeping it in sync with the other relevant parts of the context (of which we have plenty)? I assume the latter is the only solution allowing us to replicate the existing logic while keeping the cost low.

effectfully commented 1 month ago

Shrinking is sometimes slow because a large number of shrinks are generated. See this comment.

effectfully commented 1 month ago

Currently the shrinking order on types isn’t specified anywhere in the PIR generators making it extremely hard to:

  1. verify that the shrinker does the right thing and can’t end up being stuck in a loop
  2. change literally anything in the shrinker (because it’s hard to tell what implicit invariant a change might break)

We need to fix that by figuring out the shrinking order, documenting it and ensuring that existing code complies with it.

effectfully commented 1 month ago

Currently fixKind returns only one type. Meaning the shrinker always chooses, say, a single particular type variable when fixing the kind of another type variable. It would be nice if it could look at all suitable type variables. This ticket is for investigating if it’s hard to arrange that and if its' beneficial to do so.

effectfully commented 1 month ago

I wanted to add a very simple property test using the PIR generators, but got surprised at how slow the test turned out to be, so I've done some experiments and it seems like compiling a PIR program (using toUPlc) is 100x less efficient than what it takes to generate that program using the PIR generators. And this is given that the PIR generators are known to be asymptotically slow (and I'm sure the constants in the complexity function aren't particularly small either), although I'd say that the PIR generators are reasonably fast and aren't worth optimizing currently, given that they produce 10k terms in 4 seconds on my machine.

But that's not the best part. The best part is that compilation with all the default optimizations erases 90% to 99% of the generated term. I.e. we spend all this time generating a term, then we sweat compiling it just to be produce a minuscule and in most cases completely trivial term.

The end result is that at present the PIR generators are basically useless if you want to compile the produced term and do something with the result, it's just a waste of resources by any measure I can think of.