A lot of the above has to be changed to become more efficient/expressive/etc. We'll need both major refactoring and small tweaking. This comment is about major refactorings.
- `TypeScheme` serves both for type checking and evaluation, which means passing around evaluation-irrelevant stuff at runtime, which is silly. So we need to split `TypeScheme` into two parts. This is the highest priority task, because it affects performance and blocks lots of other things, some of which also affect performance. PRs: #4379, #4516
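  A minimal sketch of what such a split could look like (all names and shapes here are illustrative simplifications, not the actual Plutus types): the type-level information stays in a `TypeScheme`-like GADT used only by the type checker, while evaluation only needs a bare arity-like structure with no types attached.

  ```haskell
  {-# LANGUAGE DataKinds, GADTs, KindSignatures, TypeOperators #-}

  module TypeSchemeSplitSketch where

  import Data.Kind (Type)

  -- Type-checking-relevant part: tracks the Haskell types of arguments and result.
  data TypeScheme (args :: [Type]) (res :: Type) where
    TypeSchemeResult :: TypeScheme '[] res
    TypeSchemeArrow  :: TypeScheme args res -> TypeScheme (arg ': args) res

  -- Evaluation-relevant part: the machine only needs to know whether to expect
  -- another argument or to produce the result.
  data RuntimeScheme
    = RuntimeSchemeResult
    | RuntimeSchemeArrow RuntimeScheme

  -- Erase the type-level information once, before evaluation starts.
  eraseToRuntime :: TypeScheme args res -> RuntimeScheme
  eraseToRuntime TypeSchemeResult      = RuntimeSchemeResult
  eraseToRuntime (TypeSchemeArrow sch) = RuntimeSchemeArrow (eraseToRuntime sch)
  ```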
- `TypeScheme` is generally inefficient with all those `Proxy`s and dictionaries inside; that needs to be fixed somehow, but a direct attempt at inlining led to nowhere. Once `TypeScheme` is split into two parts in a direct way, we'll need to think how to monomorphize functions stored in dictionaries. High priority, because it affects performance. PRs: #4317, #4397, #4398, #4421
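  For illustration only (generic Haskell, not the builtins code): a class-polymorphic function hidden behind a dictionary is opaque to the optimiser, while a monomorphic or specialised copy lets GHC inline and unbox.

  ```haskell
  module MonomorphiseSketch where

  -- Polymorphic: callers pass a Num dictionary at runtime, so '+' is an
  -- unknown call and nothing gets unboxed.
  sumGeneric :: Num a => [a] -> a
  sumGeneric = foldr (+) 0
  {-# SPECIALISE sumGeneric :: [Integer] -> Integer #-}

  -- Monomorphic: no dictionary, GHC can inline '+' for Integer directly.
  sumInteger :: [Integer] -> Integer
  sumInteger = foldr (+) 0
  ```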
- Making sure that `plutus-core/plutus-core/src/PlutusCore/Default/Builtins.hs` is compiled efficiently is high priority (because so far not much effort went into this direction, but that does not require any major refactoring, so that will be discussed in the next comment). UPD: there doesn't seem to be any way around doing caching, it's just way too superior to trying to inline everything, especially given that caching benefits costing (see #4505) and inlining everything builtins-related would destroy that. UPD 2: in the end #4914 fully achieved what we needed.
- `term`, because that should allow us to implement deep unlifting of values of recursive built-in types (like `Data`) while keeping recursion on the Haskell side rather than the Plutus side. I'd expect that to be a lot faster. Low priority: it affects performance, but potentially requires non-trivial research. PR: #6530
- `Arbitrary` `BuiltinMeaning`s

Minor things, all fairly low priority:
- `ToBinds` or some other type family is worth exploring. Also an erroring instance for ``DefaultUni `Contains` TyVarRep`` would be very handy. PRs: #4345, #4403, #4557, #4648, #4649
- `TypeScheme`s are shared (which is how we got that 30% slowdown when we stopped storing meanings of builtins in an array: `toBuiltinMeaning` creates a ridiculous amount of irrelevant thunks just to use a few of them and throw the others away). Is it fine, can we do better? We'll need to figure that out. UPD: we're doing caching and the thunks are created once per set of builtins due to inlining (clearly visible in the generated Core), so it's all fine.
- `makeKnown` and `readKnown` work in a constrained `m`. Monomorphizing those may help performance-wise. PRs: #4307, #4308, #4536
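  A sketch of the difference (the types below are simplified stand-ins, not the real `makeKnown`/`readKnown` signatures): a function running in any `MonadError m` drags a dictionary through every call, whereas a fixed failure type does not.

  ```haskell
  {-# LANGUAGE FlexibleContexts #-}

  module ConstrainedMSketch where

  import Control.Monad.Except (MonadError, throwError)

  newtype UnliftingError = UnliftingError String

  -- Constrained-polymorphic: each call receives a MonadError dictionary and
  -- GHC cannot optimise through the abstract 'm' without specialisation.
  readBoolPoly :: MonadError UnliftingError m => Integer -> m Bool
  readBoolPoly 0 = pure False
  readBoolPoly 1 = pure True
  readBoolPoly _ = throwError (UnliftingError "not a Bool")

  -- Monomorphized: the failure monad is fixed, so there is no dictionary
  -- passing and the function can be inlined and worker/wrappered freely.
  readBoolMono :: Integer -> Either UnliftingError Bool
  readBoolMono 0 = Right False
  readBoolMono 1 = Right True
  readBoolMono _ = Left (UnliftingError "not a Bool")
  ```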
- `readKnown` unlifts a value of a built-in type and for that it checks that the expected type tag is equal to the actual one. However, instead of taking the expected tag from the global scope, it takes it from the constraint that the default implementation has, and so an obvious optimization does not happen. PR: #4380
- `toConstant`/`fromConstant` calls for a particular `term` for better performance, or do we want to move `term` out of the class head of `KnownTypeIn` like was done (and then reverted, because that caused a slowdown) in #4172? Is it possible to mix these two things? UPD: we gave up on the latter. PRs: #4419, #4481, #4499, #4533
- `Refl` to obtain a proof that the value is of the expected type. Would using `unsafeCoerce` give us any performance boost? Apparently not. PR: #4400
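  Roughly what the two approaches look like, on toy type tags (illustrative only; the real code matches on universe tags): the safe version produces a `Refl` proof, while the experiment replaced it with a boolean check plus `unsafeCoerce`.

  ```haskell
  {-# LANGUAGE GADTs #-}

  module ReflVsUnsafeCoerceSketch where

  import Data.Type.Equality ((:~:) (Refl))
  import Unsafe.Coerce (unsafeCoerce)

  -- Toy type tags standing in for universe tags.
  data Tag a where
    TagInteger :: Tag Integer
    TagBool    :: Tag Bool

  -- Safe route: matching on the tags yields a 'Refl' proof that rewrites types.
  checkTag :: Tag a -> Tag b -> Maybe (a :~: b)
  checkTag TagInteger TagInteger = Just Refl
  checkTag TagBool    TagBool    = Just Refl
  checkTag _          _          = Nothing

  unliftInteger :: Tag a -> a -> Maybe Integer
  unliftInteger tag x = case checkTag tag TagInteger of
    Just Refl -> Just x
    Nothing   -> Nothing

  -- Experimental route: a Bool-returning comparison plus 'unsafeCoerce';
  -- per the PR above this gave no measurable speedup over the 'Refl' version.
  sameAsInteger :: Tag a -> Bool
  sameAsInteger TagInteger = True
  sameAsInteger _          = False

  unliftIntegerUnsafe :: Tag a -> a -> Maybe Integer
  unliftIntegerUnsafe tag x
    | sameAsInteger tag = Just (unsafeCoerce x)
    | otherwise         = Nothing
  ```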
- `$w$cfromInteger` calls. What are those and can we get them inlined? PR: #5062
- `makeKnown` for `Integer`:

  ```
  case ww4_s1bxe `cast` <Co:5> of dt_XiSB
    { DefaultUniInteger ipv_s1apx -> (ValueOf dt_XiSB vx_ikbc) `cast` <Co:14>
    })
  ```

  What is this matching on `DefaultUniInteger` for? UPD: no longer see it. It was probably used in that last `cast`.
- Make `geq` reducible statically in one way or another. PRs: #4462, #4463, #5061
- Split `KnownTypeIn` into two type classes: one for lifting and one for unlifting. This is due to the fact that we have some types values of which can be lifted but not unlifted: `EvaluationResult` and `Emitter`. `SomeConstantOf` was an example of the opposite, but it's gone now. PR: #4420
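  A rough shape of the split (the signatures are simplified stand-ins; the real classes are parameterised differently): lifting gets its own class, and only types that can also be unlifted get the second one.

  ```haskell
  {-# LANGUAGE MultiParamTypeClasses #-}

  module LiftingSplitSketch where

  newtype BuiltinError = BuiltinError String

  -- Lifting only: turning a Haskell value into a machine value. Types like
  -- 'EvaluationResult a' can be lifted (a builtin may return them), but
  -- there is no way to read them back out of a value.
  class MakeKnownIn val a where
    makeKnown :: a -> Either BuiltinError val

  -- Unlifting only: reading a Haskell value back out of a machine value.
  -- Not every liftable type supports this, hence the separate class.
  class ReadKnownIn val a where
    readKnown :: val -> Either BuiltinError a
  ```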
- A test that, for each `fun` (properly constrained of course), checks that `Arbitrary` arguments don't trigger an exception. PRs: #4555, #4576
- `(# #)` and only be lazy there. PRs: #4607, #4778
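  For context (a generic Haskell idiom, not the actual content of those PRs): a computation can be delayed by hiding it behind a `(# #)` argument, so everything else can stay strict and laziness shows up only where the `(# #)` is eventually supplied.

  ```haskell
  {-# LANGUAGE UnboxedTuples #-}

  module DelaySketch where

  -- An explicit thunk: nothing happens until the unboxed unit is supplied.
  type Delayed a = (# #) -> a

  delay :: a -> Delayed a
  delay x = \_ -> x

  force :: Delayed a -> a
  force d = d (# #)
  ```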
- `let !cost :: (# #) -> ...` in `CostingFun/Core.hs` so that the result of the function can be unboxed. UPD: I've tried it and nothing worked, it wouldn't unbox and it would inline instead.
- `EvaluationSuccess` should be made strict. PR: #4512
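  Making the constructor strict is just a bang on the field (a sketch, assuming the usual two-constructor shape of `EvaluationResult`):

  ```haskell
  -- A strict 'EvaluationSuccess': the result is forced when the constructor
  -- is built instead of storing a thunk inside it.
  data EvaluationResult a
    = EvaluationSuccess !a
    | EvaluationFailure
  ```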
- `transformers` are way too lazy for our use case; we need something stricter. PR: #4587
- `lists` and `nofib` do work quite OK for that
- `matchList`
- Do we want to do costing of `equalsData` as we process the two sides ensuring they're equal? In that case whenever we get things that aren't equal, we can return `False` immediately without continuing to emit unnecessary costs. This would slow down the happy case though (twice as much work?), so resolution: won't do.
- Make `headSpine` a class method so that we can ensure well-typedness of the builtin.
- `_BuiltinFailure` is TH-derived and is not inlined. We should probably inline it just for tidier Core (it probably doesn't really affect evaluation time).
- `equalsData`, so that if there's any mismatch we don't keep emitting costs pointlessly.
- Rename `DefaultUni` and `DefaultFun` to `CardanoUni` and `CardanoFun` and pull them out into their own sublibrary with all the specific code that they depend upon, to make it easier to review and maintain Cardano-specific builtins-related code.
- `CostingFun/Core.hs`
- `Maybe` or `MonadPlus` in `Universe.Core`. Also probably worth monomorphizing `tryUniApply` and what it depends upon.
- `minCostStream`: I didn't pay much attention to its performance. There may exist ways of optimizing it, for example should we make `unconsCost` return a `(# CostingInteger, (# (# #) | CostStream #) #)` or use `UnliftedDatatypes` or something? Same about `HeadSpine`.
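  What the unboxed-sum variant of `unconsCost` would look like, on simplified stand-ins for `CostStream`/`CostingInteger` (illustrative only): the boxed version allocates a pair and a `Maybe` per step, while the unboxed one returns everything without intermediate heap allocation.

  ```haskell
  {-# LANGUAGE MagicHash, UnboxedSums, UnboxedTuples #-}

  module CostStreamSketch where

  -- Simplified stand-ins for the real types.
  type CostingInteger = Int

  data CostStream
    = CostLast !CostingInteger
    | CostCons !CostingInteger CostStream

  -- Boxed uncons: allocates a pair and a 'Maybe' on every step.
  unconsCost :: CostStream -> (CostingInteger, Maybe CostStream)
  unconsCost (CostLast c)      = (c, Nothing)
  unconsCost (CostCons c rest) = (c, Just rest)

  -- Unboxed variant: the head cost and the optional tail come back without
  -- allocation, with '(# #)' marking the "no more costs" alternative.
  unconsCost# :: CostStream -> (# CostingInteger, (# (# #) | CostStream #) #)
  unconsCost# (CostLast c)      = (# c, (# (# #) | #) #)
  unconsCost# (CostCons c rest) = (# c, (# | rest #) #)
  ```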
- `flattenCostRose`: do we want this function to emit the next cost in O(1) worst case? Is it best to define it by concatenating forests in an accumulator or via CPS (all the code is at the link) performance-wise? We probably need some kind of benchmark to answer these questions.
- `BuiltinRuntime` lazily explicitly, to make the `NoThunks` accurate (currently it's not). PR: #5806
- Add `Proxy` to the universe, so that it's possible to provide builtins like `nil` (currently impossible) or `pair` (currently subverts the type checker). PR: #4337
- `Forall`s in the type signatures of builtins, so that we don't spend a lot of time elaborating them and also to make things clearer.
- Don't allow nullary builtins, so that we don't have to check whether we've reached `RuntimeSchemeResult` yet every time a builtin is forced/applied. Plus it's good to reflect restrictions in the types anyway, otherwise somebody might add a nullary builtin which the CEK machine (and god knows what else) doesn't support. PR: #4616 (not beneficial for performance, but we should still do it).
- `forall a. Integer -> Array a -> a` looks perfectly definable to me even with `a` not being a built-in type (assuming arrays are in the AST and not built-in).

The comment about major refactorings does not include two big things:
It's probably too early to think about the latter right now, given how much the whole builtins machinery is going to change (if everything works out), so I'll focus on the former.
The inline docs in the source code are fairly good; they could use some polishing, but overall they seem pretty comprehensive. What is missing, however, is a high-level overview of the whole builtins machinery that one can read before diving into the specifics of each of its parts. We do have one such file, `Builtins.md`, but it was written long ago and it doesn't talk about polymorphic built-in types, builtin meaning inference etc.
Another thing we need is a doc (and presentation) on how to add a new built-in type or function. After all the recent refactorings it's completely trivial to add a new built-in type (regardless of whether it's monomorphic or polymorphic): it's basically just warnings/errors-driven copy-paste. Adding a new built-in function is trickier, however, because the case of a polymorphic function over a polymorphic built-in type is so different from the other cases (improving upon the status quo is one of the major tasks).
PRs:
Performance improvements in builtins-related PRs (each number is the mean performance change for `validation` benchmarks):
This issue is for dumping all the plans regarding builtins in a largely unstructured way.