IntersectMBO / plutus

The Plutus language implementation and tools
Apache License 2.0

Faster pattern matching for built-in types #5711

Open effectfully opened 5 months ago

effectfully commented 5 months ago

This issue is for discussing approaches to improving performance of pattern matching for built-in types.

The current approach is inefficient. Details can be found in this Note; the gist is that pattern matching over lists is currently implemented as (pseudocode):

matchList :: [a] -> b -> (a -> [a] -> b) -> b
matchList xs z f = chooseList xs (\_ -> z) (\_ -> f (head xs) (tail xs)) ()

where chooseList, head and tail are all built-in functions (chooseList returns either its second or its third argument depending on whether its first argument is null or not, respectively). Therefore, in the nil case we end up performing one builtin call, and in the cons case three builtin calls. Note how we also have to pass a unit argument around in order not to always evaluate all branches: Plutus is strict, hence chooseList xs z (f (head xs) (tail xs)) would have the wrong semantics.
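
To make the cost concrete, here is a minimal runnable Haskell model of this encoding (a sketch only: chooseList, headList and tailList below are illustrative stand-ins for the corresponding built-in functions, not the real builtins API):

-- Returns its second argument if the list is empty and its third one otherwise.
chooseList :: [a] -> b -> b -> b
chooseList []      z _ = z
chooseList (_ : _) _ f = f

headList :: [a] -> a
headList (x : _) = x
headList []      = error "headList: empty list"

tailList :: [a] -> [a]
tailList (_ : xs) = xs
tailList []       = error "tailList: empty list"

-- Each branch is wrapped in a unit-accepting lambda and only the branch picked
-- by chooseList is applied to (); in a strict language, passing the branches
-- unwrapped would evaluate 'headList xs' and 'tailList xs' even for the empty
-- list. On cons this costs three builtin calls: chooseList, headList, tailList.
matchList :: [a] -> b -> (a -> [a] -> b) -> b
matchList xs z f =
  chooseList xs (\_ -> z) (\_ -> f (headList xs) (tailList xs)) ()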

There have been two proposals on how to implement faster pattern matching for built-in types:

  1. allow expressing pattern matching built-in functions directly, implemented in #5486
  2. piggy-back on the pattern matching machinery that we use for sums-of-products, implemented in #5704

Both PRs are documented; see the initial comment on each PR if you want to understand the details of the design and implementation.

Both approaches work and give us comparable speedup on built-in-list-specific benchmarks.

This is what we get for (1):

[Benchmark screenshots: 2024-01-12 00-41-30, 2024-01-12 00-45-13]

This is what we get for (2):

[Benchmark screenshots: 2024-01-12 00-27-27, 2024-01-12 00-27-41]

It may appear that (2) is faster; however, the benchmarking machine is apparently capable of being off by 6% and probably even more (see this Slack discussion), so we can't really rely on these results. Still, it's clear that both approaches give us a meaningful improvement of about the same scale.

If we analyze performance analytically, this is what we'll find:

  1. (1) has to keep unit arguments around in order for pattern matching not to force all the branches at once and only force the picked one, while (2), being backed by SOPs, has no such restriction since case is specifically designed to be a proper pattern matching construct (unlike function application which is strict in Plutus). E.g. for (1) we have the following diff:

[Screenshot of the diff for (1): 2024-01-10 01-06-15]

while for (2) it's

[Screenshot of the diff for (2): 2024-01-10 01-05-01]

I.e. (2) wins for this one.

  2. In (1) matchList is literally a single built-in function call, while in (2) it's a built-in function call + a case:

[Screenshot: 2024-01-12 01-59-18]

which affects not only performance, but also size, which is an even scarcer resource.

I.e. (1) wins for this one.

  3. (1) requires us to amend the builtins machinery in a way that allows for returning an applied function. Not only is this a much larger change (propagating through many parts of the builtins machinery, including tests) compared to what (2) requires, it also introduces a slowdown for all existing builtins: handling exactly one output of a built-in function is a bit faster than handling any non-zero number of them, because one doesn't need to case on the result to figure out whether it contains a single output or multiple ones. (If a builtin returns f x y, we simply return all of those terms separately and let the evaluator handle reducing the application -- we have to do that because each evaluator deals with reducing applications differently.) See the sketch after this list for an illustration.

I.e. (2) wins for this one.

  4. the whole issue we've been discussing here is improving performance of pattern matching for built-in types. Extending the builtins machinery seems very appropriate for that, particularly with something as straightforward as "allow for returning a function application", while teaching builtins about SOPs and hardcoding a specific SOP representation right into some built-in functions feels weird -- why do builtins and SOPs have to be intertwined this way?

Maybe it's fine, but I believe (1) wins for this one.

  5. As per Michael's comment, SOPs aren't supported in all versions of Plutus, so we need to figure out what to do for the early ones that don't support them.

I.e. (1) wins for this one.
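
To illustrate point 3, here is a hedged sketch (simplified types, not the actual builtins machinery) of why letting builtins return an applied function costs something for every builtin: the result now has to be inspected before the evaluator can continue, whereas today there is always exactly one output.

-- Simplified model: today every builtin evaluation yields exactly one term.
data BuiltinOutput term
  = Single term          -- the only shape that exists today
  | Applied term [term]  -- a head function plus the arguments to apply it to

-- 'apply' stands for whatever application machinery a particular evaluator uses;
-- the builtins machinery cannot reduce the application itself, because each
-- evaluator deals with reducing applications differently.
continueWith :: (term -> [term] -> term) -> BuiltinOutput term -> term
continueWith _     (Single t)       = t
continueWith apply (Applied f args) = apply f args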

Overall, there's no clear winner. The performance comparison is unclear, with both approaches delivering meaningful improvement, although my very subjective feeling is that (2) slightly wins when it comes to performance. But (1) also makes builtins more expressive; maybe for some other reason we'll end up needing to return function applications from builtins eventually anyway?

I'm personally torn between the two options. Which one should we choose?

MicroProofs commented 5 months ago

One thing I would like about the second one is if data_to_constr converted the constr tag to a sum of product as well.

Basically this:

Constr i ds -> toConstr 0 [fromValue (toConstr i [fromValue ds])]

effectfully commented 5 months ago

One thing I would like about the second one is if data_to_constr converted the constr tag to a sum of product as well.

Oh yeah, that's on the menu; I just didn't implement it, because it's a bunch of boilerplate and I'm not sure we're going to pick this option over the other one, so the work might end up being wasted. I'm going to speed up pattern matching for tuples too.

Also, we do remember the request for the scrutinee to come last; I didn't reflect it here because I didn't want to change anything mid-way. Once we've settled on an approach, we'll discuss the idea with the team and likely change the implementation accordingly.

michaelpj commented 5 months ago

(1) has to keep unit arguments around in order for pattern matching not to force all the branches at once and only force the picked one

I had to think about this a bit. If we abstract over the matcher with a function we lose this property (since function arguments get forced). But maybe we recover it with inlining, plus we could push more applications into case branches. (Although this is tricky in UPLC since we need to know how many lambdas to go under. Boo. An argument I didn't think of for making the bindings for case variables explicit in the term...)
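
For illustration, a tiny Haskell analogy of pushing an application into case branches (a sketch only; in UPLC the branches of a SOP case are lambdas over the constructor arguments, which is why one needs to know how many lambdas to go under):

-- An application sitting outside the case...
applyOutside :: Bool -> Int -> Int
applyOutside b x = (case b of { True -> (+ 1); False -> (* 2) }) x

-- ...can be pushed into each branch, so no application node remains outside.
applyInside :: Bool -> Int -> Int
applyInside b x = case b of { True -> x + 1; False -> x * 2 }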

Not only is this a much larger change (propagating through many parts of the builtins machinery, including tests) compared to what (2) requires, it also introduces a slowdown for all existing builtins

Is this visible in other benchmarks?

One thing I would like about the second one is if data_to_constr converted the constr tag to a sum of product as well.

Right, this is just "pattern matching on Data" and not "pattern-matching on datatypes encoded using Data", which would also be convenient but we still don't know how to type it.
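
For concreteness, a Haskell sketch of the distinction (assumptions: Data below mirrors the shape of Plutus's Data type; matchData is a hypothetical Haskell-level analogue of a pattern-matching builtin on Data, not an existing API):

import Data.ByteString (ByteString)

data Data
  = Constr Integer [Data]
  | Map [(Data, Data)]
  | List [Data]
  | I Integer
  | B ByteString

-- "Pattern matching on Data": one branch per Data constructor, easy to type.
matchData
  :: Data
  -> (Integer -> [Data] -> r)  -- Constr
  -> ([(Data, Data)] -> r)     -- Map
  -> ([Data] -> r)             -- List
  -> (Integer -> r)            -- I
  -> (ByteString -> r)         -- B
  -> r
matchData d fConstr fMap fList fI fB = case d of
  Constr i ds -> fConstr i ds
  Map kvs     -> fMap kvs
  List ds     -> fList ds
  I n         -> fI n
  B bs        -> fB bs

-- "Pattern matching on a datatype encoded using Data" would instead dispatch on
-- the Constr tag into user-supplied branches of varying arities, and it's that
-- part we don't know how to type.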


I'm personally torn between the two options. Which one should we choose?

I don't know either. It's funny there's no performance difference; that was a potential decision point.

Did we get any input from @mjaskelioff about whether the change to allow builtins to return an application spine would affect the formalism at all?

mjaskelioff commented 5 months ago

Did we get any input from @mjaskelioff about whether the change to allow builtins to return an application spine would affect the formalism at all?

Returning an application spine would affect the formalism, but I don't see it as being too problematic. The main complexity in the builtin formalisation is in consuming the arguments, rather than in the return type.

From the point of view of specification (formal or not) it seems like a win for (2), as (2) is simpler to explain than (1), where we would need to introduce spines.

effectfully commented 5 months ago

If we abstract over the matcher with a function we lose this property (since function arguments get forced). But maybe we recover it with inlining

But we're not inlining matchList in the case of (1), because it's literally a standalone builtin call, so there's nothing to inline. Or did I misunderstand you?

we could push more applications into case branches.

Yup.

Although this is tricky in UPLC since we need to know how many lambdas to go under. Boo.

Yeah, that sucks.

An argument I didn't think of for making the bindings for case variables explicit in the term...)

None of us did.

Is this visible in other benchmarks?

Yes, but the benchmarking machine is unreliable, so no.

I don't know either. It's funny there is no performance difference, that was a potential decision point.

Well I'm sure there's some, it's just that it's not big enough for us to be able to detect it using broken tools.

kwxm commented 5 months ago

I don't have much to say; I find this (about option 1) a bit disturbing:

a much larger change (propagating through many parts of the builtins machinery, including tests)

And also this:

maybe for some other reason we'll end up needing to return function applications from builtins eventually anyway?

That sounds a bit like premature future-proofing that we may never need. I'm sure in the past I've been worried about the possibility of builtins returning more general terms for some reason, but I don't recall the details at the moment.

kwxm commented 5 months ago

About the possibility of builtins returning applications: could we return a saturated application wrapped in a delay that you then have to force? I haven't thought it through properly, but such a thing would be a CekValue, so it would fit with the way the evaluator works at present.

michaelpj commented 5 months ago

Outcome of our discussion:

michaelpj commented 5 months ago

Hang on, doesn't the SOP approach also violate our old assumption that the only types which appear in the signatures of builtins are either a) builtin types or b) type variables? Now we would be able to have an SOP type there.

mjaskelioff commented 5 months ago

Yes. We would have to go for extending the notion of a) and saying that a SOP of builtin types is a builtin type.

mjaskelioff commented 5 months ago

In any case, what this discussion shows is that these are not just any new builtins: both ideas require a change to the language. So the discussion needs to go deeper and cannot rely solely on implementations: we need to understand the consequences on the language design. What would be the specification of the new features? How would they affect the type system of TPLC?

michaelpj commented 5 months ago

Yes, I had missed that returning SOPs was an extension!

effectfully commented 5 months ago

@mjaskelioff

Yes. We would have to go for extending the notion of a) and saying that a SOP of builtin types is a builtin type.

As per our Slack discussion, that wouldn't work: the types would then suggest that putting SOPs into lists or pairs was possible, but that is not supported and we have no plans to support it, since it's a pain and we don't have a use case anyway.

In any case, what this discussion shows is that these are not just any new builtins: both ideas require a change to the language.

Yes, I should've stated explicitly that I'm not simply proposing a few new builtins, but I thought it was obvious from

(1) requires us to amend the builtins machinery in a way that allows for returning an applied function. Not only is this a much larger change (propagating through many parts of the builtins machinery, including tests) compared to what (2) requires

and

teaching builtins about SOPs and hardcoding a specific SOP representation right into some built-in functions feels weird -- why do builtins and SOPs have to be intertwined this way?

.

So the discussion needs to go deeper and cannot rely solely on implementations: we need to understand the consequences on the language design. What would be the specification of the new features? How would they affect the type system of TPLC?

Good thing I sent an email about this issue to the Plutus mailing list!

I think the type system consequences for (1) are negligible: we're just allowing builtins to apply functions, which seems very straightforward and mirrors the functionality of regular functions.

(2) is somewhat weird, and it'd be fantastic if type theory experts did a deep dive.

@michaelpj

Hang on, doesn't the SOP approach also violate our old assumption that the only types which appear in the signatures of builtins are either a) builtin types or b) type variables? Now we would be able to have an SOP type there.

Yes, we'll need to add SOPs in builtin signatures to the specification and the metatheory. In the implementation it's trivial and I did it in the PR. I think we should allow SOP values as arguments too, because even though it's probably useless, it's not wrong, and forbidding it would be more work than allowing it, at least in the implementation. I'll let the specification and metatheory folks decide how they want it to be there.

michaelpj commented 5 months ago

Okay, still it's good to be clear that there's an extension either way. If anything, the SOP extension seems more dubious to me, so I've tipped back in favour of the pattern-matching builtins 🙃

effectfully commented 5 months ago

Note that, as far as I remember, from the specification/formalization point of view we also need to allow -> to appear in the types of arguments of pattern-matching builtins, which is something the implementation already supports. But that should be very straightforward.
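
For reference, a hedged sketch of where -> shows up: the denotation of a pattern-matching builtin on lists takes function-typed branches as arguments (caseList below is hypothetical and written as a plain Haskell function; the real builtin would hand the application f x xs back to the evaluator rather than reducing it itself):

caseList :: b -> (a -> [a] -> b) -> [a] -> b
caseList z _ []       = z
caseList _ f (x : xs) = f x xs

-- For example, caseList 0 (\x _ -> x) [1, 2, 3] evaluates to 1.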

wadler commented 5 months ago

the whole issue we've been discussing here is improving performance of pattern matching for built-in types. Extending the builtins machinery seems very appropriate for that, particularly with something as straightforward as "allow for returning a function application", while teaching builtins about SOPs and hardcoding a specific SOP representation right into some built-in functions feels weird -- why do builtins and SOPs have to be intertwined this way? Maybe it's fine, but I believe (1) wins for this one.

This is the one place where I disagree. If we add SOP to improve (time and space) efficiency, and we make a change to builtins to improve efficiency, it is not so odd if these interact.

MicroProofs commented 5 months ago

One thing I would like about the second one is if data_to_constr converted the constr tag to a sum of product as well.

Basically this

Constr i ds -> toConstr 0 [fromValue (toConstr i [fromValue ds])]

I'm also still in favor of supporting SOPs especially to support the quoted text above in the future. An alternative I can think of is integer -> constr i []. But all these cases require SOP typing with builtins. So I'm leaning toward returning SOP with builtins. I know these suggestions are slightly tangential to the specific ones above, but having the typing paves the way forward for more SOP builtins.

effectfully commented 5 months ago

@wadler

This is the one place where I disagree. If we add SOP to improve (time and space) efficiency, and we make a change to builtins to improve efficiency, it is not so odd if these interact.

That's fair, I didn't consider this perspective, thank you.

@MicroProofs

I'm also still in favor of supporting SOPs especially to support the quoted text above in the future. An alternative I can think of is integer -> constr i []. But all these cases require SOP typing with builtins. So I'm leaning toward returning SOP with builtins. I know these suggestions are slightly tangential to the specific ones above, but having the typing paves the way forward for more SOP builtins.

This is also a good point. With the reservation that, while we fully recognize how helpful such functionality would be, we still have no idea how to implement it in an even remotely sensible way. I'll create a separate issue about that and we'll keep thinking about it, but don't hold your breath.

@michaelpj

Okay, still it's good to be clear that there's an extension either way. If anything the SOP extension seems more dubious to me so I've tipped back in favour of the pattern-matching builtins 🙃

This reasoning surprises me, because the type system part kinda directly follows from allowing SOPs to appear from within builtins. It's the operational semantics of this interaction that I'm being paranoid over, but if we accept those, I feel like the type system extension is negligible in comparison? It's literally just allowing more Plutus types to appear in signatures of builtins; to my taste we could allow all of them there (that's pretty much what the implementation amounts to) and it'd be fine. (I do agree with the view that having a specific type of builtin signatures that isn't hardcoded to Plutus types is good design and that the implementation should introduce such machinery.)

michaelpj commented 5 months ago

It's the operational semantics of this interaction that I'm being paranoid over, but if we accept those, I feel like the type system extension is negligible in comparison?

This is a bit vague, but I feel like historically I have thought of our builtin types and functions as being imported wholesale from another universe of types and functions. I.e. there's another universe with just integers, etc., and that's where integer builtin functions come from, and we just embed them. For polymorphic builtins, they're just so polymorphic that they work with everything (hand waving!). But if we allow actual PLC types from the PLC type universe to appear in builtin signatures, that story is completely dead. Maybe that's not a problem at all; I'm unsure if this metaphor I want to use has any actual consequences!

mjaskelioff commented 5 months ago

What is currently done in the metatheory for builtins is to define a universe of argument types. Currently this universe consists of either type variables of kind *, or builtin-compatible types (atomic constants, type variables of kind #, and (pair or list) applied to builtin-compatible types). We could extend this universe with SOPs, but we would need to define precisely which SOP types we would allow to appear in an argument: are SOPs only top-level (like *-kinded variables) or can they appear anywhere? (say in a list of SOPs).

Note that the universe is called "argument types" because the return type can be in a different universe. (Currently the result type is an argument type.) Also, the universe is sort of independent of PLC types (the same argument type is interpreted in different type systems).

So, speaking from the POV of the metatheory, I don't think the typing is going to be a big problem. It's a matter of defining appropriate universes.
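
For concreteness, a rough Haskell rendering of that universe (a sketch only; the actual metatheory is in Agda and differs in detail):

-- Builtin-compatible types: atomic constants, #-kinded type variables, and
-- list/pair applied to builtin-compatible types.
data BuiltinCompatible
  = Atomic String                              -- e.g. integer, bytestring
  | SharpVar Int                               -- a type variable of kind #
  | ListOf BuiltinCompatible
  | PairOf BuiltinCompatible BuiltinCompatible

-- Argument types: either a *-kinded type variable or a builtin-compatible type.
-- The open question is where SOPs would be allowed: only here at the top level
-- (like *-kinded variables), or also inside BuiltinCompatible (say, a list of SOPs)?
data ArgType
  = StarVar Int
  | Compat BuiltinCompatible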


effectfully commented 1 week ago

@kwxm

I don't have much to say, I find this (about option1) a bit disturbing:

a much larger change (propagating through many parts of the builtins machinery, including tests)

It's much larger compared to the SOPs approach and mostly in terms of the number of files affected. If you look at the actual diff and type .hs into "Filter changed files", you'll probably agree it's not that big of a deal.

And also this:

maybe for some other reason we'll end up needing to return function applications from builtins eventually anyway?

That sounds a bit like premature future-proofing that we may never need. I'm sure in the past I've been worried about the possibility of builtins returning more general terms for some reason, but I don't recall the details at the moment.

I agree. But I've now created an issue arguing that we need to be able to return applications from builtins to get faster processing of Data objects as well. See also my response to @MicroProofs below in this message.

About the possibility of builtins returning applications: could we return a saturated application wrapped in a delay that you then have to force? I haven't thought it through properly, but such a thing would be a CekValue, so it would fit with the way the evaluator works at present.

It's not easily a CekValue, as you'd need to somehow package the iterated application into that value, and the way of doing that depends on the type of the value (for CkValue, for example, it would be different). So in the end it's just much simpler to return an iterated application and let the evaluator process it the way it normally processes applications. The CEK and CK machines are very similar, so we would probably be able to produce values directly without going through returning an iterated application, but if we add bytecode in the future, who knows how similar it'll be to the machines we have now, so producing a value that the bytecode evaluator expects might be significantly more complicated.

Overall, I just don't think that the builtins machinery should do the evaluator's job when we can avoid that.

I think the delay point is a red herring: why immediately force something delayed when you can simply not delay anything in the first place?

@wadler

This is the one place where I disagree. If we add SOP to improve (time and space) efficiency, and we make a change to builtins to improve efficiency, it is not so odd if these interact.

Fair point. I can't quite determine what it is that I find unsettling about making builtins return SOPs. Maybe I'm just being silly.

@MicroProofs

One thing I would like about the second one is if data_to_constr converted the constr tag to a sum of product as well.

Basically this

Constr i ds -> toConstr 0 [fromValue (toConstr i [fromValue ds])]

I'm also still in favor of supporting SOPs especially to support the quoted text above in the future. An alternative I can think of is integer -> constr i []. But all these cases require SOP typing with builtins. So I'm leaning toward returning SOP with builtins. I know these suggestions are slightly tangential to the specific ones above, but having the typing paves the way forward for more SOP builtins.

I've now given the idea of using SOPs for that purpose serious consideration: please check out this issue for a discussion of the builtin that you propose, and this issue for the more general topic of getting faster processing of Data objects with or without SOPs involved. My conclusion is that SOPs aren't suitable for that purpose and that we should instead have multi-lambdas that must be exactly saturated; see the latter issue for all the gory details. And having iterated applications will help with that, unlike SOPs.

I may be very wrong, so if you think there's a flaw in my reasoning or I simply missed something, please do let me know in one of those issues.

But so far I prefer the approach of returning iterated applications from builtins, and that's what I'll end up implementing unless somebody finds a reason to prefer SOPs or the team votes in favor of doing SOPs even if we don't have an exact reason to.

@michaelpj

But if we allow actual PLC types from the PLC type universe to appear in builtin signatures, that story is completely dead.

The implementation already supports all PLC types (or can trivially be made to), but to the best of my knowledge that is indeed not used in the existing builtins. Note, however, that allowing a builtin to return an iterated application is only useful if we pass functions to builtins, and functions have -> in their types, which is a PLC type, not a built-in one. So either way we're going to extend the set of types allowed in builtins with some subset of PLC types.

@mjaskelioff

We could extend this universe with SOPs, but we would need to define precisely which SOP types we would allow to appear in an argument: are SOPs only top-level (like *-kinded variables) or can they appear anywhere? (say in a list of SOPs).

The implementation doesn't distinguish between # and *, but semantically SOPs can only be of kind *.