Open effectfully opened 2 weeks ago
@effectfully Thanks for taking the time to write this up. We are very appreciative of the time you have personally spent to help bring more utility to SOPs.
It's fair that you mark this issue as low priority, making Data
faster will indeed provide much more utility.
I wouldn't completely write off SOPs yet though, as they do have their use-case: efficient multiple branches.
On X I mentioned two issues that could be solved with a builtin that can retrieve the case index:
ifThenElse
calls when combining multiple SOPsWithout such a builtin function, it will be easier to keep using Data
.
equalsInteger(fstPair(unConstrData(a)), fstPair(unConstrData(b)))
, which is constant time, thus much better than even the linear time Plinth example you gave above.
This is easier to discuss with an example, typically seen in real-life Helios code.
Assume we the following enum:
enum State {
A {<fieldsA>}
B {<fieldsB>}
C {<fieldsC>}
}
We want to compare the state of a given UTxO being spent with the state of the return UTxO: so state0
vs state1
. These states will be stored in the Datums, so are initially in Data
format (so State::A
will be encoded as ConstrData(0, <fieldsA>)
, State::B
as ConstrData(1, <fieldsB>)
, etc.)
Let's assume we have the following valid state evolution:
The equivalent in code:
(state0, state1).switch{
(A, A) => {<branch1>}
(A, C) => {<branch2>}
(B, B) => {<branch3>}
(B, C) => {<branch4>}
(C, C) => error("illegal state change")
(C, _) => {<branch5>}
(_, _) => error("illegal state change")
}
If the State is kept as Data
such a branching expression is easy to generate using multiple nested ifThenElse
calls that compare the ConstrData
tags. This approach is not very efficient, requiring many UPLC terms.
If the State is first converted to SOP, using constr(fstPair(unConstrData(State)), sndPair(unConstrData(State)))
, we would end up with the following UPLC AST:
case(
state0AsSOP,
(fieldsA0) -> {
case(
state1AsSOP,
(fieldsA1) -> {
<branch1>
},
(fieldsB1) -> {
error("illegal state change")
},
(fieldsC1) -> {
<branch2>
}
)
},
(fieldsB0) -> {
case(
state1AsSOP,
(fieldsA1) -> {
error("illegal state change")
},
(fieldsB1) -> {
<branch3>
},
(fieldsC1) -> {
<branch4>
}
)
},
(fieldsC0) -> {
case(
state1AsSOP,
(fieldsA1) -> {
<branch5>
},
(fieldsB1) -> {
<branch5>
},
(fieldsC1) -> {
error("illegal state change")
}
)
}
)
Although this particular example doesn't seem so extreme, you should imagine many more variants, which makes the generated code blow up with many duplicate branches.
In summary: we currently consider SOPs only viable for branching over a single ConstrData
value, which has been converted using constr
immediately before the branching expression.
I'm assuming the last case would always be used as the default. This wouldn't improve anything for this example because there is no situation where 2 or mores final branches are the same.
So for this example implementing a default case isn't worth the effort.
This proposal doesn't seem very composable.
Wouldn't it then make more sense to implement a variable-args version of ifThenElse instead?
The arguments of such a term could also be implicitly delayed:
varArgsIfThenElseTerm(<cond0>, <case0>, <cond1>, <case1>, <cond2>, <case2>, <default>)
I also think this would be greatly beneficial to languages with advanced pattern matching syntax like Aiken.
To be honest I think this is too high-level for UPLC, though there seems to be some support for it: #6578
We would be able to keep the State
in the example above as SOP throughout the code-base, and simply fallback to nested ifThenElse
calls for multi-valued branching situations.
A function for retrieving the constr args would also be needed though.
SOPs can currently provide benefit in only a few situations. Are there other situations that I hadn't thought of?
Sorry for being frank, but maybe SOPs aren't worth the complexity they introduce to the CEK machine, and should be removed for UPLC version 4.
The branching functionality of SOPs could simply be replaced by allowing builtin lists to contain lambdas, and allowing a way to define such a list literally (eg. mkCons(<lambda0>, mkCons(<lambda1>, mkLambdaList(())))
I also agree with @nau that we should use either only Data
or only SOPs. As SOPs were never able to provide a complete alternative to Data
we are now in this awkward situation where we mix both for only minor benefit.
I wouldn't completely write off SOPs yet though, as they do have their use-case: efficient multiple branches.
Yes, but this is exactly what I want to bring to Data
: #6602
This is easier to discuss with an example, typically seen in real-life Helios code.
Thanks a lot for providing the example, I agree with your assessment that my idea as written of using the default case is basically useless. I mixed up pattern matching in normal languages and UPLC and suggested something stupid because of that.
With a builtin that can retrieve the case index
A function for retrieving the constr args would also be needed though.
Well, as I'm arguing in the issue description, such builtins have never made it past peer review. Given the relative unimportance of SOPs, I doubt this time would be any different.
Sorry for being frank
Frankness is always appreciated, we're very much interested in users providing honest feedback as that allows us to improve the product.
maybe SOPs aren't worth the complexity they introduce to the CEK machine
I've been gravitating towards the same conclusion. If we can make Data
as efficient as SOPs and point people towards Scott-encoding as a fallback solution when putting functions into data types is necessary, then SOPs are perhaps not worth having. But that would only remove the Constr
node from the AST, not the Case
one (as the latter is gonna be used for Data
and other builtins), so it's not like we'd save much. It may be just easier to keep SOPs in the AST anyway, since maintaining them doesn't take much effort. Dunno.
Anyway, we are very much focused on making Data
universally better than SOPs, so I'm not just saying these things to avoid improving SOPs.
The branching functionality of SOPs could simply be replaced by allowing builtin lists to contain lambdas
Probably not gonna happen, because it's hard to fit into the current design of UPLC. And if it did, everybody would start complaining that now you can have a list of Integer
s and a list of terms wrapping Integer
s, which are two different AST representations for semantically the same thing. Complain like this:
also agree with @nau that we should use either only
Data
or only SOPs. As SOPs were never able to provide a complete alternative toData
we are now in this awkward situation where we mix both for only minor benefit.
:stuck_out_tongue:
Thank you for this discussion, it really helped me to start disliking SOPs more :slightly_smiling_face:
Making case
directly more useful for Data
is a good spin (#6602), and then keeping the constr
term for backward compatibility isn't too big a price to pay.
Helios folks propose to introduce a builtin allowing one to get
i
fromTerm.Constr i args
.Their reasoning is that sometimes one only needs to check whether the constructor is a specific one, instead of providing a branch for every single constructor. E.g. if you have a big sum type like
then you can define equality checking for such a type in linear amount of matches in, say, Plinth:
but on the UPLC level this will give you a quadratic number of branches, because for each
Ci
to the left of==
you'll need to explicitly handle all ofC1
...Cn
, because acase
has to explicitly handle all possible constructors and there's can't be a catch-all clause.There's a number of issues with the proposal, though. Firstly, what if those
C*
constructors carry some arguments? How will we extract them out of there if we only know the index of the constructor?Secondly, such a builtin would break parametricity, it's the same issues as we relentlessly discussed in the
equalsCanonical
PR and in #5776 + #6225. I.e. we've already given up on important things because of the parametricity issue, so it's unlikely we're going to revert on that just for the purpose of having a builtin returning the index of a SOP.Now given that SOPs are basically on par with builtins and we seem to have a plan on making
Data
as fast as SOPs, do we even need to worry about SOPs at this point? Maybe it'll turn out that just having fastData
is enough? Sure you can't put functions intoData
and such, but does anybody actually care about that? Anyway, I digress.What we could have is
case
with a default clause. That'd solve the equality issue described above. Helios folks claim it's not enough in the general case though:so naturally I wonder what those extreme cases are.
But let's say that instead of introdcing a basic default clause, we introduce a fancy default clause giving you access to the index of the constructor. It's gonna be a bit weird, because the AST doesn't know anything about integers, but let's say we'll pull it off somehow. Would it be good or weird? It's pretty much reflection at this point and UPLC wasn't designed with any reflection capabilities in mind, so maybe that'll screw the metatheory up somehow?
Anyway, I'll mark this as low-priority, because speeding up
Data
is far more important, but now that Helios folks told us about this problem I do find the lack of default clauses unfortunate, so maybe we should do something about it.