Open fitzgen opened 1 year ago
I was looking into souper the other day. I wasn't able to set it up properly, but I noticed we are missing a translation for Bswap in our harvest program. So there might be a few more optimizations possible there.
hello, please see this list of generalized versions of these:
https://gist.github.com/manasij7479/602e770d45169d5ffa73d8cd100d5b05
maybe read them over and if you have any questions, ask me and @manasij7479 to explain.
in Hydra's output, sext(1) is just a bitwidth-independent way to say -1
hope this is useful!
I should add that in the general case, generalizing an optimization is not a well-posed problem. there are often multiple solutions, and it is often the case that it's not totally clear which one to prefer. so while the Hydra output is (hopefully) correct, it probably contains some examples where you might want to generalize things a different way in practice.
also, this notation:
(sext(1) << x0) << C2
=>
(sext(1) << C2) << x0
comes from a pretty-printer and is intended to make things easy to read for humans. the machine-readable form is Hydra's internal IR which is an extended version of Souper IR. what I'm saying is that I'm not sure that writing a parser for the pretty-printed output is the right answer. the right answer is probably to write some C++ that fits into Hydra and directly emits Cranelift pattern-matching / rewriting code. if you want to do this, please talk to us and hopefully also Manasij will help out
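As a quick sanity check on the rewrite quoted above, here is a minimal Python sketch that brute-forces it at several bit widths. It models sext(1) as the all-ones value and shl as a left shift truncated to the width, and it only tries shift amounts below the bit width. The helper names (mask, sext1, shl, rewrite_holds) are made up for this illustration and are not part of Hydra or Souper; out-of-range shift amounts, whose semantics differ between IRs, are not modeled.

```python
def mask(w):
    """All-ones bit pattern of width w."""
    return (1 << w) - 1


def sext1(w):
    """Sign-extend the 1-bit value 1 to w bits: all ones, i.e. -1."""
    return mask(w)


def shl(a, b, w):
    """Left shift truncated to w bits; callers keep b in [0, w)."""
    return (a << b) & mask(w)


def rewrite_holds(w):
    """Check (sext(1) << x0) << C2 == (sext(1) << C2) << x0 at width w."""
    for x0 in range(w):
        for c2 in range(w):
            lhs = shl(shl(sext1(w), x0, w), c2, w)
            rhs = shl(shl(sext1(w), c2, w), x0, w)
            if lhs != rhs:
                return False
    return True


# Widths 1..16 at which the rewrite holds for all in-range shift amounts.
checked_widths = [w for w in range(1, 17) if rewrite_holds(w)]
```

An exhaustive check like this is no substitute for the solver-based verification Hydra and Souper do; it is only a cheap way to build intuition for why the rewrite is independent of bit width.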
Thanks @regehr!! Will take a look at these when I have some free cycles.
Out of curiosity, can you provide some more details on what the reported "profit" scores are?
ah, sorry, that's the difference in cost between the LHS and RHS, where cost is souper's idiosyncratic cost model where most stuff has cost 1, a few things like select have cost 3, and then some stuff like intrinsics have cost 5.
we never really arrived at a good cost model for LLVM IR (I talked to many LLVM people about this many times and never really made good progress) and I don't expect it to be a great cost model for Cranelift either. but perhaps it's close enough.
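To make the "profit" number concrete, here is a rough Python sketch of the cost model as described above: most operations cost 1, select costs 3, and intrinsics cost 5. Several things here are assumptions for illustration only: expressions are nested tuples, leaves (variables and constants) are free, bswap is treated as an intrinsic, and the example rewrite is hypothetical rather than taken from Hydra's output. Souper's real implementation may differ, for example in how shared subexpressions are counted.

```python
DEFAULT_COST = 1
# Assumption: bswap is counted as an intrinsic; select cost is from the thread.
SPECIAL_COSTS = {"select": 3, "bswap": 5}


def cost(expr):
    """Cost of an expression tree given as (op, arg, ...) tuples."""
    if not isinstance(expr, tuple):
        return 0  # assumption: variables and constants are free
    op, *args = expr
    return SPECIAL_COSTS.get(op, DEFAULT_COST) + sum(cost(a) for a in args)


def profit(lhs, rhs):
    """Difference in cost between the left- and right-hand sides."""
    return cost(lhs) - cost(rhs)


# Hypothetical rewrite: select(c, x + y, x + z) => x + select(c, y, z)
lhs = ("select", "c", ("add", "x", "y"), ("add", "x", "z"))
rhs = ("add", "x", ("select", "c", "y", "z"))
example_profit = profit(lhs, rhs)  # (3 + 1 + 1) - (1 + 3) = 1
```

Note that this walks a tree, so a subexpression used twice is charged twice; a DAG-aware cost function would count it once.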
what I'm saying is that I'm not sure that writing a parser for the pretty-printed output is the right answer.
FWIW, if Hydra outputted the same text format as Souper, we would be able to reuse the existing parser I wrote: https://docs.rs/souper-ir/latest/souper_ir/
But yeah, we will definitely reach out again when we look at automating this whole process in the future
It does, and that is what our automation for generating an LLVM pass uses.
It assigns meaning to specific identifier names, and has different semantics for width constraints. For example a %symconst prefix means it is a symbolic constant. It is pretty much Souper IR other than some details like this, so your parser could work pretty well.
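As a rough illustration of the naming convention described above, here is a small Python sketch that separates symbolic constants from ordinary inputs by the %symconst prefix. The sample text is a hypothetical, simplified Souper-style fragment written for this example, not real Hydra output, and the regular expression only handles plain variable declarations of the form shown.

```python
import re

# Hypothetical, simplified Souper-style fragment (not real Hydra output).
SAMPLE = """\
%x:i32 = var
%symconst_1:i32 = var
%0:i32 = shl %x, %symconst_1
"""

# Matches plain declarations such as "%x:i32 = var".
VAR_DECL = re.compile(r"^(%\w+):i(\d+) = var$", re.MULTILINE)


def classify_vars(text):
    """Map each declared variable to (kind, bit width) based on its name."""
    result = {}
    for name, width in VAR_DECL.findall(text):
        kind = "symbolic constant" if name.startswith("%symconst") else "input"
        result[name] = (kind, int(width))
    return result


classified = classify_vars(SAMPLE)
```

A consumer built on an existing Souper parser would presumably do this classification on the parsed variable names rather than with a regular expression; the point is only that the distinction lives in the identifier, not in the syntax.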
this all sounds great.
but note that some things like bitwidth independence are handled outside of Souper IR, and require some care
if there's something we can do on our side, please let us know, but keep in mind that Manasij plans to defend later this spring, so the earlier the better!
Here are some synthesized optimizations for CLIF harvested from spidermonkey.wasm with explicit bounds checks enabled. I won't have time to investigate, generalize, or implement them before I go on vacation, so I want to note them down in an issue for posterity.
Replacing clz(x) == 0 with a comparison:
Similarly, we can do this for clz(x) == 1, which wasn't harvested from the CLIF:
Note that there is no generalization for clz(x) == C available here (unless C is larger than x's bit width, in which case it is always false). It only works for zero and one because we don't need to check for a lower bound on x.
Probably a similar rewrite we could do with clz(x) == OP_BIT_WIDTH.
"Reverse" const propagation of a shl into a select:
Maybe not profitable if %2 is used multiple times, in which case %1's live range might be extended after this optimization. But I guess this is always true with these sorts of peepholes...
Haven't dug into this one yet:
A bunch of masking off bits that will be shifted out anyways:
Replacing an and and an eq with an ult. Haven't thought about these yet.
Unnecessary ors:
More masking off bits that will be shifted away:
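The harvested examples for this family are not reproduced here, so the following Python sketch shows just one shape the pattern can take, as an assumption: (x & m) << k is the same as x << k whenever m keeps every bit that survives the shift. The helper names and the 6-bit width are made up for the illustration, and shift amounts are kept below the bit width.

```python
def mask(w):
    """All-ones bit pattern of width w."""
    return (1 << w) - 1


def and_is_redundant(m, k, w):
    """True if m keeps all of the low w - k bits, the only ones shl by k keeps."""
    surviving = mask(w) >> k
    return (m & surviving) == surviving


def rewrite_valid(m, k, w):
    """Brute-force check that (x & m) << k == x << k for every w-bit x."""
    return all(
        (((x & m) << k) & mask(w)) == ((x << k) & mask(w))
        for x in range(1 << w)
    )


W = 6  # small width so the exhaustive check stays fast
predicate_matches = all(
    and_is_redundant(m, k, W) == rewrite_valid(m, k, W)
    for m in range(1 << W)
    for k in range(W)
)
```

At this width the simple predicate agrees with the brute-force check for every mask and shift amount, which suggests that a generalized rule could use a side condition on the constant mask instead of matching specific constants.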
A ton of "reverse" const prop with comparisons found, here are a few:
And also a bunch of "reverse" const prop with other operators as well (not as many as with comparisons though). Again, here are a few:
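Returning to the clz rewrites at the top of this comment: the concrete comparisons are not reproduced here, so the following Python sketch checks one plausible form of them exhaustively at 8 bits. The signed comparisons used for clz(x) == 0 and clz(x) == 1 are an assumption about what the rewrites look like, and the helper names are made up for this illustration.

```python
W = 8  # bit width used for the exhaustive check


def clz(x, w=W):
    """Count leading zeros of the w-bit unsigned value x."""
    return w - x.bit_length()


def to_signed(x, w=W):
    """Reinterpret the w-bit unsigned value x as two's complement."""
    return x - (1 << w) if x >> (w - 1) else x


def clz_rewrites_hold(w=W):
    """Check the candidate comparisons against clz for every w-bit value."""
    for x in range(1 << w):
        sx = to_signed(x, w)
        ok = (
            (clz(x, w) == 0) == (sx < 0)                    # top bit set
            and (clz(x, w) == 1) == (sx >= 1 << (w - 2))    # top bits are 01
            and (clz(x, w) == w) == (x == 0)                # OP_BIT_WIDTH case
            and clz(x, w) <= w                              # C > width: never
        )
        if not ok:
            return False
    return True


def clz_range(c, w=W):
    """Smallest and largest unsigned x with clz(x) == c."""
    xs = [x for x in range(1 << w) if clz(x, w) == c]
    return (min(xs), max(xs))
```

For example, clz_range(2) spans 32 through 63 at 8 bits, an interval that needs both a lower and an upper bound, which is consistent with the note that a single comparison only works for zero and one.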