Open scalablecory opened 6 years ago
@dotnet/jit-contrib
There are actually two optimizations here.
The first is having the jit recognize that the value of x
completely determines the value of y
. This should remove the need to materialize y
and to test its value. One common technique is to detect when this happens and replicate the blocks testing y
for each known reaching definition and then let constant propagation simplify things. See for instance Avoiding conditional branches by code replication. It is generally useful in straightening out control flow after inlining into predicates. Figuring out when this is profitable can be a little tricky as there is often some unrelated code in the test blocks and one must decide whether the cost of duplicating this is reasonable. Or the value of the input may only be known on some paths and be unknown on others. This is something we possibly should consider adding to the jit (see for instance dotnet/runtime#4207). I have a prototype lying around somewhere.
The second is to realize that some of the subsequent blocks are identical and can be merged. This is usually called "tail merging" or "cross jumping". Tail merging doesn't usually shorten path lengths and so for the most part only offers indirect performance benefits. And it can be costly in terms of compile time as one must carefully manage cases with large fan-out (say a switch where many cases execute the exact same code), decide how much wiggle room to allow in duplicate detection (eg can we reorder otherwise independent instructions ...?), and evaluate the merge candidates in proper order (since one merge can enable another). It doesn't tend to help throughput much either -- though in pathological cases it can work wonders.
Another tail merge issue: https://github.com/dotnet/runtime/issues/8795
@scalablecory realize it's been a while, but do you have example code we can try?
The jit may be getting close to doing what you expect...
I'd like the JIT to detect when one branch leads to another branch, and fold them into a single branch.
I'm sure this optimization has a specific name in compiler theory, but I don't know it. If you'll excuse a dumb example:
Here, a sufficiently advanced compiler should be able to transform this into something like:
Why?
Obviously, the dumb example is not what I'm actually doing. For performance purposes I'm updating a text parser from depending on
char
to be encoding-agnostic. In some places, it needs to allow injecting inlineable sub-parsers that handles encoding-specific things:This sits right in a hotspot in my code: optimizing this bit makes a huge difference. I've found the extra tests needed to check the return value are causing the code to perform at about 1/4th the speed when compared to a non-agnostic version with
IsNewLine
manually inlined. Having the compiler optimize away these tests would make a huge difference to me.category:cq theme:optimization skill-level:expert cost:large