QoR results are mixed, but I believe this is a change in the right direction because it simplifies the MIR. These combines are also needed if we want to post-pipeline benchmarks.
Some notes on the regressions:
Mul2D: goes from II=9 to II=10. We are still missing VST.SRS combining. Once that is in place, this should be a candidate for the newer pipeliner.
Add2D/Mul2D bf16 variants: we now introduce lots of pointer copies instead of re-using the same pointer and enforcing ordering. This should be investigated separately.
ReduceSum/Mean int8 variants: these should be handled in the post-pipeliner to target II=4.
Conv2D_2x8_0: the regression is in the outer loop. We are changing the live ranges of our 2D/3D iterators, which forces a spill.