Open odeke-em opened 8 months ago
Agreed, and it seems like the optimization you propose still works as expected.
The performance gain comes at the cost of (in my subjective opinion) slightly less readable code, so I think we should only implement this optimization if we need to.
I'm also puzzled by the few instances where benchmarking shows this optimization resulting in a positive delta. For example:
ExtensionWithRoots/Leopard_32x32x512_ODS-8 6.10ms ± 4% 7.32ms ±23% +20.00% (p=0.000 n=10+9)
Repair/Leopard_64x64x512_ODS-8 32.0MB ± 5% 33.5MB ± 7% +4.51% (p=0.009 n=10+10)
While studying this code I noticed that `fillerRow` is as long as `ds.width+extendedWidth`, while `fillerExtendedRow` is only as long as `extendedWidth`.
We could take advantage of loop reuse here: instead of building `fillerExtendedRow` and `fillerRow` separately, we can build them in the same loop and add a guard for the shorter length of `extendedWidth`,
which produces interesting results
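
For illustration, the single-loop idea described above could look roughly like the sketch below. This is only a minimal sketch under assumptions: `buildFillerRows` is a hypothetical helper name (not an existing function in the codebase), `width` stands in for `ds.width`, and `fillerChunk` is assumed to be the shared filler share that every cell points to.

```go
package main

// buildFillerRows is a hypothetical helper used only for illustration.
// It fills fillerRow (length width+extendedWidth) and fillerExtendedRow
// (length extendedWidth) in one pass instead of two separate loops.
func buildFillerRows(width, extendedWidth uint, fillerChunk []byte) (fillerRow, fillerExtendedRow [][]byte) {
	newWidth := width + extendedWidth
	fillerRow = make([][]byte, newWidth)
	fillerExtendedRow = make([][]byte, extendedWidth)

	for i := uint(0); i < newWidth; i++ {
		fillerRow[i] = fillerChunk
		// Guard: fillerExtendedRow is the shorter slice, so only
		// write to it while i is still within its bounds.
		if i < extendedWidth {
			fillerExtendedRow[i] = fillerChunk
		}
	}
	return fillerRow, fillerExtendedRow
}
```

Calling `buildFillerRows(4, 4, chunk)` would return a `fillerRow` of length 8 and a `fillerExtendedRow` of length 4, with every entry referencing the same `chunk` slice. The guard adds one comparison per iteration, which is part of the readability and branching trade-off discussed in this thread.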
/cc @elias-orijtech @liamsi @rootulp @musalbas @staheri14