Instead of setting "c = 0; c += ab", use c = ab directly
Try to resolve some of the aliasing problems
Functions that take multiple raw pointers often leave the compiler unable to rule out aliasing. In this case, `*cptr *= beta; *cptr += *ab;` compiled to two reads of the value in `*cptr` when one would be enough. We could explicitly read the value, operate on it, and store it back, but here we instead tried inserting a function that takes the `mask_buf` pointer as a shared reference (and those are marked noalias). This is a workaround that sometimes works.
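A minimal sketch of the two options described above, not the actual code from this change: the function names `scale_add_explicit` and `scale_add_shared`, the `f32` element type, and the flat contiguous layout are all assumptions made for illustration. The first variant reads, operates, and stores back by hand; the second keeps the original two statements but receives the `ab` buffer as a shared slice, which gives the compiler a chance to treat it as not aliasing `c`.

```rust
/// Hypothetical helper: explicit read, operate, store back.
/// `*cptr` is loaded once into a local, so no second read is needed.
///
/// Safety: `c` and `ab` must each be valid for `n` elements.
pub unsafe fn scale_add_explicit(beta: f32, c: *mut f32, ab: *const f32, n: usize) {
    for i in 0..n {
        unsafe {
            let cptr = c.add(i);
            let old = *cptr; // single read of the destination
            *cptr = old * beta + *ab.add(i);
        }
    }
}

/// Hypothetical helper: the source buffer arrives as a shared reference.
/// The caller turns its raw buffer pointer into `&[f32]` at the call site,
/// which asserts that nothing writes to it while this function runs.
///
/// Safety: `c` must be valid for `ab.len()` elements and must not overlap `ab`.
pub unsafe fn scale_add_shared(beta: f32, c: *mut f32, ab: &[f32]) {
    for (i, ab_elt) in ab.iter().enumerate() {
        unsafe {
            let cptr = c.add(i);
            // Same two statements as in the description above.
            *cptr *= beta;
            *cptr += *ab_elt;
        }
    }
}
```

Whether the second form actually removes the extra load depends on the compiler version and on inlining, so it is worth checking the generated assembly or the benchmarks rather than assuming it.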
Fallback kernel benchmark improvement:
AVX kernel improvement with direct set:
(261 was a case that was added, but its use of the masked kernel is too minor to make us notice anything.)