bluss / matrixmultiply

General matrix multiplication of f32 and f64 matrices in Rust. Supports matrices with general strides.
https://docs.rs/matrixmultiply/
Apache License 2.0

Minor changes to kernel masking #42

Closed bluss closed 5 years ago

bluss commented 5 years ago

Functions that work with multiple raw pointers often leave the compiler unable to rule out aliasing. In this case, `*cptr *= beta; *cptr += *ab;` compiled to two reads of the value in `*cptr` when one would be enough. We could explicitly read the value, operate on it, and store it back, but here we instead tried inserting a function that takes the `mask_buf` pointer as a shared reference (and those are marked noalias). This is a workaround that only sometimes works.
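To make the three options concrete, here is a minimal sketch, not the code from this PR. The function names (`accumulate_direct`, `accumulate_explicit`, `accumulate_one`, `accumulate_via_ref`) and the flat `n`-element layout are assumptions made for illustration; the real kernel works with strides. Whether the direct form actually produces a redundant load depends on the surrounding code and the compiler version.

```rust
/// Direct form: both statements go through raw pointers. The compiler has
/// no aliasing information for `c` and `ab`, so it may be conservative
/// about keeping `*cptr` in a register.
unsafe fn accumulate_direct(c: *mut f32, ab: *const f32, beta: f32, n: usize) {
    for i in 0..n {
        unsafe {
            let cptr = c.add(i);
            *cptr *= beta;
            *cptr += *ab.add(i);
        }
    }
}

/// Explicit form: read once, operate, store back once.
unsafe fn accumulate_explicit(c: *mut f32, ab: *const f32, beta: f32, n: usize) {
    for i in 0..n {
        unsafe {
            let cptr = c.add(i);
            let value = *cptr;
            *cptr = value * beta + *ab.add(i);
        }
    }
}

/// Hypothetical helper: the source element arrives as a shared reference,
/// which tells the compiler it is not written to during this call.
#[inline(always)]
unsafe fn accumulate_one(cptr: *mut f32, ab: &f32, beta: f32) {
    unsafe {
        *cptr *= beta;
        *cptr += *ab;
    }
}

/// Reference form: same arithmetic, routed through the helper above.
/// Safety: `c` and `ab` must each be valid for `n` elements and must not
/// overlap, otherwise creating `&*ab.add(i)` is unsound.
unsafe fn accumulate_via_ref(c: *mut f32, ab: *const f32, beta: f32, n: usize) {
    for i in 0..n {
        unsafe {
            accumulate_one(c.add(i), &*ab.add(i), beta);
        }
    }
}
```

All three compute the same result; the difference is only in how much aliasing information reaches the optimizer, which is why the benchmarks are the only way to tell whether the workaround paid off.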

Fallback kernel benchmark improvement:

 name               63 ns/iter  62 ns/iter  diff ns/iter  diff % 
 mat_mul_f32::m032  3,624       3,393               -231  -6.37% 
 mat_mul_f64::m032  6,410       6,110               -300  -4.68%

AVX kernel improvement with direct set:

 name               63 ns/iter  62 ns/iter  diff ns/iter  diff % 
 mat_mul_f32::m012  508         484                  -24  -4.72% 
 mat_mul_f32::m127  94,366      93,774              -592  -0.63% 
 mat_mul_f32::m261  785,958     786,113              155   0.02% 
 mat_mul_f64::m012  509         498                  -11  -2.16% 
 mat_mul_f64::m127  180,432     178,201           -2,231  -1.24%

(m261 is a newly added benchmark case, but it uses the masked kernel too little for any difference to be noticeable.)