Closed bluss closed 6 years ago
Benchmark improvement on small problems. These are "C C" matrix multiplies, where both operands are row-major. (The case where both are column-major behaves exactly the same.)
| name | 63 ns/iter | 62 ns/iter | diff ns/iter | diff % |
|---|---:|---:|---:|---:|
| mat_mul_f32::m032 | 2,303 | 2,082 | -221 | -9.60% |
| mat_mul_f32::m064 | 14,091 | 13,321 | -770 | -5.46% |
| mat_mul_f64::m032 | 4,214 | 4,096 | -118 | -2.80% |
| mat_mul_f64::m064 | 28,055 | 27,246 | -809 | -2.88% |
This is an improvement for packing when we're packing a matrix that's laid out in the direction that we are packing in.
Given a matrix product A B, we prefer column-major for A and row-major for B when packing. This PR is an improvement for all contiguous matrix combinations except B A, where we have a row-major matrix times a column-major matrix.
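To illustrate why the layout matters when packing, here is a minimal sketch (not the code in this PR). The helper `pack_panel`, its parameters, and the use of a `Vec` are all hypothetical. It packs an `mr`-row panel of A column by column. When A is column-major (row stride 1), the `mr` elements of each column are already contiguous and can be copied as a block; otherwise each element needs a strided load.

```rust
/// Hypothetical helper: pack an `mr`-row, `kc`-column panel of `a` into `pack`,
/// one column after another. `rsa` and `csa` are the row and column strides
/// of `a`, measured in elements.
fn pack_panel(pack: &mut Vec<f32>, a: &[f32], mr: usize, kc: usize, rsa: usize, csa: usize) {
    pack.clear();
    if rsa == 1 {
        // Column-major input: the `mr` rows of each column are contiguous,
        // so the whole column segment is copied in one go.
        for j in 0..kc {
            let start = j * csa;
            pack.extend_from_slice(&a[start..start + mr]);
        }
    } else {
        // General strided input: gather one element at a time.
        for j in 0..kc {
            for i in 0..mr {
                pack.push(a[i * rsa + j * csa]);
            }
        }
    }
}
```

Both branches produce the same packed buffer; only the access pattern on the source differs.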
This packing code seemed to be aliasing-challenged, and how it compiles is held back by the optimizer not wanting to trust that the pointers `pack` and `a` never overlap. The second commit, with the switch from pointer increment to counter increment, seems to have solved that, but with no big improvement in performance.
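For readers unfamiliar with the two loop styles, here is a rough sketch of the difference. These are hypothetical functions (`copy_ptr_inc`, `copy_counter`), not the actual packing code, and they only copy a contiguous run. The first advances both raw pointers on every iteration; the second keeps the base pointers fixed and drives the loop with a single counter.

```rust
/// Pointer-increment style: `pack` and `a` are both advanced each iteration.
/// Safety: both pointers must be valid for `n` elements and must not overlap.
unsafe fn copy_ptr_inc(mut pack: *mut f32, mut a: *const f32, n: usize) {
    for _ in 0..n {
        unsafe {
            *pack = *a;
            pack = pack.add(1);
            a = a.add(1);
        }
    }
}

/// Counter-increment style: the base pointers stay fixed and one index `i`
/// addresses both source and destination.
/// Safety: both pointers must be valid for `n` elements and must not overlap.
unsafe fn copy_counter(pack: *mut f32, a: *const f32, n: usize) {
    for i in 0..n {
        unsafe {
            *pack.add(i) = *a.add(i);
        }
    }
}
```

Both functions do the same work; what changes is the form of the loop the optimizer sees.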