bluss / matrixmultiply

General matrix multiplication of f32 and f64 matrices in Rust. Supports matrices with general strides.
https://docs.rs/matrixmultiply/
Apache License 2.0

Speed up packing by using copy_nonoverlapping #26

Closed bluss closed 6 years ago

bluss commented 6 years ago

This speeds up packing when the matrix being packed is laid out in the same direction that we pack in.

Given a matrix product A B, we prefer column-major for A and row-major for B when packing. This PR is an improvement in all contiguous matrix combinations except B A, where we have a row-major matrix times a column-major matrix (neither operand is then laid out in its preferred packing direction).
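
As a rough illustration of the idea (not the code in this PR), here is a minimal sketch of packing a kc x mr micro-panel. The function name pack_panel, its slice-based signature, and the parameter names kc, mr, rsa and csa are assumptions made for the example. When the stride between the mr elements is 1, each group can be copied with std::ptr::copy_nonoverlapping; otherwise the sketch falls back to an element-by-element strided loop.

```rust
use std::ptr;

/// Hypothetical helper: packs a `kc x mr` micro-panel of `a` into a new buffer.
/// `rsa` is the stride between the `mr` elements of one group,
/// `csa` is the stride between consecutive groups.
pub fn pack_panel(kc: usize, mr: usize, a: &[f32], rsa: usize, csa: usize) -> Vec<f32> {
    let mut pack = vec![0.0f32; kc * mr];
    if kc == 0 || mr == 0 {
        return pack;
    }
    // Make sure the largest index we will read is inside the source slice.
    let last = (kc - 1) * csa + (mr - 1) * rsa;
    assert!(last < a.len(), "strides reach outside the source slice");

    for j in 0..kc {
        if rsa == 1 {
            // The `mr` elements are contiguous: copy them in one go.
            let src = a[j * csa..].as_ptr();
            let dst = pack[j * mr..].as_mut_ptr();
            // SAFETY: `src` has at least `mr` readable elements (checked by the
            // assert above), `dst` has `mr` writable elements, and `pack` is a
            // fresh allocation, so the two regions cannot overlap.
            unsafe { ptr::copy_nonoverlapping(src, dst, mr) };
        } else {
            // General strides: copy one element at a time.
            for i in 0..mr {
                pack[j * mr + i] = a[j * csa + i * rsa];
            }
        }
    }
    pack
}
```

Both layouts of the same logical matrix produce the same packed buffer; only the contiguous one takes the copy_nonoverlapping path.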

This packing code seemed to be aliasing-challenged: how it compiles is held back by the optimizer not trusting that the pointers pack and a never overlap. The second commit, which switches from pointer increment to counter increment, seems to have solved that, but with no big improvement in performance.
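
To make the two loop shapes concrete, here is a simplified sketch, again not the PR's actual code. The function names pack_by_pointer and pack_by_counter and the single stride parameter are assumptions for illustration. Both functions copy the same elements; they differ only in whether the pointers themselves are advanced or a counter indexes from fixed base pointers.

```rust
/// Pointer-increment style: both pointers are bumped on every iteration.
///
/// # Safety
/// `a` must be valid for reads of `n * stride` elements, `pack` must be valid
/// for writes of `n` elements, and the two regions must not overlap.
pub unsafe fn pack_by_pointer(n: usize, pack: *mut f32, a: *const f32, stride: usize) {
    let (mut dst, mut src) = (pack, a);
    for _ in 0..n {
        unsafe {
            *dst = *src;
            dst = dst.add(1);
            src = src.add(stride);
        }
    }
}

/// Counter-increment style: the base pointers stay fixed and a loop counter
/// computes each offset.
///
/// # Safety
/// Same requirements as `pack_by_pointer`.
pub unsafe fn pack_by_counter(n: usize, pack: *mut f32, a: *const f32, stride: usize) {
    for i in 0..n {
        unsafe {
            *pack.add(i) = *a.add(i * stride);
        }
    }
}
```

Whether the counter form actually compiles to better code depends on the compiler version and the surrounding code, so it is worth checking the generated assembly or a benchmark rather than assuming it.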

bluss commented 6 years ago

Benchmark improvement on small problems. These are "C C" matrix multiplies, where both sides are row-major. (Both sides being column-major is exactly the same.)

 name               63 ns/iter  62 ns/iter  diff ns/iter  diff % 
 mat_mul_f32::m032  2,303       2,082               -221  -9.60% 
 mat_mul_f32::m064  14,091      13,321              -770  -5.46% 
 mat_mul_f64::m032  4,214       4,096               -118  -2.80% 
 mat_mul_f64::m064  28,055      27,246              -809  -2.88%