Speed up packing by using copy_nonoverlapping

bluss / matrixmultiply

General matrix multiplication of f32 and f64 matrices in Rust. Supports matrices with general strides.

Apache License 2.0

This improves packing when the matrix being packed is laid out in memory in the same direction that we pack in.

Given a matrix product A B, we prefer column-major for A and row-major for B when packing. This PR is an improvement for all contiguous matrix combinations except the one where A is row-major and B is column-major, because there neither operand is laid out in the direction we pack in.
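To illustrate the idea, here is a minimal sketch of a packing routine that uses copy_nonoverlapping when the source is contiguous in the packing direction and falls back to a strided element loop otherwise. The name pack_columns, its parameters and the non-negative usize strides are assumptions made for this example; this is not the crate's actual packing function.

```rust
use std::ptr;

/// Pack a `rows` x `cols` matrix into `pack` in dense column-major order.
/// Element (i, j) of the source lives at `a[i * rsa + j * csa]`.
/// `pack_columns` and its parameters are hypothetical names for this sketch.
fn pack_columns(rows: usize, cols: usize, a: &[f32], rsa: usize, csa: usize, pack: &mut [f32]) {
    if rows == 0 || cols == 0 {
        return;
    }
    // Bounds checks up front so the raw-pointer accesses below stay in range.
    assert!(pack.len() >= rows * cols);
    assert!(a.len() > (rows - 1) * rsa + (cols - 1) * csa);

    let src = a.as_ptr();
    let dst = pack.as_mut_ptr();
    for j in 0..cols {
        unsafe {
            let col_src = src.add(j * csa);
            let col_dst = dst.add(j * rows);
            if rsa == 1 {
                // The column is contiguous in the source: one bulk copy.
                // `a` and `pack` cannot overlap because `pack` is a unique `&mut` borrow.
                ptr::copy_nonoverlapping(col_src, col_dst, rows);
            } else {
                // Strided source: copy element by element.
                for i in 0..rows {
                    *col_dst.add(i) = *col_src.add(i * rsa);
                }
            }
        }
    }
}
```

With a column-major source (rsa == 1) every column takes the bulk-copy path; with a row-major source the same call goes through the strided loop and produces the same packed buffer.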

This packing code seemed to be aliasing-challenged: how it compiles is held back by the optimizer not trusting that the pointers pack and a never overlap. The second commit, which switches from pointer increment to counter increment, seems to have solved that, but without a big improvement in performance.
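For readers unfamiliar with the two loop styles, the sketch below contrasts them on a simple strided copy. The function names and signatures are hypothetical and the code is not taken from the PR. Both functions produce the same result; whether one compiles to better machine code than the other depends on the compiler version and the surrounding code.

```rust
/// Pointer-increment style: both pointers are bumped on every iteration.
/// (Hypothetical helper for illustration.)
///
/// # Safety
/// `src` must be readable at element offsets `0, stride, ..., (n - 1) * stride`,
/// `dst` must be writable for `n` elements, and the two regions must not overlap.
unsafe fn copy_ptr_increment(n: usize, mut dst: *mut f32, mut src: *const f32, stride: isize) {
    for _ in 0..n {
        unsafe {
            *dst = *src;
        }
        // wrapping_* is used so that the final bump past the data is not
        // itself undefined behaviour.
        dst = dst.wrapping_add(1);
        src = src.wrapping_offset(stride);
    }
}

/// Counter-increment style: the base pointers stay fixed and a loop
/// counter computes each address. (Hypothetical helper for illustration.)
///
/// # Safety
/// Same requirements as `copy_ptr_increment`.
unsafe fn copy_counter_increment(n: usize, dst: *mut f32, src: *const f32, stride: isize) {
    for i in 0..n {
        unsafe {
            *dst.add(i) = *src.offset(i as isize * stride);
        }
    }
}
```

In the counter version every address is derived from an unchanging base plus an index, which is the form the second commit moved to.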

name               63 ns/iter  62 ns/iter  diff ns/iter  diff %
mat_mul_f32::m032  2,303       2,082               -221  -9.60%
mat_mul_f32::m064  14,091      13,321              -770  -5.46%
mat_mul_f64::m032  4,214       4,096               -118  -2.80%
mat_mul_f64::m064  28,055      27,246              -809  -2.88%

Speed up packing by using copy_nonoverlapping #26