Since Neon intrinsics require nightly features, the vectorized functions are only enabled on nightly Rust.
The speedup depends somewhat on the model. Recent Rust compilers already auto-vectorize the `scaled_add` function. However, the same is not true for `dot`, probably because vectorizing it counts as unsafe math: it changes the floating-point summation order.
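To illustrate the summation-order point, here is a minimal sketch in stable Rust. It is not the Neon code from this change, and the names `dot_scalar` and `dot_lanes` are hypothetical. The first function adds every product into a single accumulator, a strict left-to-right order that the compiler generally may not reorder for floats. The second keeps four independent partial sums, which is the reordering a 4-lane vectorized version effectively performs.

```rust
/// Plain dot product: one accumulator, strict left-to-right summation.
fn dot_scalar(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Dot product with four independent accumulators (hypothetical helper).
/// The partial sums are combined only at the end, so the summation order
/// differs from `dot_scalar` and results can differ in the last bits.
fn dot_lanes(a: &[f32], b: &[f32]) -> f32 {
    // Only the common prefix is used, matching the behavior of `zip`.
    let n = a.len().min(b.len());
    let (a, b) = (&a[..n], &b[..n]);

    let mut acc = [0.0f32; 4];
    let mut a_chunks = a.chunks_exact(4);
    let mut b_chunks = b.chunks_exact(4);
    for (ca, cb) in a_chunks.by_ref().zip(b_chunks.by_ref()) {
        for i in 0..4 {
            acc[i] += ca[i] * cb[i];
        }
    }

    // Horizontal reduction of the four lanes.
    let mut sum = (acc[0] + acc[1]) + (acc[2] + acc[3]);

    // Fewer than four leftover elements are handled one at a time.
    for (x, y) in a_chunks.remainder().iter().zip(b_chunks.remainder()) {
        sum += x * y;
    }
    sum
}
```

Both functions return the same value whenever every intermediate result is exactly representable, for example with small integer-valued inputs. With general inputs they can round differently, which is presumably why the compiler does not make this transformation on its own.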
Dot product before this change:
After this change: