Using a reference type (such as a slice) for either pack or a in the packing
function makes rustc emit a noalias annotation for that pointer, which helps
the optimizer in some cases.
What we want is for the compiler to see that the pointers pack and a, and any
pointers derived from them, can never alias; that gives it more freedom to
rewrite the operations in the packing loops. The pack buffer is contiguous,
while a is accessed through arbitrary row and column strides, so pack is the
only candidate for being passed as a slice.
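A minimal sketch of the idea, assuming a hypothetical `pack_strided` function
with a simplified column-major layout (it does not reproduce the crate's
actual packing kernels): the contiguous destination is a `&mut [f32]`, so
rustc can mark it noalias, while the strided source stays a raw pointer.

```rust
/// Hypothetical, simplified packing routine (not the crate's real one).
///
/// `pack` is a `&mut [f32]`, so rustc emits a noalias annotation for it:
/// the optimizer may assume no other pointer (including `a`) accesses
/// this memory while the borrow is live. `a` is a strided view that
/// cannot be expressed as a slice, so it remains a raw pointer.
///
/// # Safety
/// `a` must be valid for reads at every offset `i * rsa + j * csa`
/// for `i < rows` and `j < cols`, and must not point into `pack`.
pub unsafe fn pack_strided(
    pack: &mut [f32],
    a: *const f32,
    rows: usize,
    cols: usize,
    rsa: isize,
    csa: isize,
) {
    assert!(pack.len() >= rows * cols);
    // Copy the strided block column by column into the contiguous buffer.
    let mut k = 0;
    for j in 0..cols {
        for i in 0..rows {
            let offset = i as isize * rsa + j as isize * csa;
            pack[k] = unsafe { *a.offset(offset) };
            k += 1;
        }
    }
}
```

For example, packing a 3x2 row-major matrix (rsa = 2, csa = 1) with this
sketch yields its elements in column-major order.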
This slightly speeds up the layout benchmarks for sgemm (but not dgemm) on
M1. No effect was observed on x86-64.
A way to get the same effect without a slice, something like C's 'restrict'
keyword, would be useful for this crate.