Sometimes we compare the span length with 2 * vector_length (i.e. Vector<double>.Count << 1), so that the vector loop isn't entered unless it will run for at least two iterations.
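(NB: much the same guard can be written without the shift, as (s.Length > Vector<double>.Count), although that only guarantees one full vector plus a remainder rather than two full vectors.)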
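For illustration, here is a minimal sketch of the guard described above. The `GuardSketch` class and its `Sum`, `ShiftGuard` and `PlainGuard` members are hypothetical names made up for this example, not code from the library. The exact comparison used in the codebase may differ, so this is just one plausible shape.

```csharp
using System;
using System.Numerics;

public static class GuardSketch
{
    // Hypothetical: the "2 * vector_length" guard, written with a shift.
    public static bool ShiftGuard(int length) => length >= (Vector<double>.Count << 1);

    // Hypothetical: the shift-free alternative from the NB above.
    public static bool PlainGuard(int length) => length > Vector<double>.Count;

    // Hypothetical helper: sums a span, entering the vector loop only
    // when at least two full vectors fit.
    public static double Sum(ReadOnlySpan<double> s)
    {
        double total = 0;
        int i = 0;

        if (Vector.IsHardwareAccelerated && ShiftGuard(s.Length))
        {
            Vector<double> acc = Vector<double>.Zero;
            int lastBlock = s.Length - Vector<double>.Count;

            // Each iteration consumes one full vector of elements.
            for (; i <= lastBlock; i += Vector<double>.Count)
            {
                acc += new Vector<double>(s.Slice(i));
            }

            // Horizontal add of the accumulator lanes.
            for (int j = 0; j < Vector<double>.Count; j++)
            {
                total += acc[j];
            }
        }

        // Scalar tail (or the whole span if the guard failed).
        for (; i < s.Length; i++)
        {
            total += s[i];
        }

        return total;
    }
}
```

Note that the two guards disagree for lengths strictly between Vector<double>.Count and 2 * Vector<double>.Count: `PlainGuard` lets those through for a single vector iteration, while `ShiftGuard` sends them to the scalar loop.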
Questions:
Does this pattern actually perform better? It introduces an extra shift op in all scenarios (minor point?).
I seem to recall there are places where it makes sense to require a minimum of two loop iterations. What are those places, and are they properly commented? Let's also document the places that use '<< 1' where it's done purely as a micro-optimization.
Separately we also do this:
int width = Vector<double>.Count;
And from then on we use the local 'width' variable. At the time of writing, changing the code to use Vector<double>.Count directly produces substantially different x86 codegen (according to the SharpLab website), perhaps to avoid switching between vector and scalar CPU instructions(?). It might be worth fine-tuning the performance on .NET 7 (i.e. against up-to-date codegen, since codegen is changing all the time).
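As a rough sketch of the 'width' pattern, the following hypothetical `WidthSketch.Scale` helper (an invented name, not code from the library) reads Vector<double>.Count once and reuses the local everywhere. It makes no claim about which form generates better code; that would need checking against the JIT output for the target runtime.

```csharp
using System;
using System.Numerics;

public static class WidthSketch
{
    // Hypothetical helper: multiplies every element by a scalar, in place.
    public static void Scale(Span<double> s, double factor)
    {
        // Read the vector width once and reuse the local from here on.
        int width = Vector<double>.Count;
        int i = 0;

        if (Vector.IsHardwareAccelerated && s.Length >= (width << 1))
        {
            var vfactor = new Vector<double>(factor);
            int lastBlock = s.Length - width;

            // Each iteration loads, scales and stores one full vector.
            for (; i <= lastBlock; i += width)
            {
                var v = new Vector<double>(s.Slice(i));
                (v * vfactor).CopyTo(s.Slice(i));
            }
        }

        // Scalar tail (or the whole span if the guard failed).
        for (; i < s.Length; i++)
        {
            s[i] *= factor;
        }
    }
}
```

To compare the two forms, replace each use of `width` with Vector<double>.Count and diff the generated assembly.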