Open bernhardmgruber opened 3 years ago
It may also be required to add additional padding to the layout if the types in the record dimension are not homogeneous.
In the meantime, we have the `maxLanes` utility, which computes the maximum number of lanes for a given record dimension and a given number of bits.
The non-type template parameter `Lanes` of the `AoSoA` mapping specifies how many values of each attribute, taken from multiple datums, are blocked together to form little vectors of `Lanes` lanes. Although this works in simple cases, it might be the wrong approach.

CPUs typically have a fixed vector register width expressed in bits. Depending on the loaded data types, the number of lanes actually used differs. E.g. AVX2 has 256-bit registers and will pack 8 floats, but only 4 doubles. What should the `Lanes` parameter be? Should we ideally pack 8 floats and 8 doubles, 4 floats and 4 doubles, or 8 floats and 4 doubles?

We should think about this and consider a redesign.
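To make the three options concrete, here is a minimal sketch of the lane arithmetic for a record with one `float` and one `double` field. The helpers `lanesFor`, `commonLanes` and `widestLanes` are hypothetical names made up for this illustration; they are not part of LLAMA and do not claim to match how `maxLanes` is implemented. The sketch also assumes a 4-byte `float` and an 8-byte `double`.

```cpp
#include <algorithm>
#include <climits>
#include <cstddef>

// Hypothetical helpers for illustration only; not part of LLAMA's API.

// Number of elements of type T that fit into one vector register of RegisterBits bits.
template <typename T, std::size_t RegisterBits>
constexpr std::size_t lanesFor = RegisterBits / (sizeof(T) * CHAR_BIT);

// Lane count at which a block of every field type still fits into one register.
// It is limited by the largest field type.
template <std::size_t RegisterBits, typename... Ts>
constexpr std::size_t commonLanes = std::min({lanesFor<Ts, RegisterBits>...});

// Lane count that fills a whole register for the smallest field type.
// Blocks of larger field types then span more than one register.
template <std::size_t RegisterBits, typename... Ts>
constexpr std::size_t widestLanes = std::max({lanesFor<Ts, RegisterBits>...});

// Assumption of this sketch: 4-byte float, 8-byte double.
static_assert(sizeof(float) == 4 && sizeof(double) == 8, "sketch assumes 4-byte float and 8-byte double");

// AVX2-sized registers (256 bits), as in the example above.
static_assert(lanesFor<float, 256> == 8, "8 floats per 256-bit register");
static_assert(lanesFor<double, 256> == 4, "4 doubles per 256-bit register");

// "4 floats and 4 doubles": one register load per field, float registers half empty.
static_assert(commonLanes<256, float, double> == 4, "limited by double");

// "8 floats and 8 doubles": full float registers, two register loads per double block.
static_assert(widestLanes<256, float, double> == 8, "driven by float");
```

The third option, 8 floats and 4 doubles, corresponds to using `lanesFor` separately for each field. A single `Lanes` value cannot express that, which is the core of the question raised here.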