alpaka-group / llama

A Low-Level Abstraction of Memory Access
https://llama-doc.rtfd.io/
Mozilla Public License 2.0

Rethink AoSoA Lanes parameter #180

Open bernhardmgruber opened 3 years ago

bernhardmgruber commented 3 years ago

The non-type template parameter Lanes of the AoSoA mapping specifies how many values of each attribute, taken from consecutive datums, are blocked together to form small vectors of Lanes lanes. Although this works in simple cases, it might be the wrong approach.

CPUs typically have a fixed vector register width, expressed in bits. Depending on the loaded data type, the number of lanes actually used differs. E.g., AVX2 has 256-bit registers and packs 8 floats, but only 4 doubles. What should the Lanes parameter be? Should we ideally pack 8 floats and 8 doubles, 4 floats and 4 doubles, or 8 floats and 4 doubles?
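To make the arithmetic concrete, here is a minimal sketch of how the lane count follows from the register width and the element type. `lanesFor` is a hypothetical helper written for this illustration and is not part of LLAMA; the asserted values assume 8-bit bytes, a 4-byte float, and an 8-byte double.

```cpp
#include <climits>
#include <cstddef>

// Hypothetical helper (not a LLAMA API): number of elements of type T
// that fit into a vector register of the given width in bits.
template <typename T>
constexpr std::size_t lanesFor(std::size_t registerBits)
{
    return registerBits / (sizeof(T) * CHAR_BIT);
}

// Assumes 4-byte float and 8-byte double, as on common x86-64 platforms.
static_assert(lanesFor<float>(256) == 8, "a 256-bit register holds 8 floats");
static_assert(lanesFor<double>(256) == 4, "a 256-bit register holds 4 doubles");
```

A single Lanes value therefore cannot match the natural vector length of both types at once, which is the tension described above.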

We should think about this and consider a redesign.

bernhardmgruber commented 3 years ago

It may also be necessary to add padding to the layout if the types in the record dimension are not homogeneous.
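One way to read this, sketched under assumptions: when field types differ in size, the sub-array of the second field inside an AoSoA block may start at an offset that violates its alignment unless padding is inserted. The helpers `alignUp` and `secondFieldOffset` below are hypothetical and only illustrate the arithmetic. They do not describe how LLAMA computes its layout.

```cpp
#include <cstddef>

// Hypothetical helper: round offset up to the next multiple of alignment.
constexpr std::size_t alignUp(std::size_t offset, std::size_t alignment)
{
    return (offset + alignment - 1) / alignment * alignment;
}

// Hypothetical helper: byte offset of the second field's sub-array inside
// one AoSoA block, given the lane count, the element size of the first
// field, and the required alignment of the second field.
constexpr std::size_t secondFieldOffset(std::size_t lanes, std::size_t firstSize, std::size_t secondAlign)
{
    return alignUp(lanes * firstSize, secondAlign);
}

// 4 lanes of a 1-byte field end at byte 4; a field aligned to 8 bytes
// must start at byte 8, so 4 bytes of padding are needed.
static_assert(secondFieldOffset(4, 1, 8) == 8, "padding required");
// 8 lanes of a 4-byte field end at byte 32, already a multiple of 8,
// so no padding is needed.
static_assert(secondFieldOffset(8, 4, 8) == 32, "no padding required");
```

Aligning each sub-array to the full vector register width instead of the element alignment would use the same rounding with a larger `secondAlign`.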

bernhardmgruber commented 3 years ago

In the meantime, we have the maxLanes utility, which computes the maximum number of lanes for a given record dimension and a given number of bits.
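For readers unfamiliar with the idea, a rough stand-alone sketch of such a computation follows. It is not LLAMA's implementation: `maxLanesSketch` is a hypothetical name, it takes the field types directly instead of a record dimension, and it assumes the result is bounded by the largest field type so that every field's lanes fit into one register.

```cpp
#include <algorithm>
#include <climits>
#include <cstddef>

// Hypothetical stand-in (not LLAMA's maxLanes): the largest lane count
// such that Lanes elements of every field type fit into RegisterBits.
// This is the minimum over all fields of RegisterBits / bits-per-element.
// Requires at least one field type.
template <std::size_t RegisterBits, typename... Fields>
constexpr std::size_t maxLanesSketch = std::min({RegisterBits / (sizeof(Fields) * CHAR_BIT)...});

// Assumes 4-byte float and 8-byte double.
static_assert(maxLanesSketch<256, float> == 8, "floats only: 8 lanes");
static_assert(maxLanesSketch<256, float, double> == 4, "double limits the count to 4");
```

With this choice, a record mixing floats and doubles on AVX2 gets 4 lanes, so float accesses fill only half a register.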