alpaka-group / llama

A Low-Level Abstraction of Memory Access
https://llama-doc.rtfd.io/
Mozilla Public License 2.0

Improve SoA with static array extents #653

bernhardmgruber closed this 1 year ago

bernhardmgruber commented 1 year ago

This PR improves the SoA mapping by precomputing a compile-time-only table of subarray offsets when the array extents are fully static.
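To illustrate the idea, here is a minimal sketch of such an offset table. It is not LLAMA's actual implementation: `roundUp`, `makeSubarrayOffsets`, `StaticSoA` and `Particles` are hypothetical names made up for this example, and it assumes a single-blob SoA where each field's subarray is aligned to the field's own alignment. Because the element count `N` is a template parameter, every subarray start is a constant known to the compiler.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <tuple>

// Hypothetical helper: round `offset` up to the next multiple of `alignment`.
constexpr std::size_t roundUp(std::size_t offset, std::size_t alignment)
{
    return (offset + alignment - 1) / alignment * alignment;
}

// Builds the table of subarray start offsets (in bytes) for N elements per field.
// The table has one extra trailing entry holding the total blob size.
template<std::size_t N, typename... Fields>
constexpr auto makeSubarrayOffsets()
{
    constexpr std::size_t count = sizeof...(Fields);
    const std::array<std::size_t, count> sizes{sizeof(Fields)...};
    const std::array<std::size_t, count> aligns{alignof(Fields)...};
    std::array<std::size_t, count + 1> table{};
    std::size_t offset = 0;
    for(std::size_t i = 0; i < count; i++)
    {
        offset = roundUp(offset, aligns[i]); // align the start of subarray i
        table[i] = offset;
        offset += sizes[i] * N; // skip over the N elements of field i
    }
    table[count] = offset;
    return table;
}

// Hypothetical stand-in for a single-blob SoA mapping with a fully static extent N.
template<std::size_t N, typename... Fields>
struct StaticSoA
{
    // Evaluated entirely at compile time; nothing is computed or stored at runtime.
    static constexpr auto offsets = makeSubarrayOffsets<N, Fields...>();
    static constexpr std::size_t blobSize = offsets[sizeof...(Fields)];

    // Byte offset of element i of field number Field inside the blob.
    template<std::size_t Field>
    static constexpr std::size_t byteOffset(std::size_t i)
    {
        using T = std::tuple_element_t<Field, std::tuple<Fields...>>;
        return offsets[Field] + i * sizeof(T);
    }
};

// Example: 10 records with a 1-byte, an 8-byte and a 4-byte field.
using Particles = StaticSoA<10, std::uint8_t, std::uint64_t, std::uint32_t>;
static_assert(Particles::offsets[0] == 0);
static_assert(Particles::blobSize >= 10 * (1 + 8 + 4));
```

With a runtime extent, the same offsets would have to be recomputed (or loaded) on each access, since they depend on the element count. With a static extent, `byteOffset` reduces to a constant plus `i * sizeof(T)`.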

This PR also contains an extension of the vectoradd benchmarks:

codecov[bot] commented 1 year ago

Codecov Report

Merging #653 (1ee97e4) into develop (4271b5a) will increase coverage by 0.00%. The diff coverage is 100.00%.

Additional details and impacted files

```diff
@@           Coverage Diff            @@
##           develop     #653   +/-   ##
========================================
  Coverage    98.75%   98.75%
========================================
  Files           74       74
  Lines         6818     6828   +10
========================================
+ Hits          6733     6743   +10
  Misses          85       85
```

bernhardmgruber commented 1 year ago

Runtime difference is marginal:

| layout | runtime |
|---|---|
| "SoA SB aligned" | 0.113894 |
| "SoA SB aligned CT size" | 0.113254 |

Compile time: [image]

Runtime: [image]

bernhardmgruber commented 1 year ago

| layout | runtime |
|---|---|
| "SoA SB aligned" | 0.160085 |
| "SoA SB aligned CT size" | 0.160387 |

Here, the CT variant is actually slightly slower, but the generated assembly is better (fewer instructions).