KyleVaughn / UM2

An unstructured mesh library for automated method of characteristics mesh generation
https://univeristy-of-michigan-unstructured-mesh-code.readthedocs.io/en/main/index.html#
MIT License
7 stars, 2 forks

Implement/check proper treatment of over-aligned Vec and Vector data #154

Closed KyleVaughn closed 4 months ago

KyleVaughn commented 5 months ago

The `Vec<D, T>` class normally stores its data in a plain array `T[D]`, which has only the alignment of `T`. However, when `UM2_ENABLE_SIMD_VEC` is on, `D` is a power of 2, and `T` is an arithmetic type, GCC vector extensions are used as the underlying storage instead. This enables very nice SIMD optimizations on `Vec`, but it also raises the alignment from `alignof(T)` to `D * sizeof(T)`. See https://godbolt.org/z/or73xrxbh.
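
As a minimal sketch of the alignment difference, the snippet below compares a plain array with a GCC/Clang vector-extension type for `D = 2`, `T = double`. The names `ArrayStorage`, `V2d`, and `add` are illustrative only and are not taken from UM2; the `static_assert`s assume a GCC-compatible compiler.

```cpp
#include <cstddef>

// Plain array storage: the alignment is just that of the element type.
typedef double ArrayStorage[2];

// GCC/Clang vector extension: 2 doubles packed into one 16-byte vector.
// (Illustrative type, not UM2's actual VecData.)
typedef double V2d __attribute__((vector_size(2 * sizeof(double))));

static_assert(alignof(ArrayStorage) == alignof(double),
              "array storage keeps the element alignment");
static_assert(sizeof(V2d) == 2 * sizeof(double),
              "vector type has no padding");
static_assert(alignof(V2d) == 2 * sizeof(double),
              "vector type is aligned to D * sizeof(T)");

// Element-wise addition is a single expression on the vector type,
// which the compiler can lower to one SIMD instruction.
inline V2d add(V2d a, V2d b)
{
  return a + b;
}
```

Calling `add` on two `V2d` values initialized as `{1.0, 2.0}` and `{3.0, 4.0}` yields a vector whose elements are `4.0` and `6.0`.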

However, `Vector<T>` allocates the memory for its elements using overload (1) of `operator new` (https://en.cppreference.com/w/cpp/memory/new/operator_new). Because we do not explicitly request an alignment, it is unclear whether this memory will be appropriately aligned. Therefore, when using over-aligned types or GCC vector extensions, we want to verify that the memory, accesses to the memory, and related pointers are appropriately aligned.
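
For context, overload (1) only guarantees alignment suitable for types with fundamental alignment (at least `__STDCPP_DEFAULT_NEW_ALIGNMENT__` in C++17), whereas a direct call to the `std::align_val_t` overload requests the alignment explicitly. The following is a hedged sketch of that approach, assuming C++17; `Wide`, `allocateAligned`, and `deallocateAligned` are hypothetical names and not the actual `Vector` implementation.

```cpp
#include <cstddef>
#include <new>

// Hypothetical over-aligned element type standing in for a SIMD-backed Vec.
struct alignas(32) Wide {
  double v[4];
};

// Allocate raw, uninitialized storage for n objects of type T,
// passing alignof(T) explicitly to the aligned overload of operator new.
template <class T>
T * allocateAligned(std::size_t n)
{
  void * p = ::operator new(n * sizeof(T), std::align_val_t{alignof(T)});
  return static_cast<T *>(p);
}

// Storage obtained from the aligned overload must be released with the
// matching aligned overload of operator delete.
template <class T>
void deallocateAligned(T * p) noexcept
{
  ::operator delete(p, std::align_val_t{alignof(T)});
}
```

A pointer returned by `allocateAligned<Wide>(8)` is expected to be a multiple of `alignof(Wide)`; objects still have to be constructed in that storage (for example with placement `new`) before use.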

Accessing improperly aligned memory is undefined behavior; in practice it can produce incorrect reads and, likely, segfaults.

Tasks related to this issue are:

```cpp
template <Int D, class T>
static consteval auto
vecAlignment() noexcept -> Int
{
  if constexpr (isPowerOf2(D) && std::is_arithmetic_v<T>) {
    return D * sizeof(T);
  } else {
    return alignof(T[D]);
  }
}

template <Int D, class T>
class Vec
{
  using Data = typename VecData<D, T>::Data;
  alignas(vecAlignment<D, T>()) Data _data;
  // ...
};
```



- [x] Investigate the usage of `new` and `delete` in `Vector` and ensure that all pointers refer to properly aligned memory for over-aligned types. It should be sufficient to check `reinterpret_cast<std::uintptr_t>(pointer) % alignof(T) == 0`.
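
The alignment check in the task above could be wrapped in a small helper, sketched below. Note that it tests the address the pointer holds, not the address of the pointer variable itself. `isAligned` and `isAlignedFor` are hypothetical names, not functions from UM2.

```cpp
#include <cstddef>
#include <cstdint>

// True if the address held by p is a multiple of the given alignment.
inline bool isAligned(void const * p, std::size_t alignment) noexcept
{
  return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Convenience wrapper: true if p is suitably aligned for an object of type T.
template <class T>
bool isAlignedFor(void const * p) noexcept
{
  return isAligned(p, alignof(T));
}
```

Such a helper could be used in debug assertions after each allocation and pointer adjustment in `Vector`, for example `assert(isAlignedFor<T>(ptr))`.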

A potential add-on task:
- [x] When `T` is not an arithmetic type, but the underlying representation still maps to a SIMD vector, investigate using that SIMD vector as the storage. Example: `Vec<2, Vec<4, double>>` can be stored as `__m512d`. When `UM2_ENABLE_SIMD_VEC` is off and the storage is aligned, Clang 18 is able to perform optimizations like this, but GCC 14 is not. Testing addition of two `Vec<2, Vec<4, double>>` shows a single 512-bit add for aligned array storage in Clang 18, but two 256-bit adds when using GCC vector extensions.

KyleVaughn commented 4 months ago

Implemented in the "format" branch, which will be merged into main in the next few days.