The `Vec<D, T>` class typically stores its data in a plain array `T[D]` with no extra alignment. However, when `UM2_ENABLE_SIMD_VEC` is on, if `D` is a power of 2 and `T` is an arithmetic type, then GCC vector extensions are used as the underlying storage instead. This lets the compiler emit SIMD instructions for operations on `Vec`. It also increases the alignment of `Vec` from `sizeof(T)` to `D * sizeof(T)`. See https://godbolt.org/z/or73xrxbh.
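
To make the alignment difference concrete, here is a minimal sketch comparing array storage with a GCC vector extension for 4 floats. The names `ArrayVec4f`, `Float4`, `SimdVec4f`, and `add` are hypothetical and not part of UM2, and the asserted alignment assumes a typical GCC or Clang target such as x86-64.

```cpp
#include <cstddef>

// Plain array storage: the struct only has the alignment of float.
struct ArrayVec4f {
  float data[4];
};

// GCC/Clang vector extension: one 16-byte vector holding 4 floats.
// (Hypothetical typedef, for illustration only.)
typedef float Float4 __attribute__((vector_size(4 * sizeof(float))));

struct SimdVec4f {
  Float4 data;
};

// Same size, different alignment (assumed for typical GCC/Clang targets).
static_assert(sizeof(ArrayVec4f) == sizeof(SimdVec4f));
static_assert(alignof(ArrayVec4f) == alignof(float));
static_assert(alignof(SimdVec4f) == 4 * sizeof(float));

// Element-wise addition is built into vector-extension types.
inline Float4
add(Float4 a, Float4 b)
{
  return a + b;
}
```

Because `SimdVec4f` is over-aligned relative to `float`, any memory that holds it must start on a 16-byte boundary.
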
However, in `Vector<T>`, we allocate memory to store `T` using overload (1) of `operator new` (https://en.cppreference.com/w/cpp/memory/new/operator_new). It is unclear whether this memory will be appropriately aligned, since we do not explicitly request an alignment. Therefore, when using over-aligned types or GCC vector extensions, we want to verify that the memory, accesses to the memory, and related pointers are appropriately aligned.
Misaligned access is undefined behavior, and in practice can produce incorrect reads and, likely, segfaults.
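
For reference, `operator new(std::size_t)` only guarantees alignment up to `__STDCPP_DEFAULT_NEW_ALIGNMENT__`, so a direct call does not account for over-aligned `T`. One possible fix, sketched below, is to switch to the `std::align_val_t` overloads (C++17) when `alignof(T)` exceeds that default. `allocateAligned`, `deallocateAligned`, and `Wide` are hypothetical names, not existing UM2 functions, and this is not necessarily how `Vector` should end up doing it.

```cpp
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical over-aligned type used only for illustration.
struct alignas(64) Wide {
  double v[8];
};

// Allocate raw storage for n objects of type T, requesting the alignment
// explicitly when T is over-aligned.
template <class T>
auto
allocateAligned(std::size_t n) -> T *
{
  if constexpr (alignof(T) > __STDCPP_DEFAULT_NEW_ALIGNMENT__) {
    return static_cast<T *>(
        ::operator new(n * sizeof(T), std::align_val_t{alignof(T)}));
  } else {
    return static_cast<T *>(::operator new(n * sizeof(T)));
  }
}

// Release storage obtained from allocateAligned<T>. The deallocation
// overload must match the allocation overload.
template <class T>
void
deallocateAligned(T * p) noexcept
{
  if constexpr (alignof(T) > __STDCPP_DEFAULT_NEW_ALIGNMENT__) {
    ::operator delete(p, std::align_val_t{alignof(T)});
  } else {
    ::operator delete(p);
  }
}
```

Note that these helpers only manage raw storage; constructing and destroying the `T` objects is still the caller's job.
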
Tasks related to this issue are:
- [x] When `UM2_ENABLE_SIMD_VEC` is off, ensure that `T[D]` is still aligned for types which map to SIMD vectors. Use something like
  ```cpp
  // Needs <type_traits> for std::is_arithmetic_v.
  template <Int D, class T>
  static consteval auto
  vecAlignment() noexcept -> Int
  {
    // Power-of-2 lengths of arithmetic types map to SIMD vectors, so
    // align the storage to the full vector width.
    if constexpr (isPowerOf2(D) && std::is_arithmetic_v<T>) {
      return D * sizeof(T);
    } else {
      return alignof(T[D]);
    }
  }

  template <Int D, class T>
  class Vec
  {
    using Data = typename VecData<D, T>::Data;
    alignas(vecAlignment<D, T>()) Data _data;
    ...
  };
  ```
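- [x] Investigate usage of `new` and `delete` in `Vector` and ensure that all pointers use properly aligned memory for over-aligned types. It should be sufficient to check `reinterpret_cast<std::uintptr_t>(pointer) % alignof(T) == 0`. Note that the check must be on the pointer's value; `addressof(pointer)` would give the address of the pointer variable itself.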
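
A minimal sketch of such a pointer check, suitable for use in assertions or tests. `isAligned` and `Packed4d` are hypothetical names introduced here for illustration.

```cpp
#include <cstdint>

// Hypothetical over-aligned type used only for illustration.
struct alignas(32) Packed4d {
  double v[4];
};

// Returns true if the address held by p is a multiple of alignof(T).
// This inspects the pointer's value, not the address of the pointer variable.
template <class T>
auto
isAligned(T const * p) noexcept -> bool
{
  return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}
```

In `Vector`, a check like this could be placed after each allocation and reallocation in debug builds.
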
A potential add-on task:
- [x] When `T` is not an arithmetic type, but the underlying representation still maps to a SIMD vector, investigate usage of that SIMD vector as the storage. Example: `Vec<2, Vec<4, double>>` can be stored in a single 512-bit vector (`__m512d`). When `UM2_ENABLE_SIMD_VEC` is off and the storage is aligned, Clang 18 is able to perform optimizations like this, but GCC 14 is not. Testing addition of two `Vec<2, Vec<4, double>>` shows a single 512-bit add for aligned array storage in Clang 18, but two 256-bit adds when using GCC vector extensions.
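
The aligned-array variant of this experiment can be sketched roughly as follows. `ArrVec` is a hypothetical stand-in for `Vec` with `UM2_ENABLE_SIMD_VEC` off, and it assumes `D * sizeof(T)` is a power of 2, since `alignas` rejects other values.

```cpp
#include <cstddef>

// Hypothetical stand-in for Vec with array storage aligned to its full width.
// Only valid when D * sizeof(T) is a power of 2.
template <int D, class T>
struct ArrVec {
  alignas(D * sizeof(T)) T data[D];
};

// Element-wise addition. For nested ArrVec this recurses into the inner type,
// leaving it to the compiler to fuse the loops into wider SIMD adds.
template <int D, class T>
auto
operator+(ArrVec<D, T> const & a, ArrVec<D, T> const & b) noexcept -> ArrVec<D, T>
{
  ArrVec<D, T> r{};
  for (int i = 0; i < D; ++i) {
    r.data[i] = a.data[i] + b.data[i];
  }
  return r;
}

// 2 x 4 doubles: 64 bytes of storage, aligned to 64 bytes.
using Nested = ArrVec<2, ArrVec<4, double>>;
static_assert(sizeof(Nested) == 64);
static_assert(alignof(Nested) == 64);
```

Whether the addition compiles to one 512-bit add or two 256-bit adds depends on the compiler and on flags such as `-O3 -mavx512f`, which are assumptions here, so the generated assembly should be checked rather than taken from this sketch.
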