Maratyszcza / psimd

Portable 128-bit SIMD intrinsics
MIT License

Support portable vector alignment #4

Open rsaxvc opened 4 years ago

rsaxvc commented 4 years ago

Different vector units handle alignment in interesting ways. ARM/NEON either fixes up unaligned accesses at runtime or traps on them, depending on the instruction's alignment specifier. PowerPC/AltiVec, however, silently loads/stores...somewhere slightly different, which causes all manner of problems if an unaligned address makes it to the AltiVec load/store unit.
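To make "somewhere slightly different" concrete: the AltiVec `lvx`/`stvx` instructions ignore the low four bits of the effective address, so the access lands on the enclosing 16-byte boundary. The following is a minimal sketch that only models that address arithmetic in portable C; the function names are hypothetical and are not part of PSIMD.

```c
#include <stdint.h>

/* Hypothetical helper: models the address an AltiVec lvx/stvx actually
 * touches. The low four address bits are dropped, so the result is the
 * start of the 16-byte block that contains `address`. */
static uintptr_t altivec_effective_address(uintptr_t address) {
    return address & ~(uintptr_t)15;
}

/* Hypothetical helper: how many bytes the access is displaced from the
 * address the caller asked for (0 when the address is 16-byte aligned). */
static unsigned altivec_displacement(uintptr_t address) {
    return (unsigned)(address & 15);
}
```

For example, a request for address `0x1004` would be served from `0x1000`, four bytes before the intended location, with no trap or other indication.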

Currently, GCC 9 on PowerPC ignores the AltiVec execution unit entirely when using PSIMD. This appears to be because aligned(N) has N set to at most 4 bytes, but the AltiVec unit requires 16-byte alignment before it can safely load/store vectors. Increasing N to 16 results in AltiVec instructions being used; however, much code using PSIMD isn't written to align its memory. For example, with that change NNPACK fails the convolution tests.
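As a rough illustration of the `aligned(N)` effect, here is a sketch using the GCC/Clang vector extensions. The type and function names are hypothetical stand-ins, not PSIMD's actual definitions; the point is only that the alignment in the typedef is what the compiler is allowed to assume for every pointer to that type.

```c
#include <stddef.h>

/* Hypothetical types, assuming GCC/Clang vector extensions.
 * Same 16-byte vector, but with different promised alignment. */
typedef float vec4f_a4  __attribute__((vector_size(16), aligned(4)));
typedef float vec4f_a16 __attribute__((vector_size(16), aligned(16)));

/* Report the alignment the compiler assumes for each type. */
static size_t vec4f_a4_alignment(void)  { return __alignof__(vec4f_a4); }
static size_t vec4f_a16_alignment(void) { return __alignof__(vec4f_a16); }

/* Pointers here are only promised to be 4-byte aligned, so the compiler
 * must pick code that tolerates unaligned addresses. */
static void add4_a4(vec4f_a4* dst, const vec4f_a4* a, const vec4f_a4* b) {
    *dst = *a + *b;
}

/* Pointers here are promised to be 16-byte aligned, so the compiler is
 * free to use aligned vector loads/stores. Passing a misaligned pointer
 * is undefined behavior. */
static void add4_a16(vec4f_a16* dst, const vec4f_a16* a, const vec4f_a16* b) {
    *dst = *a + *b;
}
```

Comparing the assembly of the two `add4_*` functions (for example with `-O2 -S` and the target's vector extension enabled) is one way to check which variant a given compiler maps onto vector instructions.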

I think this is because the psimd_(load|store) family of functions isn't alignment-aware and supports load/store to native C types, which may not be aligned to the requirements of a vector unit.
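One way to picture an alignment-aware load/store family is a pair of entry points, one that makes no alignment assumption and one that states it explicitly. This is only a sketch assuming GCC/Clang (`vector_size`, `__builtin_assume_aligned`); none of these names exist in PSIMD.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 4 x float vector type, assuming GCC/Clang vector extensions. */
typedef float vec4f __attribute__((vector_size(16)));

/* True when the pointer satisfies a 16-byte vector unit's requirement. */
static int is_aligned16(const void* address) {
    return ((uintptr_t)address & 15) == 0;
}

/* Unaligned load: memcpy makes no alignment assumption, so this is safe
 * for plain C arrays at any offset. */
static vec4f load4f_unaligned(const void* address) {
    vec4f v;
    memcpy(&v, address, sizeof v);
    return v;
}

/* Aligned load: the caller promises 16-byte alignment, which lets the
 * compiler choose an aligned vector load. Breaking the promise is
 * undefined behavior. */
static vec4f load4f_aligned(const void* address) {
    vec4f v;
    memcpy(&v, __builtin_assume_aligned(address, 16), sizeof v);
    return v;
}

/* Store counterpart of load4f_unaligned. */
static void store4f_unaligned(void* address, vec4f v) {
    memcpy(address, &v, sizeof v);
}
```

Code that cannot guarantee alignment would call the unaligned variants, and code that allocates its buffers with 16-byte alignment could opt into the aligned one, with `is_aligned16` available as a debug check.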

I'm not sure what the right solution would be; perhaps some combination of: