Different vector units handle alignment in different ways. ARM/NEON either fixes up unaligned accesses at runtime or traps on them, depending on the instruction's alignment specifier. PowerPC/AltiVec, however, silently loads/stores...somewhere slightly different (the vector load/store instructions ignore the low bits of the address), which causes all manner of problems if an unaligned address makes it to the AltiVec load/store unit.
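To make the "somewhere slightly different" concrete, here is a small portable model of the address handling, assuming the classic AltiVec `lvx`/`stvx` behaviour of ignoring the low four bits of the effective address. It is not real AltiVec code, and `altivec_effective_address` is a name I made up for illustration.

```c
#include <stdint.h>

/* Portable model (an assumption for illustration, not AltiVec code):
 * lvx/stvx drop the low four bits of the effective address, so the
 * access lands on the enclosing 16-byte boundary instead of trapping. */
static uintptr_t altivec_effective_address(uintptr_t address) {
    return address & ~(uintptr_t)15;
}

/* Number of bytes by which the access is silently shifted backwards. */
static uintptr_t altivec_misplacement(uintptr_t address) {
    return address - altivec_effective_address(address);
}
```

Under this model a request for address 0x1004 is served from 0x1000, i.e. the vector is read or written 4 bytes earlier than the caller intended, with no fault to signal it.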
Currently, GCC 9 on PowerPC ignores the AltiVec execution unit entirely when compiling PSIMD code. This appears to be because aligned(N) has N set to at most 4 bytes, whereas the AltiVec unit requires 16-byte alignment before it can safely load/store vectors. Increasing N to 16 results in AltiVec instructions being used; however, much code using PSIMD isn't written to align its memory. For example, NNPACK fails the convolution tests.
I think this is because the psimd(load|store) family of functions isn't alignment-aware and supports loading from and storing to native C types, which may not be aligned to the requirements of a vector unit.
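As a rough illustration of why native C types are a problem here: a `float` pointer is only guaranteed 4-byte alignment, so most 4-float windows into a buffer do not start on a 16-byte boundary. The helpers below are hypothetical (not part of PSIMD) and assume a 4-byte `float`.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 when ptr is a multiple of alignment (a power of two). */
static int is_aligned(const void *ptr, size_t alignment) {
    return ((uintptr_t)ptr & (alignment - 1)) == 0;
}

/* Hypothetical demo helper: counts how many 4-float windows in a
 * buffer start on a 16-byte boundary, i.e. how many starting offsets
 * a 16-byte-aligned vector load could safely use. */
static size_t count_aligned_windows(const float *data, size_t count) {
    size_t aligned = 0;
    for (size_t i = 0; i + 4 <= count; i++) {
        if (is_aligned(data + i, 16)) {
            aligned++;
        }
    }
    return aligned;
}
```

Even for a buffer that itself starts on a 16-byte boundary, only every fourth starting element qualifies, which is the kind of access pattern a convolution kernel sliding over its input would hit.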
I'm not sure what the right solution would be; perhaps some combination of:
- adding a platform-recommended alignment that API callers can use when aligning their buffers
- adding some aligned load/store functions
- fixing up the current load/store functions to handle unaligned accesses in software on platforms that require it
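A minimal sketch of how those three pieces might fit together, using GCC/Clang vector extensions. Everything here is an assumption for discussion rather than existing PSIMD API: `VEC_RECOMMENDED_ALIGNMENT`, `vec4f`, and the `vec4f_*` functions are hypothetical names, and the `memcpy`-based fix-up is just one possible software fallback.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical platform-recommended alignment for caller buffers. */
#define VEC_RECOMMENDED_ALIGNMENT 16

/* GCC/Clang vector extension type with its natural 16-byte alignment. */
typedef float vec4f __attribute__((vector_size(16)));

/* Aligned load: the caller promises a 16-byte-aligned address, so the
 * compiler is free to emit an aligned vector load. */
static vec4f vec4f_load_aligned(const float *address) {
    assert(((uintptr_t)address % VEC_RECOMMENDED_ALIGNMENT) == 0);
    vec4f v;
    memcpy(&v, __builtin_assume_aligned(address, VEC_RECOMMENDED_ALIGNMENT),
           sizeof v);
    return v;
}

/* Unaligned load: memcpy makes no alignment promise, so the compiler
 * has to pick an instruction sequence that is safe for any address. */
static vec4f vec4f_load_unaligned(const float *address) {
    vec4f v;
    memcpy(&v, address, sizeof v);
    return v;
}

/* Unaligned store, the counterpart of the load above. */
static void vec4f_store_unaligned(float *address, vec4f v) {
    memcpy(address, &v, sizeof v);
}
```

A caller would then declare buffers as `_Alignas(VEC_RECOMMENDED_ALIGNMENT) float buffer[N];` to use the aligned variant, and fall back to the unaligned variants for pointers into arbitrary `float` arrays. Whether the unaligned path is fast enough on AltiVec would need measuring.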