Mesa (as of 22.3.2) uses vanilla calloc/realloc without explicit alignment support.
Clang aggressively optimizes fixed-size memcpy into a series of max-aligned (:128) vectorized loads/stores.
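Boom: a :128 access raises an alignment fault when the address is not 16-byte aligned, and plain calloc/realloc need not return 16-byte-aligned memory on 32-bit ARM.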
Fixed by adding the -fmax-type-align=8 option, which caps the alignment Clang assumes for pointers that lack an explicit alignment.
Pre-fix: vld1.64 {d18, d19}, [r0:128]! (and similar stores) emitted.
Post-fix: vld1.64 {d18, d19}, [r0]! and similar stores.
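A minimal sketch of the mismatch described above, not code taken from Mesa: the vec4 type and the helper names are hypothetical, and vector_size is a GCC/Clang extension. It shows a pointer type that promises 16-byte alignment, an allocator that only promises the platform default, and a check for whether a given pointer actually meets the stricter requirement.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical 16-byte vector type (GCC/Clang extension). Clang gives it a
 * natural alignment of 16, with no explicit alignment attribute involved. */
typedef float vec4 __attribute__((vector_size(16)));

/* Returns 1 if p is a multiple of align; align must be a power of two. */
static int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Fixed-size copy through typed pointers. If the compiler believes both
 * pointers are 16-byte aligned, it may lower this memcpy to NEON accesses
 * carrying a :128 alignment hint. */
static void copy_vec4(vec4 *dst, const vec4 *src)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, sizeof *dst);
}

/* Plain calloc, as in the note: the result is only guaranteed to be aligned
 * for the platform's fundamental types, which can mean 8 bytes on 32-bit ARM. */
static vec4 *alloc_vec4_array(size_t n)
{
    return calloc(n, sizeof(vec4));
}
```

Calling is_aligned(ptr, 16) on the result of alloc_vec4_array before copying is one way to confirm whether a crash comes from this mismatch. If it returns 0 on the target, the :128 form of the instruction is unsafe for that buffer.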
Further reading: https://discourse.llvm.org/t/over-aligned-vst1-64-and-vld1-64-for-arm-linux-androideabi/47596/3