Closed by divinity76 3 months ago
I see a ~1% improvement on the Graviton2 CPU on my AWS instance too. Thanks!
Released as part of v1.5.1.
I wonder if this might have made big endian work too 🤔 (doesn't really matter, nothing runs big endian)
vld1q_u8 and vst1q_u8 have no alignment requirements.
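To illustrate the point, here is a minimal sketch (not the code from this PR) that loads and stores 16 bytes at deliberately odd addresses. `copy16_unaligned` is a hypothetical helper name, and the `memcpy` branch is only there so the snippet also builds on non-ARM machines.

```c
#include <stdint.h>
#include <string.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper, not taken from the PR: copy 16 bytes between
 * arbitrarily aligned addresses. */
static void copy16_unaligned(uint8_t *dst, const uint8_t *src)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* vld1q_u8 / vst1q_u8 only need byte (element) alignment,
     * so src and dst may point anywhere inside a buffer. */
    uint8x16_t v = vld1q_u8(src);
    vst1q_u8(dst, v);
#else
    /* Portable fallback so the sketch compiles off ARM. */
    memcpy(dst, src, 16);
#endif
}
```

Calling it as `copy16_unaligned(out + 3, in + 1)` should copy bytes `in[1..16]` to `out[3..18]` without any alignment fix-up beforehand.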
This improves performance on Oracle Cloud's VM.Standard.A1.Flex by about 1.15% on a 16*1024 input, from approximately 13920 nanoseconds down to 13800 nanoseconds.