This PR adds basic support for a 128-bit SVE build. It is implemented as an enhanced version of the NEON SIMD library rather than as a dedicated pure SVE implementation.
Note that the NEON code was added as part of the 256-bit SVE port, so this change is purely build infrastructure to expose the new build type with the correct compiler settings.
On Neoverse V2, performance improves by 5% for 4x4 blocks, rising to 10% for 12x12 blocks. One third of this gain comes from compiler-injected code and two thirds from using native gathers.