This contains build fixes and assembly (NEON) optimizations for aarch64, tested on Linux (Raspberry Pi 3) and macOS (Apple Silicon M1 Max).
As with the x86 code, it uses the assembly code taken from x264, here the NEON variants. Additionally, the SSE2 intrinsics code in mvtools is converted to NEON using sse2neon. This may not be the best-performing solution, but it still gives a total speed-up of 2x to 4x in real-world scenarios.
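sse2neon is a header that reimplements SSE intrinsics with NEON instructions, so existing SSE2 code can often be compiled for aarch64 by swapping the include. The sketch below is a hypothetical illustration of that pattern, not code taken from mvtools: the helper name `sad16` and the platform check are assumptions, and it presumes `sse2neon.h` is on the include path on ARM builds.

```c
#include <stdint.h>

#if defined(__aarch64__) || defined(_M_ARM64)
#include "sse2neon.h"   /* SSE intrinsics implemented on top of NEON */
#else
#include <emmintrin.h>  /* native SSE2 on x86 */
#endif

/* Hypothetical helper: sum of absolute differences over 16 bytes,
 * the kind of kernel motion search relies on. The same source builds
 * for x86 (SSE2) and aarch64 (via sse2neon). */
static inline unsigned sad16(const uint8_t *a, const uint8_t *b)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    /* _mm_sad_epu8 produces two partial sums, one in each 64-bit half. */
    __m128i s = _mm_sad_epu8(va, vb);
    /* Add the low half and the high half (shifted down by 8 bytes). */
    return (unsigned)(_mm_cvtsi128_si32(s) +
                      _mm_cvtsi128_si32(_mm_srli_si128(s, 8)));
}
```

Because sse2neon emulates each SSE intrinsic individually, some of them expand to several NEON instructions, which is why this approach may be slower than hand-written NEON.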