Apply loop tiling to the generic path. This improves performance on ISAs with enough registers to take advantage of it, and it also helps the compiler optimize this path.
For SPEC CPU work, measured on an Ampere Altra with gcc-13 -O3, llama-cpp showed +20% and whisper-cpp showed +4%.
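For readers unfamiliar with the technique, here is a minimal sketch of loop tiling in C. It is not the code from this change: the function names, the `TILE_M`/`TILE_N` sizes, and the row-major layout are all assumptions chosen for illustration. The idea is that a small block of outputs is accumulated in a local array that the compiler can keep in registers, so each loaded input value is reused several times.

```c
#include <stddef.h>

/* Hypothetical tile sizes, for illustration only. A real kernel would
 * pick them to match the number of registers on the target ISA. */
#define TILE_M 4
#define TILE_N 4

/* Reference version: c[i][j] = sum over k of a[i][k] * b[j][k].
 * All matrices are row-major; a is M x K, b is N x K, c is M x N. */
void matmul_naive(const float *a, const float *b, float *c,
                  size_t M, size_t N, size_t K) {
    for (size_t i = 0; i < M; i++) {
        for (size_t j = 0; j < N; j++) {
            float sum = 0.0f;
            for (size_t k = 0; k < K; k++)
                sum += a[i * K + k] * b[j * K + k];
            c[i * N + j] = sum;
        }
    }
}

/* Tiled version: computes a TILE_M x TILE_N block of outputs at once.
 * The accumulators live in a small local array, which gives the
 * compiler a chance to keep them in registers across the k loop. */
void matmul_tiled(const float *a, const float *b, float *c,
                  size_t M, size_t N, size_t K) {
    for (size_t i0 = 0; i0 < M; i0 += TILE_M) {
        /* Clamp the tile height at the bottom edge of the matrix. */
        size_t im = (M - i0 < TILE_M) ? (M - i0) : TILE_M;
        for (size_t j0 = 0; j0 < N; j0 += TILE_N) {
            /* Clamp the tile width at the right edge of the matrix. */
            size_t jn = (N - j0 < TILE_N) ? (N - j0) : TILE_N;
            float acc[TILE_M][TILE_N] = {{0.0f}};

            for (size_t k = 0; k < K; k++)
                for (size_t i = 0; i < im; i++)
                    for (size_t j = 0; j < jn; j++)
                        acc[i][j] += a[(i0 + i) * K + k] * b[(j0 + j) * K + k];

            /* Write the finished tile back to the output matrix. */
            for (size_t i = 0; i < im; i++)
                for (size_t j = 0; j < jn; j++)
                    c[(i0 + i) * N + (j0 + j)] = acc[i][j];
        }
    }
}
```

Both functions perform the same multiply-adds; only the loop order and the accumulator storage differ. Whether the tiled form is faster depends on the target and the compiler, which is consistent with the note above that the benefit applies to ISAs with enough registers.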