ggerganov / ggml

Tensor library for machine learning
MIT License
11.26k stars, 1.05k forks

Loop tiling optimizations for scalar path #898

Closed heshpdx closed 4 months ago

heshpdx commented 4 months ago

Apply loop tiling to the generic (scalar) path. This improves performance on ISAs that have enough registers to take advantage of it, and it also helps the compiler optimize this path.
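For readers unfamiliar with the technique, here is a minimal, generic sketch of loop tiling on a scalar matrix multiply. It is not the code from this PR: the function names, the `TILE` size of 4, and the row-major layout with `B` stored transposed are all assumptions chosen for illustration. The idea is that each tile's partial sums live in a small local array that the compiler can keep in registers when the ISA has enough of them.

```c
#include <stddef.h>

/* Hypothetical tile size; a real choice depends on the register count of the target ISA. */
#define TILE 4

/* Reference version: c[i*n + j] = sum over p of a[i*k + p] * b[j*k + p].
 * a is m x k, b is n x k (i.e. B is stored transposed), c is m x n, all row-major. */
void matmul_naive(const float *a, const float *b, float *c,
                  size_t m, size_t n, size_t k) {
    for (size_t i = 0; i < m; i++) {
        for (size_t j = 0; j < n; j++) {
            float sum = 0.0f;
            for (size_t p = 0; p < k; p++) {
                sum += a[i * k + p] * b[j * k + p];
            }
            c[i * n + j] = sum;
        }
    }
}

/* Tiled version: the output is processed in TILE x TILE blocks. Each block's
 * accumulators sit in a small local array, so every loaded element of a and b
 * is reused up to TILE times before moving on. */
void matmul_tiled(const float *a, const float *b, float *c,
                  size_t m, size_t n, size_t k) {
    for (size_t i0 = 0; i0 < m; i0 += TILE) {
        size_t i_end = (i0 + TILE < m) ? i0 + TILE : m;  /* handle ragged edge */
        for (size_t j0 = 0; j0 < n; j0 += TILE) {
            size_t j_end = (j0 + TILE < n) ? j0 + TILE : n;
            float acc[TILE][TILE] = {{0.0f}};

            for (size_t p = 0; p < k; p++) {
                for (size_t i = i0; i < i_end; i++) {
                    for (size_t j = j0; j < j_end; j++) {
                        acc[i - i0][j - j0] += a[i * k + p] * b[j * k + p];
                    }
                }
            }

            /* Write the finished tile back to the output matrix. */
            for (size_t i = i0; i < i_end; i++) {
                for (size_t j = j0; j < j_end; j++) {
                    c[i * n + j] = acc[i - i0][j - j0];
                }
            }
        }
    }
}
```

Both functions accumulate each output element over `p` in the same order, so they should produce the same values; only the traversal order of the output changes. Whether the tiled form is actually faster depends on the compiler, the tile size, and the target ISA.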

For SPEC CPU work, measured on an Ampere Altra with gcc-13 -O3, llama-cpp showed a +20% gain and whisper-cpp a +4% gain.