Open imatrisciano opened 3 weeks ago
Just because I was curious about the compiler flag changes, I asked GPT-4o for more details, which you can find at https://chatgpt.com/share/67343ef3-8ae4-8003-9c41-82ffa7cf7f5a.
Thanks for working on LM Playground! ❤️
This PR introduces a couple of simple compiler flags that can greatly improve inference speed.
As described in section 5.1.2 of the paper by Jie Xiao, Qianyi Huang, Xu Chen and Chen Tian, Large Language Model Performance Benchmarking on Mobile Platforms: A Thorough Evaluation (arXiv:2410.03613), the flag `i8mm` has been added to the architecture description for arm64-v8a processors. This flag enables the generation of machine instructions for int8 matrix multiplication (the Armv8.6-A I8MM extension); a sketch of how this could look is shown below.
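A minimal sketch of what such a change might look like, assuming the architecture flags are set in the native library's CMakeLists.txt; the exact file, variable names and base `-march` value used by LM Playground may differ:

```cmake
# Hypothetical fragment, not the exact lines of this PR.
# Extend the architecture description for arm64-v8a builds with "+i8mm" so the
# compiler may emit the Armv8.6-A int8 matrix-multiply (SMMLA/UMMLA) instructions.
if(ANDROID_ABI STREQUAL "arm64-v8a")
    # Base architecture level is an assumption; only the "+i8mm" suffix is the point here.
    add_compile_options(-march=armv8.2-a+i8mm)
endif()
```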
The flag `-Ofast` has been specified in CMakeLists.txt to enable aggressive compiler optimisations for every architecture. Because `-Ofast` implies `-ffinite-math-only`, the flag `-fno-finite-math-only` also has to be specified, so that the compiler does not optimise away code under the assumption that floating-point math can never produce infinities or NaNs.
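A sketch of how the optimisation flags could be added, again with an assumed structure; the actual CMakeLists.txt may scope the flags differently (for example, only to release builds or to specific targets):

```cmake
# Hypothetical fragment, not the exact lines of this PR.
# -Ofast enables aggressive optimisations, including the -ffast-math group.
# -fno-finite-math-only re-allows NaN and +/-Inf handling, which the inference
# code depends on, while keeping the rest of the fast-math optimisations.
add_compile_options(-Ofast -fno-finite-math-only)
```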
With those changes, I was able to observe great performance improvements on my device (Motorola Edge 20) when using Llama3.2-1B-Q4K_M: