https://github.com/DakeQQ/Native-LLM-for-Android
Hello, I'd like to recommend an Android LLM deployment project based on ONNX Runtime. It reaches 5.2 tokens/s on a Huawei P40 and 8.5 tokens/s on a Snapdragon 8 Gen 2 (q8f32 quantization with a 786 sliding-window context). Once ONNX Runtime adds q4f16 support, speed could potentially improve by another 50%.
Feature request / 功能建议