OpenBMB / MiniCPM

MiniCPM3-4B: An edge-side LLM that surpasses GPT-3.5-Turbo.
Apache License 2.0
6.95k stars 440 forks

[Feature Request]: Android deployment #114

Closed DakeQQ closed 2 months ago

DakeQQ commented 5 months ago

Feature request / 功能建议

https://github.com/DakeQQ/Native-LLM-for-Android
Hello, I'd like to recommend an Android LLM deployment project based on ONNXRuntime. It achieves 5.2 tokens/s on a Huawei P40 and 8.5 tokens/s on 8Gen2 (q8f32, with a 786-token sliding window context). Additionally, once ONNXRuntime adds q4f16 support in a future update, speed could potentially improve by another 50%.
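The sliding-window context mentioned above caps how many past tokens the model attends to: once the window is full, the oldest tokens are evicted as new ones arrive. As a minimal illustration of that eviction policy (this is a pure-Python sketch, not code from the linked repo, and the window size here is arbitrary):

```python
from collections import deque

# A fixed-capacity token window: appending past the cap
# silently drops the oldest entries.
def make_window(max_tokens: int) -> deque:
    return deque(maxlen=max_tokens)

window = make_window(4)  # the project uses a 786-token window
for tok in [101, 102, 103, 104, 105, 106]:
    window.append(tok)

# Only the most recent 4 tokens remain in context.
print(list(window))  # → [103, 104, 105, 106]
```

In a real deployment the same idea is applied to the KV cache: entries for evicted positions are trimmed so memory and per-step attention cost stay bounded regardless of conversation length.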

jctime commented 5 months ago

Cool, we'll take a look. We can add it to the community solutions later. Thanks for the contribution!