https://github.com/DakeQQ/Native-LLM-for-Android
Hello, I'd like to recommend an Android LLM deployment project based on ONNX Runtime. It reaches 5.2 tokens/s on a Huawei P40 and 8.5 tokens/s on a Snapdragon 8 Gen 2 (q8f32 quantization with a 786 sliding-window context). Once ONNX Runtime adds q4f16 support, speed could potentially improve by another 50%.
Feature request / 功能建议