Tencent / ncnn

ncnn is a high-performance neural network inference framework optimized for the mobile platform

Roadmap: Qualcomm GPU optimizations #3210

Open ohlr opened 2 years ago

ohlr commented 2 years ago

Detail

We are very interested in improving GPU performance on Qualcomm GPUs under Android. Is there a roadmap, or any work in progress or open PR, that we could help with?

Currently the docs say of Qualcomm GPUs: "known work, but speed may not be fast enough".

Related articles:
https://github.com/Tencent/ncnn/wiki/FAQ-ncnn-vulkan#my-model-runs-slower-on-gpu-than-cpu
https://github.com/Tencent/ncnn#supported-platform-matrix

nihui commented 2 years ago

Hello

Future GPU speedups are likely to come from workgroup size tuning, kernel optimizations based on subgroup operations, and similar work.
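To make the workgroup size tuning idea concrete, here is a minimal sketch of a brute-force search over candidate local sizes. It is not ncnn code: `candidate_sizes`, `tune_workgroup_size`, and the `measure` callback are hypothetical names, and the 256-invocation limit is an assumed default that would in practice come from the device's reported limits.

```python
from typing import Callable, List, Tuple

WorkgroupSize = Tuple[int, int, int]


def candidate_sizes(max_invocations: int = 256) -> List[WorkgroupSize]:
    """Enumerate power-of-two local sizes whose product fits the limit.

    max_invocations is an assumed value; a real tuner would query the device.
    """
    powers = [1, 2, 4, 8, 16, 32, 64]
    return [
        (x, y, z)
        for x in powers
        for y in powers
        for z in (1, 2, 4)
        if x * y * z <= max_invocations
    ]


def tune_workgroup_size(
    measure: Callable[[WorkgroupSize], float],
    candidates: List[WorkgroupSize],
    repeats: int = 5,
) -> Tuple[WorkgroupSize, float]:
    """Return the candidate with the lowest measured cost.

    measure is a hypothetical callback that runs one kernel dispatch with the
    given local size and returns its elapsed time. Taking the minimum over
    several repeats reduces the influence of timing noise.
    """
    best_size, best_time = candidates[0], float("inf")
    for size in candidates:
        elapsed = min(measure(size) for _ in range(repeats))
        if elapsed < best_time:
            best_size, best_time = size, elapsed
    return best_size, best_time
```

On a real device, `measure` would dispatch the shader and read back a GPU timestamp. The best size generally differs per kernel and per GPU, so the result would need to be cached per device rather than hard-coded.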

I had planned to implement GPU int8 inference this year, but recently I have been working on the new PyTorch model conversion tool.

We welcome any kind of contribution. If you already have ideas for GPU optimization, feel free to discuss them here :D

ohlr commented 2 years ago

int8 often comes with a loss in accuracy. I think focusing on improving the performance of individual operations or layers (if there is room for improvement) would be more beneficial.
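As a rough illustration of where that accuracy loss comes from, the sketch below round-trips a few floats through symmetric per-tensor int8 quantization. This scheme is an assumption chosen for simplicity, not necessarily what ncnn uses, and the helper names are hypothetical.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization: map [-max_abs, max_abs] to [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Map int8 codes back to approximate float values."""
    return [q * scale for q in quantized]


def max_abs_error(values: List[float]) -> float:
    """Largest round-trip error; bounded by about half the scale."""
    quantized, scale = quantize_int8(values)
    restored = dequantize(quantized, scale)
    return max(abs(a - b) for a, b in zip(values, restored))


# A single large outlier stretches the scale, so small values collapse to zero.
codes, scale = quantize_int8([0.001, 0.002, 100.0])
```

With the outlier present, the scale is roughly 0.79, so the two small inputs are both encoded as 0 and their information is lost. Per-channel scales or calibration can reduce this effect, but they do not remove it.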