ohlr opened this issue 3 years ago
Hello
Possible future GPU performance acceleration could come from workgroup size tuning, kernel optimization based on subgroup operations, and similar techniques.
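For illustration, here is a minimal GLSL compute-shader sketch of both ideas (not taken from ncnn's shader sources; the buffer layout and names are hypothetical): the workgroup size is exposed as a specialization constant so the host can tune it per GPU, and a subgroup reduction replaces a shared-memory reduction loop.

```glsl
#version 450
#extension GL_KHR_shader_subgroup_basic : require
#extension GL_KHR_shader_subgroup_arithmetic : require

// Hypothetical block-sum kernel, for illustration only.
// local_size_x is specialization constant 0, so the host can benchmark
// several workgroup sizes (e.g. 32/64/128 on Adreno) without editing the GLSL.
layout(local_size_x_id = 0) in;

layout(std430, binding = 0) readonly buffer InBuf { float in_data[]; };
layout(std430, binding = 1) writeonly buffer OutBuf { float partial_sums[]; };

layout(push_constant) uniform PushConsts { uint count; } p;

void main()
{
    uint gid = gl_GlobalInvocationID.x;
    float v = (gid < p.count) ? in_data[gid] : 0.0;

    // One subgroup-wide add replaces a log2(N)-step shared-memory loop.
    float sum = subgroupAdd(v);

    // The first active lane of each subgroup writes its partial result.
    if (subgroupElect())
        partial_sums[gl_WorkGroupID.x * gl_NumSubgroups + gl_SubgroupID] = sum;
}
```

The host side would then dispatch this with a few candidate workgroup sizes on the target device and keep the fastest; the partial sums can be reduced in a second pass or on the CPU.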
I had planned to implement GPU int8 inference this year, but recently I have been working on the new PyTorch model conversion tool.
We welcome any kind of contribution; if you already have some ideas for GPU optimization, feel free to reach out and discuss :D
int8 often comes with a loss in accuracy. I think focusing on improving the performance of single operations or layers (if there is room for improvement) would be more beneficial.
detail | 详细描述 | 詳細な説明
We are highly interested in improving GPU performance on Qualcomm GPUs under Android. Is there any roadmap or work-in-progress PR that we could help with?
Currently the docs say: "known work, but speed may not be fast enough".
Related articles:
https://github.com/Tencent/ncnn/wiki/FAQ-ncnn-vulkan#my-model-runs-slower-on-gpu-than-cpu
https://github.com/Tencent/ncnn#supported-platform-matrix