CPU Performing better than GPU for Yolo Predictions

Hello Team,

I have a RISC-V development platform with an IMG-GPU. I was able to build the Vulkan-NCNN framework successfully and run YOLOv8 object detection with it. What we noticed is that the CPU performs better: YOLOv8 object detection takes about 3.0 seconds on the CPU, while the same detection takes about 6.2 seconds on the GPU.

Could you please help me with this? Is there a way to optimise GPU performance so that it outperforms the CPU? I would also highly appreciate any further details you can share on this issue.

Regards, Ravi Kiran

Tencent/ncnn, issue #5703