alibaba / MNN

MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba
http://www.mnn.zone/

Benchmark: Vulkan runs slower than CPU #1984

Closed Crazod closed 1 year ago

Crazod commented 2 years ago

Hello, I ran the benchmark program on several phone platforms, including the Qualcomm 8 Gen 1 chip. With Vulkan I cannot get speeds as fast as the Xiaomi 6 results. Is there something wrong with how I'm testing?

Platform (include the target platform as well if cross-compiling):

Android Redmi K50 Gaming Edition (K50G), SoC: Qualcomm Snapdragon 8 Gen 1

GitHub Version:

2.0.0, commit 4679f848c45510531976ebdf32c42b1c27b92960

(If downloading the source as a ZIP, provide the download date and the git revision from the ZIP's comment section, obtainable via `7z l PATH/TO/ZIP` and searching for `Comment` in the output, e.g. `Comment = bc80b11110cd440aacdabbf59658d630527a7f2b`. If using `git clone`, provide the first commit id from the output of `git log`.)

Compiling Method:

bash ./bench_android.sh -p -64

Paste the cmake arguments, or the path of the build script used, along with the full cmake output (here or via pastebin):

Build Log:

(paste log here or pastebin)

Build Flags: ABI=arm64-v8a OpenMP=ON Vulkan=ON OpenCL=ON

MNN benchmark
Forward type: CPU thread=4 precision=2 sparsity=0 sparseBlockOC=1
--------> Benchmarking... loop = 10, warmup = 5
[ - ] SqueezeNetV1.0.mnn    max = 11.303 ms  min = 10.009 ms  avg = 10.694 ms
[ - ] resnet-v2-50.mnn      max = 51.124 ms  min = 50.941 ms  avg = 51.016 ms
[ - ] squeezenetv1.1.mnn    max =  6.040 ms  min =  5.784 ms  avg =  5.862 ms
[ - ] nasnet.mnn            max = 14.299 ms  min = 14.124 ms  avg = 14.204 ms
[ - ] mobilenetV3.mnn       max =  2.423 ms  min =  2.230 ms  avg =  2.288 ms
[ - ] MobileNetV2_224.mnn   max =  6.229 ms  min =  6.012 ms  avg =  6.118 ms
[ - ] inception-v3.mnn      max = 72.086 ms  min = 71.882 ms  avg = 71.975 ms
[ - ] mobilenet-v1-1.0.mnn  max =  9.530 ms  min =  9.409 ms  avg =  9.465 ms

MNN benchmark
Forward type: Vulkan thread=4 precision=2 sparsity=0 sparseBlockOC=1
--------> Benchmarking... loop = 10, warmup = 5
[ - ] SqueezeNetV1.0.mnn    max = 25.879 ms  min = 19.462 ms  avg = 23.527 ms
[ - ] resnet-v2-50.mnn      max = 58.579 ms  min = 53.859 ms  avg = 56.875 ms
[ - ] squeezenetv1.1.mnn    max = 18.425 ms  min = 13.298 ms  avg = 16.367 ms
[ - ] nasnet.mnn            max = 34.925 ms  min = 32.229 ms  avg = 33.241 ms
[ - ] mobilenetV3.mnn       max = 24.522 ms  min = 22.062 ms  avg = 23.619 ms
[ - ] MobileNetV2_224.mnn   max = 22.192 ms  min = 17.958 ms  avg = 20.571 ms
[ - ] inception-v3.mnn      max = 44.545 ms  min = 41.168 ms  avg = 43.315 ms
[ - ] mobilenet-v1-1.0.mnn  max = 22.775 ms  min = 17.603 ms  avg = 20.118 ms
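For comparing runs like the two above, the per-model result lines can be parsed mechanically instead of eyeballed. A minimal sketch (the regex and helper names are mine, not part of MNN):

```python
import re

# Matches MNN benchmark result lines such as:
# [ - ] mobilenetV3.mnn max = 2.423 ms min = 2.230 ms avg = 2.288 ms
LINE = re.compile(
    r"\[ - \]\s+(?P<model>\S+)\s+"
    r"max\s*=\s*(?P<max>[\d.]+)\s*ms\s+"
    r"min\s*=\s*(?P<min>[\d.]+)\s*ms\s+"
    r"avg\s*=\s*(?P<avg>[\d.]+)\s*ms"
)

def parse_bench(text):
    """Return {model: avg_ms} for every result line in a benchmark log."""
    return {m["model"]: float(m["avg"]) for m in LINE.finditer(text)}

cpu = parse_bench("[ - ] mobilenetV3.mnn max = 2.423 ms min = 2.230 ms avg = 2.288 ms")
vk  = parse_bench("[ - ] mobilenetV3.mnn max = 24.522 ms min = 22.062 ms avg = 23.619 ms")
for model in cpu:
    print(f"{model}: Vulkan is {vk[model] / cpu[model]:.1f}x CPU time")
    # → mobilenetV3.mnn: Vulkan is 10.3x CPU time
```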

I tested several phones here, simply running the benchmark. On mobilenet v3 in particular, the CPU is generally faster. And the GPU speeds don't match the officially published benchmark numbers. (Those are Mi 6 results; a Qualcomm 8 Gen 1 should be faster, e.g. on mobilenet v2.) Below are the official results:

MNN benchmark
Forward type: Vulkan thread=4 precision=2
--------> Benchmarking... loop = 50
[ - ] mobilenet-v1-1.0.mnn  max = 23.288 ms  min = 14.335 ms  avg = 15.017 ms
[ - ] inception-v3.mnn      max = 99.882 ms  min = 98.799 ms  avg = 99.276 ms
[ - ] resnet-v2-50.mnn      max = 81.846 ms  min = 71.969 ms  avg = 75.207 ms
[ - ] SqueezeNetV1.0.mnn    max = 30.883 ms  min = 17.155 ms  avg = 18.295 ms
[ - ] MobileNetV2_224.mnn   max = 24.959 ms  min = 12.137 ms  avg = 13.550 ms

Do I need to lock my GPU's frequency? What steps are required for that?
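On Qualcomm Adreno GPUs the kgsl driver usually exposes devfreq nodes under sysfs, so on a rooted device the frequency can be pinned roughly like this. This is an assumption to verify on your own phone, not something the MNN benchmark requires; the paths, node availability, and the example frequency value all vary by device and kernel:

```shell
# Rooted device assumed; paths are typical for Qualcomm Adreno (kgsl)
# but may differ per device/kernel -- verify with `ls` first.
adb root
adb shell 'cat /sys/class/kgsl/kgsl-3d0/devfreq/available_frequencies'
# Pin min = max to the highest frequency reported above (example value):
adb shell 'echo 818000000 > /sys/class/kgsl/kgsl-3d0/devfreq/max_freq'
adb shell 'echo 818000000 > /sys/class/kgsl/kgsl-3d0/devfreq/min_freq'
# Optionally keep the GPU clocks from idling between dispatches:
adb shell 'echo 1 > /sys/class/kgsl/kgsl-3d0/force_clk_on'
```

Remember to restore the original values (or reboot) afterwards, since pinning the clock increases heat and power draw.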

jxt1234 commented 2 years ago

The Vulkan backend does no auto-tuning and does not tune the local size per device model, so for models with a small amount of computation it is indeed possible for a high-end GPU to come out slower.
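This effect can be illustrated with a toy latency model (all numbers below are hypothetical, not measurements of any device): per-inference GPU time is roughly a fixed dispatch/sync overhead plus compute divided by effective throughput, while CPU time is mostly compute divided by throughput. With untuned local sizes the GPU's effective efficiency drops, so small models end up dominated by the fixed overhead:

```python
def cpu_ms(flops, cpu_gflops=20):
    # CPU: negligible launch overhead, steady throughput (hypothetical)
    return flops / (cpu_gflops * 1e6)

def gpu_ms(flops, gpu_gflops=200, efficiency=0.15, overhead_ms=15):
    # GPU: fixed per-inference overhead (kernel dispatch, sync, transfers)
    # plus compute at a fraction of peak -- untuned local sizes can push
    # the effective efficiency far below 1.0 (all values hypothetical).
    return overhead_ms + flops / (gpu_gflops * efficiency * 1e6)

small = 220e6   # roughly MobileNet-scale work (hypothetical)
big   = 11.4e9  # roughly Inception-v3-scale work (hypothetical)

print(f"small model: CPU {cpu_ms(small):.1f} ms vs GPU {gpu_ms(small):.1f} ms")
print(f"big model:   CPU {cpu_ms(big):.1f} ms vs GPU {gpu_ms(big):.1f} ms")
```

Under these assumed numbers the GPU loses on the small model but wins on the big one, which matches the shape of the logs above (Vulkan slower on mobilenet, faster on inception-v3).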

Crazod commented 2 years ago

> The Vulkan backend does no auto-tuning and does not tune the local size per device model, so for models with a small amount of computation it is indeed possible for a high-end GPU to come out slower.

Then how can I reproduce the Mi 6 numbers from the benchmark, e.g. only around 10 ms for mobilenetv2?