Tencent / ncnn

ncnn is a high-performance neural network inference framework optimized for the mobile platform

Network inference with the current ncnn library is much slower than with the ncnn library from last August... #1356

Closed popper0912 closed 4 years ago

popper0912 commented 4 years ago

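Hello nihui, network inference with the current ncnn library takes much longer than with the ncnn library from last August. My model was trained with Caffe, then converted to ncnn and deployed on Android. With the new library, inference takes several times longer, which is frustrating. I don't know where the problem is and would appreciate any help.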

nihui commented 4 years ago

There is probably a problem somewhere in the deployment. Are you using the prebuilt static library from the releases?

shineway14 commented 4 years ago

Same here.

onexuan commented 4 years ago

I compared them as well; it started getting slower from version 20180830.

sunbinbin1991 commented 4 years ago

+1, waiting for news.

BUG1989 commented 4 years ago

@popper0912 @onexuan @sunbinbin1991 @shineway14 Could you all please provide the following information so that nihui can analyze the problem:

Thanks~~

SophieChang66 commented 4 years ago

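Same here, on a Cortex-A53. ncnn built from master at 20190731 is slower than last year's version. The 20190731 build was compiled with OpenMP enabled, whereas the earlier build had OpenMP turned off. (I did not compile that one myself; reportedly enabling OpenMP made no difference because multithreading could not be used, so it was disabled.)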

nihui commented 4 years ago

Could all of you build the latest master code and run benchncnn to see?

onexuan commented 4 years ago

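I haven't compared with benchncnn, only with the network in our current project, because only the network that actually runs gives a meaningful comparison. Environment: arm7. 20190908 is 200+ ms slower than 20180704.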
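
For anyone timing their own network rather than benchncnn, reporting the same min / max / avg figures makes results easier to compare across library versions. A minimal sketch of such a harness follows. It is written in Python purely for illustration, and `fake_inference` is a hypothetical stand-in for whatever call runs one forward pass in the real project; the warm-up count is also an assumption, not something taken from benchncnn.

```python
import time


def benchmark(run_inference, loop_count=20, warmup=3):
    """Time run_inference() loop_count times and return min/max/avg in ms."""
    # Warm-up runs are not timed, so one-off setup cost does not skew results.
    for _ in range(warmup):
        run_inference()

    times_ms = []
    for _ in range(loop_count):
        start = time.perf_counter()
        run_inference()
        times_ms.append((time.perf_counter() - start) * 1000.0)

    return {
        "min": min(times_ms),
        "max": max(times_ms),
        "avg": sum(times_ms) / len(times_ms),
    }


def fake_inference():
    # Hypothetical placeholder workload; replace with one real forward pass.
    sum(i * i for i in range(10000))


if __name__ == "__main__":
    stats = benchmark(fake_inference)
    print("my_network  min = %.2f  max = %.2f  avg = %.2f"
          % (stats["min"], stats["max"], stats["avg"]))
```

Running the same harness, with the same loop count and thread settings, against each library build keeps the comparison like for like.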

qaz734913414 commented 4 years ago

+1, waiting for news.

SophieChang66 commented 4 years ago

@nihui On a Cortex-A53, the benchmark times for 20190908 and master_20191105 are as follows:

20190908:

loop_count = 20
num_threads = 1
powersave = 0
gpu_device = -1
squeezenet       min =  367.37  max =  372.50  avg =  370.06
squeezenet_int8  min =  248.58  max =  252.11  avg =  249.59
mobilenet        min =  468.06  max =  472.17  avg =  470.06
mobilenet_int8   min =  404.65  max =  497.18  avg =  430.71
mobilenet_v2     min =  808.40  max =  816.33  avg =  810.73
mobilenet_v3     min =  539.69  max =  543.54  avg =  541.43
shufflenet       min =  374.77  max =  377.58  avg =  376.14
mnasnet          min =  516.65  max =  519.84  avg =  518.36
proxylessnasnet  min =  824.25  max =  827.05  avg =  826.02
googlenet        min = 1092.32  max = 1332.60  avg = 1147.07
googlenet_int8   min =  812.46  max =  989.07  avg =  976.71
resnet18         min =  889.95  max =  893.65  avg =  891.96
resnet18_int8    min =  911.78  max =  917.99  avg =  915.17
alexnet          min = 1707.62  max = 2038.28  avg = 1882.43

master_20191105:

loop_count = 20
num_threads = 1
powersave = 0
gpu_device = -1
squeezenet       min =  202.50  max =  206.68  avg =  204.41
squeezenet_int8  min =  249.73  max =  254.57  avg =  252.17
mobilenet        min =  308.42  max =  313.81  avg =  310.99
mobilenet_int8   min =  410.07  max =  415.68  avg =  412.29
mobilenet_v2     min =  220.95  max =  224.73  avg =  222.38
mobilenet_v3     min =  194.93  max =  197.22  avg =  196.19
shufflenet       min =  142.34  max =  145.66  avg =  144.09
shufflenet_v2    min =  165.68  max =  167.96  avg =  166.53
mnasnet          min =  211.42  max =  214.96  avg =  213.30
proxylessnasnet  min =  270.10  max =  273.17  avg =  271.71
googlenet        min =  840.55  max =  847.38  avg =  843.26
googlenet_int8   min =  807.18  max =  831.52  avg =  811.44
resnet18         min =  660.73  max =  664.10  avg =  662.01
resnet18_int8    min =  759.34  max =  760.61  avg =  760.03
alexnet          min =  880.08  max = 1064.36  avg = 1018.88

Many models show a large difference. Both runs were done one after the other while the board was otherwise idle.
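
To put a number on "a large difference", the two runs can be compared model by model. The sketch below parses benchncnn-style result lines and prints the ratio old avg / new avg, where a value above 1 means the newer build is faster. The regular expression and the helper names `parse_avg` and `speedups` are illustrative assumptions, not part of ncnn; the data is a small excerpt of the figures posted above.

```python
import re

# Matches one result such as "squeezenet min = 367.37 max = 372.50 avg = 370.06".
RESULT = re.compile(
    r"(\w+)\s+min\s*=\s*([\d.]+)\s+max\s*=\s*([\d.]+)\s+avg\s*=\s*([\d.]+)"
)


def parse_avg(text):
    """Return {model_name: avg_ms} for every result found in the text."""
    return {name: float(avg) for name, _min, _max, avg in RESULT.findall(text)}


def speedups(old_text, new_text):
    """Return {model: old_avg / new_avg} for models present in both runs."""
    old, new = parse_avg(old_text), parse_avg(new_text)
    return {name: old[name] / new[name] for name in old if name in new}


# Excerpts copied from the two runs in this comment.
RUN_20190908 = """
squeezenet min = 367.37 max = 372.50 avg = 370.06
squeezenet_int8 min = 248.58 max = 252.11 avg = 249.59
mobilenet_v2 min = 808.40 max = 816.33 avg = 810.73
shufflenet min = 374.77 max = 377.58 avg = 376.14
"""

RUN_MASTER_20191105 = """
squeezenet min = 202.50 max = 206.68 avg = 204.41
squeezenet_int8 min = 249.73 max = 254.57 avg = 252.17
mobilenet_v2 min = 220.95 max = 224.73 avg = 222.38
shufflenet min = 142.34 max = 145.66 avg = 144.09
shufflenet_v2 min = 165.68 max = 167.96 avg = 166.53
"""

if __name__ == "__main__":
    for name, ratio in speedups(RUN_20190908, RUN_MASTER_20191105).items():
        print(f"{name:16s} {ratio:.2f}x")
```

Models that appear in only one run, such as shufflenet_v2, are skipped. In this excerpt the fp32 models get noticeably faster, while squeezenet_int8 stays essentially unchanged.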

nihui commented 4 years ago

So is the current master now faster than 20180704...

SophieChang66 commented 4 years ago

@nihui The benchmark for version 20180704 is as follows:

loop_count = 20
num_threads = 1
powersave = 0
squeezenet       min =  243.62  max =  246.16  avg =  244.87
mobilenet        min =  399.12  max =  400.93  avg =  400.00
mobilenet_v2     min =  341.33  max =  346.81  avg =  344.55
shufflenet       min =  162.92  max =  166.17  avg =  164.35
googlenet        min =  996.89  max = 1000.29  avg =  998.58
resnet18         min = 1066.68  max = 1071.64  avg = 1068.82
alexnet          min = 1676.75  max = 1678.85  avg = 1677.91

On the benchmark, the current master takes less time than both 20180704 and 20190908 for the same models. The timings differ a lot between versions. What are the fundamental differences between these versions? I haven't compared the source code~
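
The three-way comparison can be checked directly from the avg values reported in this thread. The sketch below is illustrative only: the helper names `fastest_version` and `regressed` are made up for this example, and only the seven models common to all three runs are included.

```python
# Average times in ms, copied from the benchncnn runs posted in this thread.
AVG_MS = {
    "20180704": {
        "squeezenet": 244.87, "mobilenet": 400.00, "mobilenet_v2": 344.55,
        "shufflenet": 164.35, "googlenet": 998.58, "resnet18": 1068.82,
        "alexnet": 1677.91,
    },
    "20190908": {
        "squeezenet": 370.06, "mobilenet": 470.06, "mobilenet_v2": 810.73,
        "shufflenet": 376.14, "googlenet": 1147.07, "resnet18": 891.96,
        "alexnet": 1882.43,
    },
    "master_20191105": {
        "squeezenet": 204.41, "mobilenet": 310.99, "mobilenet_v2": 222.38,
        "shufflenet": 144.09, "googlenet": 843.26, "resnet18": 662.01,
        "alexnet": 1018.88,
    },
}


def fastest_version(model):
    """Return the version with the lowest average time for this model."""
    return min(AVG_MS, key=lambda version: AVG_MS[version][model])


def regressed(old, new):
    """Return the models whose average time is higher in `new` than in `old`."""
    return sorted(m for m in AVG_MS[old] if AVG_MS[new][m] > AVG_MS[old][m])


if __name__ == "__main__":
    for model in AVG_MS["20180704"]:
        print(f"{model:14s} fastest: {fastest_version(model)}")
    print("slower in 20190908 than 20180704:", regressed("20180704", "20190908"))
    print("slower in master than 20180704:", regressed("20180704", "master_20191105"))
```

On these numbers master_20191105 is the fastest of the three for every common model, and resnet18 is the only common model that did not get slower between 20180704 and 20190908.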

nihui commented 4 years ago

https://github.com/Tencent/ncnn/releases/tag/20191113 The latest version should be faster than all previous ones.