Tencent / ncnn

ncnn is a high-performance neural network inference framework optimized for the mobile platform

Speed differences of ncnn on Windows #1695

Closed Amanda-Barbara closed 4 years ago

Amanda-Barbara commented 4 years ago

@nihui Hello, I ran the official code you provided at https://github.com/nihui/ncnn-assets/blob/master/ncnntest-20190323.zip and got the following results:

[0 GeForce RTX 2080] queueC=2 queueT=1 memU=4294967295 memDL=7 memHV=9
[0 GeForce RTX 2080] fp16s=1 fp16a=1 int8s=1 int8a=1
loop_count = 40
num_threads = 12
powersave = 0
gpu_device = 0
squeezenet min = 1.75 max = 4.63 avg = 2.37
mobilenet min = 2.24 max = 3.04 avg = 2.82
mobilenet_v2 min = 2.57 max = 3.61 avg = 3.01
shufflenet min = 2.36 max = 3.19 avg = 2.83
mnasnet min = 2.34 max = 3.49 avg = 2.99
proxylessnasnet min = 3.30 max = 3.69 avg = 3.46
googlenet min = 4.59 max = 6.20 avg = 5.26
resnet18 min = 3.95 max = 4.95 avg = 4.14
alexnet min = 2.61 max = 3.10 avg = 2.86
vgg16 min = 16.16 max = 17.12 avg = 16.59
resnet50 min = 7.42 max = 8.95 avg = 8.19
squeezenet-ssd min = 12.97 max = 14.58 avg = 14.32
mobilenet-ssd min = 5.59 max = 6.76 avg = 6.39
mobilenet-yolo min = 9.68 max = 10.40 avg = 10.09
mobilenet-yolov3 min = 7.80 max = 8.55 avg = 8.23
Press any key to continue . . .

However, when I build it myself with CMake on Windows 10 and run benchmark.exe from VS2019, I get the following results:

[0 GeForce RTX 2080] queueC=2[8] queueG=0[16] queueT=1[2]
[0 GeForce RTX 2080] buglssc=0 bugihfa=0
[0 GeForce RTX 2080] fp16p=1 fp16s=1 fp16a=1 int8s=1 int8a=1
loop_count = 40
num_threads = 12
powersave = 0
gpu_device = 0
cooling_down = 1
squeezenet min = 1.50 max = 8.49 avg = 4.81
mobilenet min = 4.41 max = 8.36 avg = 5.96
mobilenet_v2 min = 2.07 max = 11.82 avg = 6.34
mobilenet_v3 min = 2.56 max = 14.76 avg = 5.02
shufflenet min = 1.61 max = 7.82 avg = 5.54
shufflenet_v2 min = 2.17 max = 11.08 avg = 6.42
mnasnet min = 2.15 max = 12.86 avg = 5.33
proxylessnasnet min = 2.26 max = 13.34 avg = 4.59
googlenet min = 4.45 max = 20.23 avg = 6.70
resnet18 min = 2.35 max = 14.99 avg = 4.59
alexnet min = 2.32 max = 19.27 avg = 5.00
vgg16 min = 7.02 max = 8.60 avg = 7.61
resnet50 min = 4.15 max = 4.94 avg = 4.77
squeezenet_ssd min = 4.80 max = 5.73 avg = 5.31
mobilenet_ssd min = 3.20 max = 18.02 avg = 4.73
mobilenet_yolo min = 3.58 max = 3.74 avg = 3.66
mobilenetv2_yolov3 min = 3.08 max = 14.16 avg = 7.00

The speeds in these two results differ considerably. Could you tell me what causes this? Thanks.

nihui commented 4 years ago

ncnntest-20190323.zip is last year's version. A lot has been optimized since then, so I recommend using the latest code.
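
For readers who want to compare two benchmark runs like the ones posted in this thread, the following is a minimal sketch in Python. It is not part of ncnn: the helper names `parse_benchmark` and `compare_avg` are hypothetical, and the regular expression assumes the `name min = X max = Y avg = Z` entry format seen in the logs above. Hyphens in model names are normalized to underscores so that entries such as `squeezenet-ssd` and `squeezenet_ssd` line up. Models whose names changed in other ways between versions will not be matched.

```python
import re

# Matches one benchmark entry, e.g. "squeezenet min = 1.75 max = 4.63 avg = 2.37".
# Assumption: this entry format is taken from the logs quoted in the thread.
ENTRY = re.compile(
    r"(?P<name>[\w-]+)\s+min\s*=\s*(?P<min>[\d.]+)"
    r"\s+max\s*=\s*(?P<max>[\d.]+)\s+avg\s*=\s*(?P<avg>[\d.]+)"
)


def parse_benchmark(text):
    """Return {model_name: (min, max, avg)} for every entry found in text."""
    results = {}
    for m in ENTRY.finditer(text):
        # Normalize "squeezenet-ssd" and "squeezenet_ssd" to the same key.
        name = m.group("name").replace("-", "_")
        results[name] = (
            float(m.group("min")),
            float(m.group("max")),
            float(m.group("avg")),
        )
    return results


def compare_avg(old, new):
    """Return {model_name: new_avg / old_avg} for models present in both runs."""
    return {name: new[name][2] / old[name][2] for name in old if name in new}


# Two entries from each run in the thread, pasted as flattened text.
old_run = parse_benchmark(
    "gpu_device = 0 squeezenet min = 1.75 max = 4.63 avg = 2.37 "
    "vgg16 min = 16.16 max = 17.12 avg = 16.59"
)
new_run = parse_benchmark(
    "cooling_down = 1 squeezenet min = 1.50 max = 8.49 avg = 4.81 "
    "vgg16 min = 7.02 max = 8.60 avg = 7.61"
)

for name, ratio in compare_avg(old_run, new_run).items():
    print(f"{name}: avg ratio new/old = {ratio:.2f}")
```

A ratio above 1 means the newer run has a higher average for that model, and a ratio below 1 means it has a lower one. Comparing `min` alongside `avg` can also be useful when the spread between `min` and `max` is large, as it is in several entries of the second log.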