Closed HaoLiuHust closed 4 years ago
Also, the statically compiled MNN is slower than the dynamically linked MNN, by nearly 8x.
Sort by node name !
Node Name Op Type Avg(ms) % Flops Rate
Convolution268 Convolution 1.829740 1.019491 0.203209
reshapepre_fc1 Reshape 0.068730 0.038295 0.000379
_plus0 Eltwise 0.058190 0.032422 0.003028
_plus1 Eltwise 0.087970 0.049015 0.003028
_plus10 Eltwise 0.016540 0.009216 0.000757
_plus11 Eltwise 0.016100 0.008971 0.000757
_plus12 Eltwise 0.016010 0.008920 0.000757
_plus13 Eltwise 0.017420 0.009706 0.000757
_plus14 Eltwise 0.016840 0.009383 0.000757
_plus15 Eltwise 0.016280 0.009071 0.000757
_plus16 Eltwise 0.018290 0.010191 0.000757
_plus17 Eltwise 0.015580 0.008681 0.000757
_plus18 Eltwise 0.018670 0.010403 0.000757
_plus19 Eltwise 0.019170 0.010681 0.000757
_plus2 Eltwise 0.087990 0.049026 0.003028
_plus20 Eltwise 0.015940 0.008881 0.000757
_plus21 Eltwise 0.005740 0.003198 0.000379
_plus22 Eltwise 0.013490 0.007516 0.000379
_plus23 Eltwise 0.011680 0.006508 0.000379
_plus3 Eltwise 0.013320 0.007422 0.001514
_plus4 Eltwise 0.017540 0.009773 0.001514
_plus5 Eltwise 0.016500 0.009193 0.001514
_plus6 Eltwise 0.015330 0.008542 0.001514
_plus7 Eltwise 0.011960 0.006664 0.000757
_plus8 Eltwise 0.016510 0.009199 0.000757
_plus9 Eltwise 0.016850 0.009388 0.000757
bn1 Scale 0.006940 0.003867 0.000379
bn1_scale Scale 0.005040 0.002808 0.000379
conv0 Convolution 0.587150 0.327147 0.342915
fc1 Scale 0.003530 0.001967 0.000008
fc1_scale Scale 0.002710 0.001510 0.000008
pre_fc1 Reshape 0.003990 0.002223 0.000008
relu0 PReLU 0.283760 0.158105 0.012112
stage1_unit1_bn1 Scale 0.240850 0.134196 0.012112
stage1_unit1_bn1_scale Scale 1.126860 0.627862 0.012112
stage1_unit1_conv1 Convolution 12.684140 7.067326 7.315514
stage1_unit1_conv1sc Convolution 5.140760 2.864319 0.203209
stage1_unit1_conv2 Convolution 11.250790 6.268694 1.828878
stage1_unit1_relu1 PReLU 0.327200 0.182309 0.012112
stage1_unit1avg_pool Pooling 0.254120 0.141590 0.012112
stage1_unit2_bn1 Scale 0.027320 0.015222 0.003028
stage1_unit2_bn1_scale Scale 0.023350 0.013010 0.003028
stage1_unit2_conv1 Convolution 11.109171 6.189787 1.828878
stage1_unit2_conv2 Convolution 11.435441 6.371577 1.828878
stage1_unit2_relu1 PReLU 0.047960 0.026722 0.003028
stage1_unit3_bn1 Scale 0.027000 0.015044 0.003028
stage1_unit3_bn1_scale Scale 0.019880 0.011077 0.003028
stage1_unit3_conv1 Convolution 8.613811 4.799427 1.828878
stage1_unit3_conv2 Convolution 9.632560 5.367052 1.828878
stage1_unit3_relu1 PReLU 0.048150 0.026828 0.003028
stage2_unit1_bn1 Scale 0.048320 0.026923 0.003028
stage2_unit1_bn1_scale Scale 0.040930 0.022805 0.003028
stage2_unit1_conv1 Convolution 15.094730 8.410453 3.657757
stage2_unit1_conv1sc Convolution 0.093130 0.051890 0.101604
stage2_unit1_conv2 Convolution 1.515590 0.844454 1.828878
stage2_unit1_relu1 PReLU 0.148600 0.082797 0.006056
stage2_unit1avg_pool Pooling 0.061890 0.034484 0.003028
stage2_unit2_bn1 Scale 0.013110 0.007305 0.001514
stage2_unit2_bn1_scale Scale 0.009240 0.005148 0.001514
stage2_unit2_conv1 Convolution 1.158500 0.645491 1.828878
stage2_unit2_conv2 Convolution 1.158840 0.645680 1.828878
stage2_unit2_relu1 PReLU 0.013400 0.007466 0.001514
stage2_unit3_bn1 Scale 0.013740 0.007656 0.001514
stage2_unit3_bn1_scale Scale 0.009280 0.005171 0.001514
stage2_unit3_conv1 Convolution 1.021800 0.569325 1.828878
stage2_unit3_conv2 Convolution 0.829330 0.462085 1.828878
stage2_unit3_relu1 PReLU 0.015160 0.008447 0.001514
stage2_unit4_bn1 Scale 0.013440 0.007488 0.001514
stage2_unit4_bn1_scale Scale 0.009310 0.005187 0.001514
stage2_unit4_conv1 Convolution 1.036910 0.577744 1.828878
stage2_unit4_conv2 Convolution 0.829560 0.462213 1.828878
stage2_unit4_relu1 PReLU 0.013390 0.007461 0.001514
stage3_unit10_bn1 Scale 0.011300 0.006296 0.000757
stage3_unit10_bn1_scale Scale 0.006100 0.003399 0.000757
stage3_unit10_conv1 Convolution 3.228161 1.798661 1.828878
stage3_unit10_conv2 Convolution 1.084480 0.604248 1.828878
stage3_unit10_relu1 PReLU 0.014450 0.008051 0.000757
stage3_unit11_bn1 Scale 0.012510 0.006970 0.000757
stage3_unit11_bn1_scale Scale 0.008030 0.004474 0.000757
stage3_unit11_conv1 Convolution 2.979360 1.660034 1.828878
stage3_unit11_conv2 Convolution 1.078160 0.600727 1.828878
stage3_unit11_relu1 PReLU 0.011830 0.006591 0.000757
stage3_unit12_bn1 Scale 0.011010 0.006135 0.000757
stage3_unit12_bn1_scale Scale 0.006110 0.003404 0.000757
stage3_unit12_conv1 Convolution 2.956520 1.647308 1.828878
stage3_unit12_conv2 Convolution 2.371480 1.321337 1.828878
stage3_unit12_relu1 PReLU 0.012350 0.006881 0.000757
stage3_unit13_bn1 Scale 0.011800 0.006575 0.000757
stage3_unit13_bn1_scale Scale 0.006320 0.003521 0.000757
stage3_unit13_conv1 Convolution 3.024860 1.685386 1.828878
stage3_unit13_conv2 Convolution 1.122180 0.625254 1.828878
stage3_unit13_relu1 PReLU 0.012330 0.006870 0.000757
stage3_unit14_bn1 Scale 0.012850 0.007160 0.000757
stage3_unit14_bn1_scale Scale 0.009150 0.005098 0.000757
stage3_unit14_conv1 Convolution 3.271299 1.822696 1.828878
stage3_unit14_conv2 Convolution 1.403420 0.781955 1.828878
stage3_unit14_relu1 PReLU 0.012950 0.007215 0.000757
stage3_unit1_bn1 Scale 0.014060 0.007834 0.001514
stage3_unit1_bn1_scale Scale 0.009940 0.005538 0.001514
stage3_unit1_conv1 Convolution 1.731010 0.964481 3.657757
stage3_unit1_conv1sc Convolution 0.099260 0.055305 0.101604
stage3_unit1_conv2 Convolution 1.478740 0.823922 1.828878
stage3_unit1_relu1 PReLU 0.035570 0.019819 0.003028
stage3_unit1avg_pool Pooling 0.032860 0.018309 0.001514
stage3_unit2_bn1 Scale 0.011880 0.006619 0.000757
stage3_unit2_bn1_scale Scale 0.010340 0.005761 0.000757
stage3_unit2_conv1 Convolution 1.854600 1.033342 1.828878
stage3_unit2_conv2 Convolution 1.063070 0.592319 1.828878
stage3_unit2_relu1 PReLU 0.012150 0.006770 0.000757
stage3_unit3_bn1 Scale 0.010870 0.006057 0.000757
stage3_unit3_bn1_scale Scale 0.005990 0.003337 0.000757
stage3_unit3_conv1 Convolution 2.308070 1.286006 1.828878
stage3_unit3_conv2 Convolution 2.067350 1.151882 1.828878
stage3_unit3_relu1 PReLU 0.013030 0.007260 0.000757
stage3_unit4_bn1 Scale 0.011800 0.006575 0.000757
stage3_unit4_bn1_scale Scale 0.006460 0.003599 0.000757
stage3_unit4_conv1 Convolution 2.924601 1.629523 1.828878
stage3_unit4_conv2 Convolution 1.062970 0.592264 1.828878
stage3_unit4_relu1 PReLU 0.012940 0.007210 0.000757
stage3_unit5_bn1 Scale 0.011570 0.006447 0.000757
stage3_unit5_bn1_scale Scale 0.006980 0.003889 0.000757
stage3_unit5_conv1 Convolution 2.585950 1.440835 1.828878
stage3_unit5_conv2 Convolution 1.090170 0.607419 1.828878
stage3_unit5_relu1 PReLU 0.013270 0.007394 0.000757
stage3_unit6_bn1 Scale 0.010870 0.006057 0.000757
stage3_unit6_bn1_scale Scale 0.006060 0.003376 0.000757
stage3_unit6_conv1 Convolution 3.356850 1.870364 1.828878
stage3_unit6_conv2 Convolution 1.066040 0.593974 1.828878
stage3_unit6_relu1 PReLU 0.013310 0.007416 0.000757
stage3_unit7_bn1 Scale 0.011520 0.006419 0.000757
stage3_unit7_bn1_scale Scale 0.006120 0.003410 0.000757
stage3_unit7_conv1 Convolution 3.209851 1.788458 1.828878
stage3_unit7_conv2 Convolution 1.085460 0.604795 1.828878
stage3_unit7_relu1 PReLU 0.012490 0.006959 0.000757
stage3_unit8_bn1 Scale 0.012010 0.006692 0.000757
stage3_unit8_bn1_scale Scale 0.007680 0.004279 0.000757
stage3_unit8_conv1 Convolution 2.608930 1.453639 1.828878
stage3_unit8_conv2 Convolution 1.305220 0.727240 1.828878
stage3_unit8_relu1 PReLU 0.012800 0.007132 0.000757
stage3_unit9_bn1 Scale 0.011220 0.006252 0.000757
stage3_unit9_bn1_scale Scale 0.006880 0.003833 0.000757
stage3_unit9_conv1 Convolution 3.109170 1.732362 1.828878
stage3_unit9_conv2 Convolution 1.094480 0.609820 1.828878
stage3_unit9_relu1 PReLU 0.012950 0.007215 0.000757
stage4_unit1_bn1 Scale 0.012410 0.006915 0.000757
stage4_unit1_bn1_scale Scale 0.005760 0.003209 0.000757
stage4_unit1_conv1 Convolution 2.859420 1.593206 3.657757
stage4_unit1_conv1sc Convolution 0.088930 0.049550 0.101604
stage4_unit1_conv2 Convolution 2.290880 1.276428 1.828878
stage4_unit1_relu1 PReLU 0.027950 0.015573 0.001514
stage4_unit1avg_pool Pooling 0.016210 0.009032 0.000757
stage4_unit2_bn1 Scale 0.007860 0.004379 0.000379
stage4_unit2_bn1_scale Scale 0.005310 0.002959 0.000379
stage4_unit2_conv1 Convolution 8.628659 4.807700 1.828878
stage4_unit2_conv2 Convolution 2.033760 1.133167 1.828878
stage4_unit2_relu1 PReLU 0.010370 0.005778 0.000379
stage4_unit3_bn1 Scale 0.010110 0.005633 0.000379
stage4_unit3_bn1_scale Scale 0.006240 0.003477 0.000379
stage4_unit3_conv1 Convolution 4.812270 2.681292 1.828878
stage4_unit3_conv2 Convolution 2.022430 1.126854 1.828878
stage4_unit3_relu1 PReLU 0.010300 0.005739 0.000379
Sort by time cost !
Node Type Avg(ms) % Called times Flops Rate
Reshape 0.072720 0.040518 2.000000 0.000386
Pooling 0.365080 0.203415 4.000000 0.017411
Eltwise 0.559910 0.311970 24.000000 0.026874
PReLU 1.148671 0.640014 25.000000 0.056019
Scale 1.955981 1.089830 52.000000 0.077988
Convolution 175.379929 97.717857 54.000000 99.814148
total time : 179.475815 ms, total mflops : 6321.113770
main, 112, cost time: 17968.628906 ms
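Above is the fast model, below is the slow one. The two have exactly the same structure and the profile reports identical mflops, but convolution in the slow one is much slower.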
Sort by node name !
Node Name Op Type Avg(ms) % Flops Rate
Convolution268 Convolution 1.869860 0.437601 0.203209
reshapepre_fc1 Reshape 0.068480 0.016026 0.000379
_plus0 Eltwise 0.056040 0.013115 0.003028
_plus1 Eltwise 0.082320 0.019265 0.003028
_plus10 Eltwise 0.015700 0.003674 0.000757
_plus11 Eltwise 0.017020 0.003983 0.000757
_plus12 Eltwise 0.018630 0.004360 0.000757
_plus13 Eltwise 0.017540 0.004105 0.000757
_plus14 Eltwise 0.017180 0.004021 0.000757
_plus15 Eltwise 0.016780 0.003927 0.000757
_plus16 Eltwise 0.017470 0.004088 0.000757
_plus17 Eltwise 0.015890 0.003719 0.000757
_plus18 Eltwise 0.019140 0.004479 0.000757
_plus19 Eltwise 0.019640 0.004596 0.000757
_plus2 Eltwise 0.082570 0.019324 0.003028
_plus20 Eltwise 0.018930 0.004430 0.000757
_plus21 Eltwise 0.007550 0.001767 0.000379
_plus22 Eltwise 0.011440 0.002677 0.000379
_plus23 Eltwise 0.012070 0.002825 0.000379
_plus3 Eltwise 0.018380 0.004301 0.001514
_plus4 Eltwise 0.023430 0.005483 0.001514
_plus5 Eltwise 0.020900 0.004891 0.001514
_plus6 Eltwise 0.020910 0.004894 0.001514
_plus7 Eltwise 0.009500 0.002223 0.000757
_plus8 Eltwise 0.015900 0.003721 0.000757
_plus9 Eltwise 0.016790 0.003929 0.000757
bn1 Scale 0.007340 0.001718 0.000379
bn1_scale Scale 0.005070 0.001187 0.000379
conv0 Convolution 0.635890 0.148816 0.342915
fc1 Scale 0.003520 0.000824 0.000008
fc1_scale Scale 0.002960 0.000693 0.000008
pre_fc1 Reshape 0.004020 0.000941 0.000008
relu0 PReLU 0.279560 0.065425 0.012112
stage1_unit1_bn1 Scale 0.231670 0.054217 0.012112
stage1_unit1_bn1_scale Scale 1.110530 0.259896 0.012112
stage1_unit1_conv1 Convolution 9.498240 2.222860 7.315514
stage1_unit1_conv1sc Convolution 4.232649 0.990561 0.203209
stage1_unit1_conv2 Convolution 27.030899 6.326005 1.828878
stage1_unit1_relu1 PReLU 0.318650 0.074573 0.012112
stage1_unit1avg_pool Pooling 0.255490 0.059792 0.012112
stage1_unit2_bn1 Scale 0.026020 0.006089 0.003028
stage1_unit2_bn1_scale Scale 0.020190 0.004725 0.003028
stage1_unit2_conv1 Convolution 10.946500 2.561795 1.828878
stage1_unit2_conv2 Convolution 11.117911 2.601910 1.828878
stage1_unit2_relu1 PReLU 0.050100 0.011725 0.003028
stage1_unit3_bn1 Scale 0.025020 0.005855 0.003028
stage1_unit3_bn1_scale Scale 0.017800 0.004166 0.003028
stage1_unit3_conv1 Convolution 8.413511 1.969003 1.828878
stage1_unit3_conv2 Convolution 8.860010 2.073496 1.828878
stage1_unit3_relu1 PReLU 0.042420 0.009927 0.003028
stage2_unit1_bn1 Scale 0.045510 0.010651 0.003028
stage2_unit1_bn1_scale Scale 0.040840 0.009558 0.003028
stage2_unit1_conv1 Convolution 16.480089 3.856813 3.657757
stage2_unit1_conv1sc Convolution 0.102860 0.024072 0.101604
stage2_unit1_conv2 Convolution 11.462340 2.682516 1.828878
stage2_unit1_relu1 PReLU 0.155240 0.036331 0.006056
stage2_unit1avg_pool Pooling 0.063820 0.014936 0.003028
stage2_unit2_bn1 Scale 0.016150 0.003780 0.001514
stage2_unit2_bn1_scale Scale 0.013690 0.003204 0.001514
stage2_unit2_conv1 Convolution 8.492849 1.987570 1.828878
stage2_unit2_conv2 Convolution 3.824409 0.895021 1.828878
stage2_unit2_relu1 PReLU 0.020470 0.004791 0.001514
stage2_unit3_bn1 Scale 0.014450 0.003382 0.001514
stage2_unit3_bn1_scale Scale 0.009870 0.002310 0.001514
stage2_unit3_conv1 Convolution 9.194940 2.151880 1.828878
stage2_unit3_conv2 Convolution 0.841480 0.196930 1.828878
stage2_unit3_relu1 PReLU 0.023480 0.005495 0.001514
stage2_unit4_bn1 Scale 0.015220 0.003562 0.001514
stage2_unit4_bn1_scale Scale 0.011610 0.002717 0.001514
stage2_unit4_conv1 Convolution 9.489868 2.220901 1.828878
stage2_unit4_conv2 Convolution 0.890230 0.208339 1.828878
stage2_unit4_relu1 PReLU 0.019920 0.004662 0.001514
stage3_unit10_bn1 Scale 0.011470 0.002684 0.000757
stage3_unit10_bn1_scale Scale 0.006040 0.001414 0.000757
stage3_unit10_conv1 Convolution 13.216301 3.092993 1.828878
stage3_unit10_conv2 Convolution 1.875410 0.438900 1.828878
stage3_unit10_relu1 PReLU 0.015050 0.003522 0.000757
stage3_unit11_bn1 Scale 0.011700 0.002738 0.000757
stage3_unit11_bn1_scale Scale 0.006150 0.001439 0.000757
stage3_unit11_conv1 Convolution 15.221920 3.562366 1.828878
stage3_unit11_conv2 Convolution 1.651460 0.386489 1.828878
stage3_unit11_relu1 PReLU 0.016780 0.003927 0.000757
stage3_unit12_bn1 Scale 0.011470 0.002684 0.000757
stage3_unit12_bn1_scale Scale 0.006020 0.001409 0.000757
stage3_unit12_conv1 Convolution 13.192543 3.087433 1.828878
stage3_unit12_conv2 Convolution 3.631160 0.849796 1.828878
stage3_unit12_relu1 PReLU 0.015240 0.003567 0.000757
stage3_unit13_bn1 Scale 0.011530 0.002698 0.000757
stage3_unit13_bn1_scale Scale 0.006230 0.001458 0.000757
stage3_unit13_conv1 Convolution 11.399889 2.667901 1.828878
stage3_unit13_conv2 Convolution 4.761270 1.114274 1.828878
stage3_unit13_relu1 PReLU 0.015570 0.003644 0.000757
stage3_unit14_bn1 Scale 0.011730 0.002745 0.000757
stage3_unit14_bn1_scale Scale 0.006340 0.001484 0.000757
stage3_unit14_conv1 Convolution 17.420458 4.076887 1.828878
stage3_unit14_conv2 Convolution 3.293881 0.770863 1.828878
stage3_unit14_relu1 PReLU 0.016050 0.003756 0.000757
stage3_unit1_bn1 Scale 0.015520 0.003632 0.001514
stage3_unit1_bn1_scale Scale 0.012270 0.002872 0.001514
stage3_unit1_conv1 Convolution 10.240309 2.396526 3.657757
stage3_unit1_conv1sc Convolution 0.098640 0.023085 0.101604
stage3_unit1_conv2 Convolution 4.060829 0.950350 1.828878
stage3_unit1_relu1 PReLU 0.061380 0.014365 0.003028
stage3_unit1avg_pool Pooling 0.032380 0.007578 0.001514
stage3_unit2_bn1 Scale 0.010470 0.002450 0.000757
stage3_unit2_bn1_scale Scale 0.008090 0.001893 0.000757
stage3_unit2_conv1 Convolution 12.054291 2.821049 1.828878
stage3_unit2_conv2 Convolution 1.147060 0.268445 1.828878
stage3_unit2_relu1 PReLU 0.015710 0.003677 0.000757
stage3_unit3_bn1 Scale 0.011100 0.002598 0.000757
stage3_unit3_bn1_scale Scale 0.008170 0.001912 0.000757
stage3_unit3_conv1 Convolution 10.811202 2.530131 1.828878
stage3_unit3_conv2 Convolution 2.299829 0.538226 1.828878
stage3_unit3_relu1 PReLU 0.015780 0.003693 0.000757
stage3_unit4_bn1 Scale 0.011050 0.002586 0.000757
stage3_unit4_bn1_scale Scale 0.005860 0.001371 0.000757
stage3_unit4_conv1 Convolution 13.093728 3.064308 1.828878
stage3_unit4_conv2 Convolution 1.101440 0.257768 1.828878
stage3_unit4_relu1 PReLU 0.016940 0.003964 0.000757
stage3_unit5_bn1 Scale 0.012010 0.002811 0.000757
stage3_unit5_bn1_scale Scale 0.006150 0.001439 0.000757
stage3_unit5_conv1 Convolution 12.269053 2.871310 1.828878
stage3_unit5_conv2 Convolution 2.237640 0.523672 1.828878
stage3_unit5_relu1 PReLU 0.015260 0.003571 0.000757
stage3_unit6_bn1 Scale 0.011640 0.002724 0.000757
stage3_unit6_bn1_scale Scale 0.005770 0.001350 0.000757
stage3_unit6_conv1 Convolution 16.940849 3.964644 1.828878
stage3_unit6_conv2 Convolution 1.132320 0.264995 1.828878
stage3_unit6_relu1 PReLU 0.017350 0.004060 0.000757
stage3_unit7_bn1 Scale 0.011600 0.002715 0.000757
stage3_unit7_bn1_scale Scale 0.007220 0.001690 0.000757
stage3_unit7_conv1 Convolution 11.837508 2.770316 1.828878
stage3_unit7_conv2 Convolution 1.119060 0.261892 1.828878
stage3_unit7_relu1 PReLU 0.014770 0.003457 0.000757
stage3_unit8_bn1 Scale 0.010970 0.002567 0.000757
stage3_unit8_bn1_scale Scale 0.006110 0.001430 0.000757
stage3_unit8_conv1 Convolution 15.528092 3.634018 1.828878
stage3_unit8_conv2 Convolution 2.429670 0.568612 1.828878
stage3_unit8_relu1 PReLU 0.015240 0.003567 0.000757
stage3_unit9_bn1 Scale 0.011650 0.002726 0.000757
stage3_unit9_bn1_scale Scale 0.006020 0.001409 0.000757
stage3_unit9_conv1 Convolution 16.640541 3.894363 1.828878
stage3_unit9_conv2 Convolution 1.810271 0.423655 1.828878
stage3_unit9_relu1 PReLU 0.017130 0.004009 0.000757
stage4_unit1_bn1 Scale 0.012760 0.002986 0.000757
stage4_unit1_bn1_scale Scale 0.008620 0.002017 0.000757
stage4_unit1_conv1 Convolution 12.342630 2.888529 3.657757
stage4_unit1_conv1sc Convolution 0.097090 0.022722 0.101604
stage4_unit1_conv2 Convolution 6.056579 1.417413 1.828878
stage4_unit1_relu1 PReLU 0.029830 0.006981 0.001514
stage4_unit1avg_pool Pooling 0.018720 0.004381 0.000757
stage4_unit2_bn1 Scale 0.008610 0.002015 0.000379
stage4_unit2_bn1_scale Scale 0.006960 0.001629 0.000379
stage4_unit2_conv1 Convolution 18.913898 4.426394 1.828878
stage4_unit2_conv2 Convolution 2.382850 0.557655 1.828878
stage4_unit2_relu1 PReLU 0.010200 0.002387 0.000379
stage4_unit3_bn1 Scale 0.009380 0.002195 0.000379
stage4_unit3_bn1_scale Scale 0.006250 0.001463 0.000379
stage4_unit3_conv1 Convolution 10.689858 2.501733 1.828878
stage4_unit3_conv2 Convolution 6.732821 1.575673 1.828878
stage4_unit3_relu1 PReLU 0.011260 0.002635 0.000379
Sort by time cost !
Node Type Avg(ms) % Called times Flops Rate
Reshape 0.072500 0.016967 2.000000 0.000386
Pooling 0.370410 0.086687 4.000000 0.017411
Eltwise 0.571718 0.133798 24.000000 0.026874
PReLU 1.229384 0.287711 25.000000 0.056019
Scale 1.930405 0.451770 52.000000 0.077988
Convolution 423.116730 99.021439 54.000000 99.814148
total time : 427.298096 ms, total mflops : 6321.113770
main, 112, cost time: 42751.425781 ms
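To see where the extra time goes, the per-node rows of the two dumps can be compared directly. This is a minimal Python sketch, assuming each dump's "Sort by node name" section is available as text. `parse_profile` and `slowdowns` are hypothetical helper names, not part of MNN, and only four rows from each dump above are included for brevity.

```python
def parse_profile(text):
    """Map node name -> average time (ms) from 'Sort by node name' rows."""
    times = {}
    for line in text.strip().splitlines():
        if line.startswith("Sort by time cost"):
            break  # per-type summary follows; not per-node data
        parts = line.split()
        # Expected row: name, op type, avg(ms), percent, flops rate
        if len(parts) != 5:
            continue
        try:
            times[parts[0]] = float(parts[2])
        except ValueError:
            continue  # banner line such as "Sort by node name !"
    return times


def slowdowns(fast, slow):
    """Return (node, slow/fast ratio) pairs, largest ratio first."""
    rows = [(name, slow[name] / fast[name])
            for name in fast if name in slow and fast[name] > 0]
    return sorted(rows, key=lambda r: r[1], reverse=True)


# Rows copied from the two dumps above (fast model, then slow model).
FAST = """
stage1_unit1_conv2 Convolution 11.250790 6.268694 1.828878
stage2_unit1_conv2 Convolution 1.515590 0.844454 1.828878
stage3_unit6_conv1 Convolution 3.356850 1.870364 1.828878
relu0 PReLU 0.283760 0.158105 0.012112
"""
SLOW = """
stage1_unit1_conv2 Convolution 27.030899 6.326005 1.828878
stage2_unit1_conv2 Convolution 11.462340 2.682516 1.828878
stage3_unit6_conv1 Convolution 16.940849 3.964644 1.828878
relu0 PReLU 0.279560 0.065425 0.012112
"""

ranked = slowdowns(parse_profile(FAST), parse_profile(SLOW))
for name, ratio in ranked:
    print(f"{name:24s} {ratio:6.2f}x")
```

On these rows the ratio varies widely between convolutions while the PReLU row is essentially unchanged, which matches the observation that only convolution gets slower.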
@MNNTeam
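Which backend? What compile options?
Intel CPU. As for compile options, I tried on both Linux and Win10, and also tried enabling OpenMP.
Can anyone explain this? I don't quite understand why. Or could the numeric distribution of the convolution layers' parameters also affect speed?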
The magnitude of the values can indeed affect computation speed. Could you try adding /fp:fast?
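One case where magnitude matters is subnormal (denormal) float32 values, which many CPUs process much more slowly than normal values. A quick way to check a model is to count them in its weights. This is a minimal Python sketch, assuming the weights have already been exported as a raw little-endian float32 buffer. The export step is not shown, and `count_subnormals` is a hypothetical helper, not an MNN API.

```python
import struct

FLT_MIN = 2.0 ** -126  # smallest positive normal float32


def count_subnormals(raw):
    """Count float32 values in a little-endian byte buffer that are subnormal."""
    n = len(raw) // 4
    values = struct.unpack("<%df" % n, raw[: n * 4])
    # Subnormal: nonzero, but smaller in magnitude than the smallest normal value.
    return sum(1 for v in values if v != 0.0 and abs(v) < FLT_MIN)


# Illustrative buffer: two normal-range values (1.0, 0.0) and two subnormals.
sample = struct.pack("<4f", 1.0, 1e-40, 0.0, -3e-39)
print(count_subnormals(sample))
```

Running this on the weights of both models would show whether the slow one contains noticeably more subnormal values.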
I added it and tested again; still the same... Could I provide the models for you to analyze?
Link: https://pan.baidu.com/s/1rI__pZtz9m82d1lqfPfgKQ Extraction code: ut8t
@MNNTeam
https://github.com/opencv/opencv/issues/17259 I have asked OpenCV for help. They reproduced the problem and found something that may help.
I added it and tested again; still the same... Could I provide the models for you to analyze?
Please test on Linux with the optimization level changed to -Ofast, or add -ffast-math. I am not sure whether the Windows option actually takes effect.
When I run them, there seems to be no difference:
model path: ../model_faster.mnn
Total time:753.406ms
Total time:1358.21ms
Total time:633.002ms
Total time:582.318ms
Total time:685.973ms
Total time:710.825ms
Total time:569.775ms
Total time:568.638ms
Total time:578.189ms
Total time:572.155ms
Total time:564.811ms
Total time:564.979ms
Total time:565.746ms
Total time:587.9ms
Total time:567.468ms
Total time:566.8ms
Total time:568.834ms
Total time:566.636ms
Total time:684.769ms
Total time:840.325ms
model path: ../model_slower.mnn
Total time:691.004ms
Total time:512.862ms
Total time:632.205ms
Total time:886.202ms
Total time:516.058ms
Total time:516.966ms
Total time:546.255ms
Total time:693.711ms
Total time:522.364ms
Total time:532.506ms
Total time:530.202ms
Total time:524.836ms
Total time:527.669ms
Total time:526.518ms
Total time:807.811ms
Total time:677.18ms
Total time:522.285ms
Total time:517.519ms
Total time:517.875ms
Total time:537.054ms
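On Linux?
So far I have measured this on Windows.
On Linux, in my tests the dynamic library shows no difference between the two models; the static library is slower than the dynamic one.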
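https://github.com/opencv/opencv/issues/17259 OpenCV has found the cause: the weights contain denormal floats. Following OpenCV's change, I made a similar change at model conversion time and tested the accuracy; there is no loss.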
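For readers wondering what such a conversion-time change could look like, here is a minimal Python sketch that flushes subnormal float32 weights to zero. It is an assumption about the general idea only: it reproduces neither OpenCV's actual patch nor the change made to the MNN converter, and `flush_subnormals` is a hypothetical helper name.

```python
import struct

FLT_MIN = 2.0 ** -126  # smallest positive normal float32


def to_float32(x):
    """Round a Python float to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def flush_subnormals(weights):
    """Return (weights with subnormal float32 values set to 0.0, count flushed)."""
    flushed = []
    changed = 0
    for w in weights:
        w32 = to_float32(w)
        if w32 != 0.0 and abs(w32) < FLT_MIN:
            flushed.append(0.0)  # subnormal: replace with exact zero
            changed += 1
        else:
            flushed.append(w32)  # normal value or zero: keep as stored
    return flushed, changed


# Illustrative weights, not taken from the models in this issue.
weights = [0.5, 1e-40, -2e-39, 0.0, 1e-30]
cleaned, n_changed = flush_subnormals(weights)
print(cleaned, n_changed)
```

Each flushed weight changes by less than 2^-126 (about 1.18e-38), which is consistent with the report that accuracy is unaffected.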
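opencv/opencv#17259 OpenCV has found the cause: the weights contain denormal floats. Following OpenCV's change, I made a similar change at model conversion time and tested the accuracy; there is no loss.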
-ffast-math and -Ofast are precisely what disable denormal floats. The MSVC equivalent looks like it should be /fp:fast, but I don't know why it had no effect in the tests.
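The two models have exactly the same structure and were only trained at different times, yet one takes about 600 ms and the other about 2 s. What is the cause?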