Closed Water2style closed 2 years ago
Hi! We've received your issue; please be patient while waiting for a response. We will arrange for technicians to answer your questions as soon as possible. Please make sure that you have posted enough information to demonstrate your request. You may also check the API, FAQ, GitHub Issues, and AI community for an answer. Have a nice day!
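Hi, I'm using PaddleSlim and followed the tutorial step by step. At inference time, the 30%-pruned model is faster than the original only when batch_size=2; at every larger batch_size it is slower than the original. Main steps: FPGM + sensitivity analysis + skipping the last conv. (The only difference between the 50% and 30% pruned models is the pruning ratio.)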
pruner = paddleslim.dygraph.FPGMFilterPruner(net, [1, 3, 224, 224])
pruner.sensitive(sen_file=args.sens_file)
plan = pruner.sensitive_prune(args.prune_ratio, skip_vars=[conv2d_params_name[-1]])
When you ask for the pruned model structure, do you mean the Summary output?
The network is PaddleClas.MobileNetV1.
This is the last part of what the pruning algorithm printed; it should show which channels were pruned:
2022-02-19 03:59:47,031-INFO: change groups from 32 to 16 for conv2d_1.w_0.
2022-02-19 03:59:47,034-INFO: change groups from 64 to 46 for conv2d_3.w_0.
2022-02-19 03:59:47,037-INFO: change groups from 128 to 101 for conv2d_5.w_0.
2022-02-19 03:59:47,041-INFO: change groups from 128 to 99 for conv2d_7.w_0.
2022-02-19 03:59:47,045-INFO: change groups from 256 to 224 for conv2d_9.w_0.
2022-02-19 03:59:47,070-INFO: change groups from 256 to 174 for conv2d_11.w_0.
2022-02-19 03:59:47,076-INFO: change groups from 512 to 432 for conv2d_13.w_0.
2022-02-19 03:59:47,085-INFO: change groups from 512 to 417 for conv2d_15.w_0.
2022-02-19 03:59:47,093-INFO: change groups from 512 to 440 for conv2d_17.w_0.
2022-02-19 03:59:47,103-INFO: change groups from 512 to 438 for conv2d_23.w_0.
2022-02-19 03:59:47,116-INFO: change groups from 1024 to 693 for conv2d_25.w_0.
FLOPs after pruning: 8253581.0
Pruned FLOPs: 30.01%
Layer (type) Input Shape Output Shape Param #
Conv2D-1 [[1, 3, 224, 224]] [1, 16, 112, 112] 432
BatchNorm-1 [[1, 16, 112, 112]] [1, 16, 112, 112] 64
ReLU-5 [[1, 16, 112, 112]] [1, 16, 112, 112] 0
ConvBNLayer-1 [[1, 3, 224, 224]] [1, 16, 112, 112] 0
Conv2D-2 [[1, 16, 112, 112]] [1, 16, 112, 112] 144
BatchNorm-2 [[1, 16, 112, 112]] [1, 16, 112, 112] 64
ReLU-6 [[1, 16, 112, 112]] [1, 16, 112, 112] 0
ConvBNLayer-2 [[1, 16, 112, 112]] [1, 16, 112, 112] 0
Conv2D-3 [[1, 16, 112, 112]] [1, 46, 112, 112] 736
BatchNorm-3 [[1, 46, 112, 112]] [1, 46, 112, 112] 184
ReLU-7 [[1, 46, 112, 112]] [1, 46, 112, 112] 0
ConvBNLayer-3 [[1, 16, 112, 112]] [1, 46, 112, 112] 0
DepthwiseSeparable-1 [[1, 16, 112, 112]] [1, 46, 112, 112] 0
Conv2D-4 [[1, 46, 112, 112]] [1, 46, 56, 56] 414
BatchNorm-4 [[1, 46, 56, 56]] [1, 46, 56, 56] 184
ReLU-8 [[1, 46, 56, 56]] [1, 46, 56, 56] 0
ConvBNLayer-4 [[1, 46, 112, 112]] [1, 46, 56, 56] 0
Conv2D-5 [[1, 46, 56, 56]] [1, 101, 56, 56] 4,646
BatchNorm-5 [[1, 101, 56, 56]] [1, 101, 56, 56] 404
ReLU-9 [[1, 101, 56, 56]] [1, 101, 56, 56] 0
ConvBNLayer-5 [[1, 46, 56, 56]] [1, 101, 56, 56] 0
DepthwiseSeparable-2 [[1, 46, 112, 112]] [1, 101, 56, 56] 0
Conv2D-6 [[1, 101, 56, 56]] [1, 101, 56, 56] 909
BatchNorm-6 [[1, 101, 56, 56]] [1, 101, 56, 56] 404
ReLU-10 [[1, 101, 56, 56]] [1, 101, 56, 56] 0
ConvBNLayer-6 [[1, 101, 56, 56]] [1, 101, 56, 56] 0
Conv2D-7 [[1, 101, 56, 56]] [1, 99, 56, 56] 9,999
BatchNorm-7 [[1, 99, 56, 56]] [1, 99, 56, 56] 396
ReLU-11 [[1, 99, 56, 56]] [1, 99, 56, 56] 0
ConvBNLayer-7 [[1, 101, 56, 56]] [1, 99, 56, 56] 0
DepthwiseSeparable-3 [[1, 101, 56, 56]] [1, 99, 56, 56] 0
Conv2D-8 [[1, 99, 56, 56]] [1, 99, 28, 28] 891
BatchNorm-8 [[1, 99, 28, 28]] [1, 99, 28, 28] 396
ReLU-12 [[1, 99, 28, 28]] [1, 99, 28, 28] 0
ConvBNLayer-8 [[1, 99, 56, 56]] [1, 99, 28, 28] 0
Conv2D-9 [[1, 99, 28, 28]] [1, 224, 28, 28] 22,176
BatchNorm-9 [[1, 224, 28, 28]] [1, 224, 28, 28] 896
ReLU-13 [[1, 224, 28, 28]] [1, 224, 28, 28] 0
ConvBNLayer-9 [[1, 99, 28, 28]] [1, 224, 28, 28] 0
DepthwiseSeparable-4 [[1, 99, 56, 56]] [1, 224, 28, 28] 0
Conv2D-10 [[1, 224, 28, 28]] [1, 224, 28, 28] 2,016
BatchNorm-10 [[1, 224, 28, 28]] [1, 224, 28, 28] 896
ReLU-14 [[1, 224, 28, 28]] [1, 224, 28, 28] 0
ConvBNLayer-10 [[1, 224, 28, 28]] [1, 224, 28, 28] 0
Conv2D-11 [[1, 224, 28, 28]] [1, 174, 28, 28] 38,976
BatchNorm-11 [[1, 174, 28, 28]] [1, 174, 28, 28] 696
ReLU-15 [[1, 174, 28, 28]] [1, 174, 28, 28] 0
ConvBNLayer-11 [[1, 224, 28, 28]] [1, 174, 28, 28] 0
DepthwiseSeparable-5 [[1, 224, 28, 28]] [1, 174, 28, 28] 0
Conv2D-12 [[1, 174, 28, 28]] [1, 174, 14, 14] 1,566
BatchNorm-12 [[1, 174, 14, 14]] [1, 174, 14, 14] 696
ReLU-16 [[1, 174, 14, 14]] [1, 174, 14, 14] 0
ConvBNLayer-12 [[1, 174, 28, 28]] [1, 174, 14, 14] 0
Conv2D-13 [[1, 174, 14, 14]] [1, 432, 14, 14] 75,168
BatchNorm-13 [[1, 432, 14, 14]] [1, 432, 14, 14] 1,728
ReLU-17 [[1, 432, 14, 14]] [1, 432, 14, 14] 0
ConvBNLayer-13 [[1, 174, 14, 14]] [1, 432, 14, 14] 0
DepthwiseSeparable-6 [[1, 174, 28, 28]] [1, 432, 14, 14] 0
Conv2D-14 [[1, 432, 14, 14]] [1, 432, 14, 14] 3,888
BatchNorm-14 [[1, 432, 14, 14]] [1, 432, 14, 14] 1,728
ReLU-18 [[1, 432, 14, 14]] [1, 432, 14, 14] 0
ConvBNLayer-14 [[1, 432, 14, 14]] [1, 432, 14, 14] 0
Conv2D-15 [[1, 432, 14, 14]] [1, 417, 14, 14] 180,144
BatchNorm-15 [[1, 417, 14, 14]] [1, 417, 14, 14] 1,668
ReLU-19 [[1, 417, 14, 14]] [1, 417, 14, 14] 0
ConvBNLayer-15 [[1, 432, 14, 14]] [1, 417, 14, 14] 0
DepthwiseSeparable-7 [[1, 432, 14, 14]] [1, 417, 14, 14] 0
Conv2D-16 [[1, 417, 14, 14]] [1, 417, 14, 14] 3,753
BatchNorm-16 [[1, 417, 14, 14]] [1, 417, 14, 14] 1,668
ReLU-20 [[1, 417, 14, 14]] [1, 417, 14, 14] 0
ConvBNLayer-16 [[1, 417, 14, 14]] [1, 417, 14, 14] 0
Conv2D-17 [[1, 417, 14, 14]] [1, 440, 14, 14] 183,480
BatchNorm-17 [[1, 440, 14, 14]] [1, 440, 14, 14] 1,760
ReLU-21 [[1, 440, 14, 14]] [1, 440, 14, 14] 0
ConvBNLayer-17 [[1, 417, 14, 14]] [1, 440, 14, 14] 0
DepthwiseSeparable-8 [[1, 417, 14, 14]] [1, 440, 14, 14] 0
Conv2D-18 [[1, 440, 14, 14]] [1, 440, 14, 14] 3,960
BatchNorm-18 [[1, 440, 14, 14]] [1, 440, 14, 14] 1,760
ReLU-22 [[1, 440, 14, 14]] [1, 440, 14, 14] 0
ConvBNLayer-18 [[1, 440, 14, 14]] [1, 440, 14, 14] 0
Conv2D-19 [[1, 440, 14, 14]] [1, 512, 14, 14] 225,280
BatchNorm-19 [[1, 512, 14, 14]] [1, 512, 14, 14] 2,048
ReLU-23 [[1, 512, 14, 14]] [1, 512, 14, 14] 0
ConvBNLayer-19 [[1, 440, 14, 14]] [1, 512, 14, 14] 0
DepthwiseSeparable-9 [[1, 440, 14, 14]] [1, 512, 14, 14] 0
Conv2D-20 [[1, 512, 14, 14]] [1, 512, 14, 14] 4,608
BatchNorm-20 [[1, 512, 14, 14]] [1, 512, 14, 14] 2,048
ReLU-24 [[1, 512, 14, 14]] [1, 512, 14, 14] 0
ConvBNLayer-20 [[1, 512, 14, 14]] [1, 512, 14, 14] 0
Conv2D-21 [[1, 512, 14, 14]] [1, 512, 14, 14] 262,144
BatchNorm-21 [[1, 512, 14, 14]] [1, 512, 14, 14] 2,048
ReLU-25 [[1, 512, 14, 14]] [1, 512, 14, 14] 0
ConvBNLayer-21 [[1, 512, 14, 14]] [1, 512, 14, 14] 0
DepthwiseSeparable-10 [[1, 512, 14, 14]] [1, 512, 14, 14] 0
Conv2D-22 [[1, 512, 14, 14]] [1, 512, 14, 14] 4,608
BatchNorm-22 [[1, 512, 14, 14]] [1, 512, 14, 14] 2,048
ReLU-26 [[1, 512, 14, 14]] [1, 512, 14, 14] 0
ConvBNLayer-22 [[1, 512, 14, 14]] [1, 512, 14, 14] 0
Conv2D-23 [[1, 512, 14, 14]] [1, 438, 14, 14] 224,256
BatchNorm-23 [[1, 438, 14, 14]] [1, 438, 14, 14] 1,752
ReLU-27 [[1, 438, 14, 14]] [1, 438, 14, 14] 0
ConvBNLayer-23 [[1, 512, 14, 14]] [1, 438, 14, 14] 0
DepthwiseSeparable-11 [[1, 512, 14, 14]] [1, 438, 14, 14] 0
Conv2D-24 [[1, 438, 14, 14]] [1, 438, 7, 7] 3,942
BatchNorm-24 [[1, 438, 7, 7]] [1, 438, 7, 7] 1,752
ReLU-28 [[1, 438, 7, 7]] [1, 438, 7, 7] 0
ConvBNLayer-24 [[1, 438, 14, 14]] [1, 438, 7, 7] 0
Conv2D-25 [[1, 438, 7, 7]] [1, 693, 7, 7] 303,534
BatchNorm-25 [[1, 693, 7, 7]] [1, 693, 7, 7] 2,772
ReLU-29 [[1, 693, 7, 7]] [1, 693, 7, 7] 0
ConvBNLayer-25 [[1, 438, 7, 7]] [1, 693, 7, 7] 0
DepthwiseSeparable-12 [[1, 438, 14, 14]] [1, 693, 7, 7] 0
Conv2D-26 [[1, 693, 7, 7]] [1, 693, 7, 7] 6,237
BatchNorm-26 [[1, 693, 7, 7]] [1, 693, 7, 7] 2,772
ReLU-30 [[1, 693, 7, 7]] [1, 693, 7, 7] 0
ConvBNLayer-26 [[1, 693, 7, 7]] [1, 693, 7, 7] 0
Conv2D-27 [[1, 693, 7, 7]] [1, 1024, 7, 7] 709,632
BatchNorm-27 [[1, 1024, 7, 7]] [1, 1024, 7, 7] 4,096
ReLU-31 [[1, 1024, 7, 7]] [1, 1024, 7, 7] 0
ConvBNLayer-27 [[1, 693, 7, 7]] [1, 1024, 7, 7] 0
DepthwiseSeparable-13 [[1, 693, 7, 7]] [1, 1024, 7, 7] 0
AdaptiveAvgPool2D-1 [[1, 1024, 7, 7]] [1, 1024, 1, 1] 0
Flatten-1 [[1, 1024, 1, 1]] [1, 1024] 0
Linear-1 [[1, 1024]] [1, 1000] 1,025,000
Total params: 3,339,467
Trainable params: 3,302,539
Non-trainable params: 36,928
Input size (MB): 0.57
Forward/backward pass size (MB): 132.26
Params size (MB): 12.74
Estimated Total Size (MB): 145.57
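Only the 50%-pruned model is faster than the original at inference; the 30%-pruned model is actually slower. Why is that? Thanks.
This is because the Nvidia GPU compute libraries that PaddleInference calls, cublas (or cublaslt), are heavily optimized for matrix multiplication and convolution. For example, if a special optimization O applies to …, B=[1024, 256], then pruning A to [700, 1024] may no longer hit optimization O, and performance can end up worse than before pruning. You can try each of the following two approaches to avoid this problem as far as possible:
- During sensitivity analysis, enable the … option of sensitive_prune so that the channel counts after pruning are multiples of 8 or 16;
- Starting from the set of pruning ratios you currently have, fine-tune the ratios so that the channel counts after pruning are multiples of 8 or 16.
In short, getting an inference speedup from pruning on Nvidia GPUs is indeed not as easy as on Intel CPUs and ARM CPUs; it takes some extra work.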
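The second suggestion (nudging the pruning ratios so that channel counts land on multiples of 8 or 16) can be sketched in plain Python. This is only an illustration, not PaddleSlim code: the helper names `align_channels` and `aligned_ratios` are hypothetical, the channel counts are copied from the pruning log earlier in this thread, and how the resulting ratios are fed back into the pruner is deliberately left out. Accuracy and total FLOPs would need to be re-checked after any such adjustment.

```python
# Output-channel counts (original, kept) for the 30% prune,
# copied from the pruning log posted in this thread.
PRUNED = {
    "conv2d_1.w_0": (32, 16),
    "conv2d_3.w_0": (64, 46),
    "conv2d_5.w_0": (128, 101),
    "conv2d_7.w_0": (128, 99),
    "conv2d_9.w_0": (256, 224),
    "conv2d_11.w_0": (256, 174),
    "conv2d_13.w_0": (512, 432),
    "conv2d_15.w_0": (512, 417),
    "conv2d_17.w_0": (512, 440),
    "conv2d_23.w_0": (512, 438),
    "conv2d_25.w_0": (1024, 693),
}


def align_channels(kept, original, multiple=8):
    """Round `kept` to the nearest multiple of `multiple` (halves round up),
    then clamp the result to the range [multiple, original]."""
    aligned = (kept + multiple // 2) // multiple * multiple
    return max(multiple, min(aligned, original))


def aligned_ratios(pruned, multiple=8):
    """Map each parameter name to (aligned channel count, pruning ratio)."""
    result = {}
    for name, (original, kept) in pruned.items():
        aligned = align_channels(kept, original, multiple)
        result[name] = (aligned, 1.0 - aligned / original)
    return result


if __name__ == "__main__":
    for name, (channels, ratio) in aligned_ratios(PRUNED).items():
        print(f"{name}: keep {channels} channels (prune ratio {ratio:.4f})")
```

Rounding to the nearest multiple keeps each layer close to what the sensitivity analysis chose; rounding up instead would be the more conservative choice for accuracy, at the cost of a slightly smaller FLOPs reduction.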
OK, thanks! I'll keep experimenting.
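I pruned MobileNetV1 with FPGM at 30% and at 50%.
I ran inference with paddle-inference on 10,000 ImageNet2012 images. Timing is taken immediately before and after ….
Only the 50%-pruned model is faster than the original at inference; the 30%-pruned model is actually slower. Why is that? Thanks.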
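For comparing the original and pruned models across batch sizes, a timing harness along the following lines may help keep measurements consistent. It is a generic sketch, not the code used in this issue: `time_inference` and `run_batch` are hypothetical names, and `run_batch` is assumed to block until the outputs are back on the host (otherwise asynchronous GPU execution would make the timings meaningless).

```python
import time


def time_inference(run_batch, batches, warmup=10):
    """Time `run_batch` over `batches`.

    The first `warmup` batches are run once, untimed, so that one-off costs
    (memory allocation, kernel selection) do not distort the measurement.
    Returns (total_seconds, seconds_per_batch).
    """
    batches = list(batches)
    for batch in batches[:warmup]:
        run_batch(batch)  # warm-up pass, not timed

    start = time.perf_counter()
    for batch in batches:
        run_batch(batch)  # assumed to return only after results are on the host
    total = time.perf_counter() - start
    return total, total / max(len(batches), 1)
```

Running the same harness with the same batches for each model and each batch_size makes it easier to see at which sizes the pruned model falls behind.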