Oneflow-Inc / OneFlow-Pruning

[CVPR-2023] Towards Any Structural Pruning
https://arxiv.org/abs/2301.12900
MIT License

Notes: Runtime Error in the test_split.py script #7

Open ccssu opened 1 year ago

ccssu commented 1 year ago

Table of Contents

Recorded issues

  1. Runtime Error in the test_split.py script
  2. Incorrect Pruning Group: https://github.com/Oneflow-Inc/OneFlow-Pruning/issues/7#issuecomment-1525190389

Resolution progress

END of Table of Contents

Model used by the script

```mermaid
graph TB
A[Input] --> B["block1: Conv2d + BN + GELU + Conv2d + BN"]
B --> C["torch.split"]
C --> D["block2_1: Conv2d + BN"]
C --> E["block2_2: Conv2d + BN"]
D --> F[Output1]
E --> G[Output2]
```
```py
class Net(nn.Module):
    def __init__(self, in_dim):
        super().__init__()
        self.block1 = nn.Sequential(
            nn.Conv2d(in_dim, in_dim, 1),
            nn.BatchNorm2d(in_dim),
            nn.GELU(),
            nn.Conv2d(in_dim, in_dim * 3, 1),
            nn.BatchNorm2d(in_dim * 3),
        )
        self.block2_1 = nn.Sequential(
            nn.Conv2d(in_dim, in_dim, 1),
            nn.BatchNorm2d(in_dim),
        )
        self.block2_2 = nn.Sequential(
            nn.Conv2d(2 * in_dim, in_dim, 1),
            nn.BatchNorm2d(in_dim),
        )

    def forward(self, x):
        x = self.block1(x)
        num_ch = x.shape[1]
        c1, c2 = self.block2_1[0].in_channels, self.block2_2[0].in_channels
        x1, x3 = torch.split(x, [c1, c2], dim=1)
        x1 = self.block2_1(x1)
        #x2 = self.block2_1(x2)
        x3 = self.block2_2(x3)
        return x1, x3
```

Error message

From the traceback below, it is the split alone that triggers the error. Printing the Pruning Groups shows where the two backends diverge: conv2dBackward and _SplitOp_6 appear only in the OneFlow graph. Flowcharts of the two sets of Pruning Groups: https://github.com/Oneflow-Inc/OneFlow-Pruning/issues/7#issuecomment-1525190389. The exact trigger is still unknown, and the failure is nondeterministic: occasionally the script runs correctly. Reading the message itself: after 50% pruning, block1's output has 15 channels, yet the pruned in_channels of block2_1.0 and block2_2.0 (which forward() uses as the split sizes) sum to only 13, so the two split branches were pruned inconsistently with the split source.

```
Traceback (most recent call last):
  File "test_split.py", line 88, in <module>
    test_pruner()
  File "test_split.py", line 72, in test_pruner
    macs, nparams = tp.utils.count_ops_and_params(model, example_inputs)
  File "/data/home/fengwen/package/oneflow/python/oneflow/autograd/autograd_mode.py", line 154, in wrapper
    return func(*args, **kwargs)
  File "/data/home/fengwen/package/oneflow/.idea/OneFlow-Pruning/torch_pruning/utils/op_counter.py", line 26, in count_ops_and_params
    _ = flops_model(example_inputs)
  File "/data/home/fengwen/package/oneflow/python/oneflow/nn/modules/module.py", line 224, in __call__
    res = self.forward(*args, **kwargs)
  File "test_split.py", line 36, in forward
    x1, x3 = torch.split(x, [c1, c2], dim=1)
RuntimeError: Error: split_with_sizes expects split_sizes to sum exactly to 15 (input tensor's size at dimension 1), but got sum(split_sizes)=13
```

Environment

OS: Linux (host: oneflow27-root). Failing script commit: 0ce83f1 (HEAD -> master), HEAD@{0}: reset: moving to 0ce83f1. mock torch docs: https://docs.oneflow.org/master/cookies/oneflow_torch.html

Launch command: `eval $(python3 -m oneflow.mock_torch) && python test_split.py`

File path: OneFlow-Pruning/tests/test_split.py

```py
import sys, os
sys.path.append(os.path.dirname(os.path.dirname(os.path.realpath(__file__))))
import torch
import torch_pruning as tp
import torch.nn as nn

class Net(nn.Module):
    def __init__(self, in_dim):
        super().__init__()
        self.block1 = nn.Sequential(
            nn.Conv2d(in_dim, in_dim, 1),
            nn.BatchNorm2d(in_dim),
            nn.GELU(),
            nn.Conv2d(in_dim, in_dim * 3, 1),
            nn.BatchNorm2d(in_dim * 3),
        )
        self.block2_1 = nn.Sequential(
            nn.Conv2d(in_dim, in_dim, 1),
            nn.BatchNorm2d(in_dim),
        )
        self.block2_2 = nn.Sequential(
            nn.Conv2d(2 * in_dim, in_dim, 1),
            nn.BatchNorm2d(in_dim),
        )

    def forward(self, x):
        x = self.block1(x)
        num_ch = x.shape[1]
        # Split sizes are read from the (possibly pruned) conv in_channels.
        c1, c2 = self.block2_1[0].in_channels, self.block2_2[0].in_channels
        x1, x3 = torch.split(x, [c1, c2], dim=1)
        x1 = self.block2_1(x1)
        #x2 = self.block2_1(x2)
        x3 = self.block2_2(x3)
        return x1, x3

def test_pruner():
    model = Net(10)
    print(model)
    # Global metrics
    example_inputs = torch.randn(1, 10, 7, 7)
    imp = tp.importance.RandomImportance()
    ignored_layers = []

    # DO NOT prune the final classifier!
    for m in model.modules():
        if isinstance(m, torch.nn.Linear) and m.out_features == 1000:
            ignored_layers.append(m)

    iterative_steps = 1
    pruner = tp.pruner.MagnitudePruner(
        model,
        example_inputs,
        importance=imp,
        iterative_steps=iterative_steps,
        ch_sparsity=0.5,  # remove 50% channels, ResNet18 = {64, 128, 256, 512} => ResNet18_Half = {32, 64, 128, 256}
        ignored_layers=ignored_layers,
    )
    for g in pruner.DG.get_all_groups():
        pass

    base_macs, base_nparams = tp.utils.count_ops_and_params(model, example_inputs)
    for i in range(iterative_steps):
        for g in pruner.step(interactive=True):
            print(g.details())
            g.prune()
        print(model)
        macs, nparams = tp.utils.count_ops_and_params(model, example_inputs)
        print([o.shape for o in model(example_inputs)])
        print(
            "  Iter %d/%d, Params: %.2f => %.2f"
            % (i + 1, iterative_steps, base_nparams, nparams)
        )
        print(
            "  Iter %d/%d, MACs: %.2f => %.2f"
            % (i + 1, iterative_steps, base_macs, macs)
        )
        # finetune your model here
        # finetune(model)
        # ...

if __name__ == '__main__':
    test_pruner()
```

This error shows that we must ensure mock torch stays compatible with https://github.com/VainF/Torch-Pruning with zero modifications to that repo.
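For reference, the group printouts in the next comment can be reproduced with a few lines of Torch-Pruning's DependencyGraph API (a minimal sketch; `Net` is the model from the script above):

```py
# Minimal sketch: build a dependency graph for Net and print every pruning
# group, mirroring what pruner.DG.get_all_groups() walks over in the test.
import torch
import torch_pruning as tp

model = Net(10)
DG = tp.DependencyGraph().build_dependency(model, example_inputs=torch.randn(1, 10, 7, 7))
for group in DG.get_all_groups():
    print(group.details())
```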

ccssu commented 1 year ago

Pruning Group

PyTorch

pt_all_groups

```sh
--------------------------------
Pruning Group
--------------------------------
[0] prune_out_channels on block2_1.0 (Conv2d(10, 10, kernel_size=(1, 1), stride=(1, 1))) => prune_out_channels on block2_1.0 (Conv2d(10, 10, kernel_size=(1, 1), stride=(1, 1))), #idxs=10
[1] prune_out_channels on block2_1.0 (Conv2d(10, 10, kernel_size=(1, 1), stride=(1, 1))) => prune_out_channels on block2_1.1 (BatchNorm2d(10, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)), #idxs=10
--------------------------------

--------------------------------
Pruning Group
--------------------------------
[0] prune_out_channels on block1.3 (Conv2d(10, 30, kernel_size=(1, 1), stride=(1, 1))) => prune_out_channels on block1.3 (Conv2d(10, 30, kernel_size=(1, 1), stride=(1, 1))), #idxs=30
[1] prune_out_channels on block1.3 (Conv2d(10, 30, kernel_size=(1, 1), stride=(1, 1))) => prune_out_channels on block1.4 (BatchNorm2d(30, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)), #idxs=30
[2] prune_out_channels on block1.4 (BatchNorm2d(30, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)) => prune_out_channels on _SplitOp_0([0, 10, 30]), #idxs=30
[3] prune_out_channels on _SplitOp_0([0, 10, 30]) => prune_in_channels on block2_1.0 (Conv2d(10, 10, kernel_size=(1, 1), stride=(1, 1))), #idxs=10
[4] prune_out_channels on _SplitOp_0([0, 10, 30]) => prune_in_channels on block2_2.0 (Conv2d(20, 10, kernel_size=(1, 1), stride=(1, 1))), #idxs=20
--------------------------------

--------------------------------
Pruning Group
--------------------------------
[0] prune_out_channels on block1.0 (Conv2d(10, 10, kernel_size=(1, 1), stride=(1, 1))) => prune_out_channels on block1.0 (Conv2d(10, 10, kernel_size=(1, 1), stride=(1, 1))), #idxs=10
[1] prune_out_channels on block1.0 (Conv2d(10, 10, kernel_size=(1, 1), stride=(1, 1))) => prune_out_channels on block1.1 (BatchNorm2d(10, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)), #idxs=10
[2] prune_out_channels on block1.1 (BatchNorm2d(10, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)) => prune_out_channels on _ElementWiseOp_1(GeluBackward0), #idxs=10
[3] prune_out_channels on _ElementWiseOp_1(GeluBackward0) => prune_in_channels on block1.3 (Conv2d(10, 30, kernel_size=(1, 1), stride=(1, 1))), #idxs=10
--------------------------------

--------------------------------
Pruning Group
--------------------------------
[0] prune_out_channels on block2_2.0 (Conv2d(20, 10, kernel_size=(1, 1), stride=(1, 1))) => prune_out_channels on block2_2.0 (Conv2d(20, 10, kernel_size=(1, 1), stride=(1, 1))), #idxs=10
[1] prune_out_channels on block2_2.0 (Conv2d(20, 10, kernel_size=(1, 1), stride=(1, 1))) => prune_out_channels on block2_2.1 (BatchNorm2d(10, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)), #idxs=10
--------------------------------
```
```mermaid
graph TB
A[Input] --> B["block1.0: Conv2d + BN"]
B --> D["_ElementWiseOp_1(GeluBackward0)"]
D --> E["block1.3: Conv2d"]
E --> F["block1.4: BatchNorm2d"]
F --> G["_SplitOp_0([0, 10, 30])"]
G --> H["block2_1: Conv2d + BN"]
G --> I["block2_2: Conv2d + BN"]
H --> J[Output1]
I --> K[Output2]
```

OneFlow

oneflow_all_groups

```sh
--------------------------------
Pruning Group
--------------------------------
[0] prune_out_channels on block2_1.0 (Conv2d(10, 10, kernel_size=(1, 1), stride=(1, 1))) => prune_out_channels on block2_1.0 (Conv2d(10, 10, kernel_size=(1, 1), stride=(1, 1))), #idxs=10
[1] prune_out_channels on block2_1.0 (Conv2d(10, 10, kernel_size=(1, 1), stride=(1, 1))) => prune_out_channels on block2_1.1 (BatchNorm2d(10, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)), #idxs=10
--------------------------------

--------------------------------
Pruning Group
--------------------------------
[0] prune_out_channels on block1.3 (Conv2d(10, 30, kernel_size=(1, 1), stride=(1, 1))) => prune_out_channels on block1.3 (Conv2d(10, 30, kernel_size=(1, 1), stride=(1, 1))), #idxs=30
[1] prune_out_channels on block1.3 (Conv2d(10, 30, kernel_size=(1, 1), stride=(1, 1))) => prune_out_channels on block1.4 (BatchNorm2d(30, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)), #idxs=30
[2] prune_out_channels on block1.4 (BatchNorm2d(30, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)) => prune_out_channels on _SplitOp_1([0, 10]), #idxs=30
[3] prune_out_channels on block1.4 (BatchNorm2d(30, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)) => prune_out_channels on _SplitOp_6([0, 20]), #idxs=30
[4] prune_out_channels on _SplitOp_6([0, 20]) => prune_out_channels on _ElementWiseOp_5(conv2dBackward), #idxs=20
[5] prune_out_channels on _ElementWiseOp_5(conv2dBackward) => prune_in_channels on block2_2.0 (Conv2d(20, 10, kernel_size=(1, 1), stride=(1, 1))), #idxs=20
[6] prune_out_channels on _SplitOp_1([0, 10]) => prune_out_channels on _ElementWiseOp_0(conv2dBackward), #idxs=10
[7] prune_out_channels on _ElementWiseOp_0(conv2dBackward) => prune_in_channels on block2_1.0 (Conv2d(10, 10, kernel_size=(1, 1), stride=(1, 1))), #idxs=10
--------------------------------

--------------------------------
Pruning Group
--------------------------------
[0] prune_out_channels on block1.0 (Conv2d(10, 10, kernel_size=(1, 1), stride=(1, 1))) => prune_out_channels on block1.0 (Conv2d(10, 10, kernel_size=(1, 1), stride=(1, 1))), #idxs=10
[1] prune_out_channels on block1.0 (Conv2d(10, 10, kernel_size=(1, 1), stride=(1, 1))) => prune_out_channels on block1.1 (BatchNorm2d(10, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)), #idxs=10
[2] prune_out_channels on block1.1 (BatchNorm2d(10, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)) => prune_out_channels on _ElementWiseOp_3(geluBackward), #idxs=10
[3] prune_out_channels on _ElementWiseOp_3(geluBackward) => prune_out_channels on _ElementWiseOp_2(conv2dBackward), #idxs=10
[4] prune_out_channels on _ElementWiseOp_2(conv2dBackward) => prune_in_channels on block1.3 (Conv2d(10, 30, kernel_size=(1, 1), stride=(1, 1))), #idxs=10
--------------------------------

--------------------------------
Pruning Group
--------------------------------
[0] prune_out_channels on block2_2.0 (Conv2d(20, 10, kernel_size=(1, 1), stride=(1, 1))) => prune_out_channels on block2_2.0 (Conv2d(20, 10, kernel_size=(1, 1), stride=(1, 1))), #idxs=10
[1] prune_out_channels on block2_2.0 (Conv2d(20, 10, kernel_size=(1, 1), stride=(1, 1))) => prune_out_channels on block2_2.1 (BatchNorm2d(10, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)), #idxs=10
--------------------------------
```
```mermaid
graph TB
A[Input] --> B["block1.0: Conv2d + BN"]
B --> D["_ElementWiseOp_3(geluBackward)"]
D --> E["_ElementWiseOp_2(conv2dBackward)"]
E --> F["block1.3: Conv2d"]
F --> G["block1.4: BatchNorm2d"]
G --> H["_SplitOp_1([0, 10])"]
G --> I["_SplitOp_6([0, 20])"]
H --> K["_ElementWiseOp_0(conv2dBackward)"]
K --> L["block2_1: Conv2d + BN"]
I --> M["_ElementWiseOp_5(conv2dBackward)"]
M --> N["block2_2: Conv2d + BN"]
L --> O[Output1]
N --> P[Output2]
```
ccssu commented 1 year ago

torch.split_with_sizes and torch.split return different grad_fn types: torch.split_with_sizes yields SplitWithSizesBackward, while torch.split with an integer chunk size yields SplitBackward, as shown in the table below. (In PyTorch, calling torch.split with a list of sizes dispatches to split_with_sizes, which is why the demo's first call prints SplitWithSizesBackward0.) Under OneFlow's mock torch, both cases return view::narrow_backward, so the two split variants cannot be told apart by grad_fn, which presumably relates to the diverging _SplitOp groups above.

| | `out = torch.split_with_sizes(x, [2, 3, 1])` | `out = torch.split(x, 2)` |
| --- | --- | --- |
| OneFlow output | `<view::narrow_backward at 0x55e999570db0>` | `<view::narrow_backward at 0x55e99b35de60>` |
| PyTorch output | `<SplitWithSizesBackward0 object at 0x7f8242c3cd90>` | `<SplitBackward0 object at 0x7f8242c3cd90>` |

Demo code

```py
import torch

x = torch.empty(6, requires_grad=True)

# A list of sizes dispatches to split_with_sizes.
out = torch.split(x, [2, 3, 1])
out[0].sum().backward()
print(out[0].grad_fn)  # PyTorch: SplitWithSizesBackward0; OneFlow: view::narrow_backward

# An integer chunk size uses the plain split op.
out = torch.split(x, 2)
print(out[0].grad_fn)  # PyTorch: SplitBackward0; OneFlow: view::narrow_backward
```
ccssu commented 1 year ago

[image: dependency-modeling figure from the article linked below]

The figure above is from this article: https://zhuanlan.zhihu.com/p/619146631?utm_id=0. Why can the input and output of a linear layer be pruned independently?

Logically, each layer $f_i$ is decomposed into an input $f^{-}_i$ and an output $f^{+}_i$. These two pruning schemes act on the rows and the columns of the weight matrix $W$ respectively and are mutually independent, so in the dependency graph the fully-connected layer's input $f^-$ (the mapping from the input into $W$) and output $f^+$ (the mapping from $W$ to the output) are likewise independent, i.e. uncoupled. In other words, with $W$ laid out as in nn.Linear (out_features $\times$ in_features), pruning a row of $W$ only affects the mapping from $W$ to the output, and pruning a column only affects the mapping from the input into $W$. Hence the input and output of a fully-connected layer can be pruned independently.
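A quick runnable check (my own sketch, not from the article) that row pruning and column pruning of a Linear weight act on independent dimensions:

```py
import torch
import torch.nn as nn

lin = nn.Linear(3, 2, bias=False)  # weight W has shape (out_features=2, in_features=3)
x = torch.randn(3)
y = lin(x)                         # y = W @ x, shape (2,)

# Output pruning (drop row 0 of W): y1 disappears, but the surviving
# output is unchanged and all three inputs are still consumed.
w_rows = lin.weight.data[1:, :]    # shape (1, 3)
assert torch.allclose(w_rows @ x, y[1:])

# Input pruning (drop column 2 of W): x3 is no longer consumed, but both
# outputs still exist.
w_cols = lin.weight.data[:, :2]    # shape (2, 2)
print((w_cols @ x[:2]).shape)      # torch.Size([2])
```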

[image: Figure 3 from the paper]

Figure 3. Layer grouping on the DepGraph via recursive propagation, starting from $f_4^+$. In this example, because of the divergent pruning schemes discussed above, there is no intra-layer dependency between the convolution's input $f_4^-$ and output $f_4^+$.

Logically, each layer $f_i$ is decomposed into an input $f^{-}_i$ and an output $f^{+}_i$. Based on this description, a simple stacked network can be written as

$$f_1^- \leftrightarrow f_1^+ \leftrightarrow f_2^- \leftrightarrow f_2^+ \leftrightarrow \cdots \leftrightarrow f_L^- \leftrightarrow f_L^+$$

where the symbol $\leftrightarrow$ denotes a network connection.

Why the input and output of a linear layer can be pruned independently: as shown in Figure 3, the input and output of a convolution layer carry different pruning schemes — pruning the input slices w[:, k, :, :] while pruning the output slices w[k, :, :, :] — i.e. $sch(f^-_i) \neq sch(f^+_i)$. In this case, no dependency exists between the layer's input and output, as the sketch below illustrates.
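A small sketch (my own, not from the paper) of the two schemes on a Conv2d weight; they index disjoint dimensions, so removing an output channel never touches the slices that input-channel pruning would remove:

```py
import torch.nn as nn

conv = nn.Conv2d(10, 30, kernel_size=1)  # weight shape: (out=30, in=10, 1, 1)
w = conv.weight.data

k = 0
out_slice = w[k, :, :, :]  # sch(f+): output-channel pruning slices along dim 0
in_slice = w[:, k, :, :]   # sch(f-): input-channel pruning slices along dim 1
print(out_slice.shape, in_slice.shape)  # torch.Size([10, 1, 1]) torch.Size([30, 1, 1])
```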

Here is an explanation of this example (illustration adapted from a Claude-generated sketch, with the row/column effects matching the nn.Linear weight layout (out_features, in_features)):

```py
# Prune the k-th row of matrix W (removes the k-th output):
w[k, :]
# Prune the k-th column of matrix W (removes the k-th input):
w[:, k]

# Input:  [x1, x2, x3]   (3 inputs)
# W:
#    w11 w12 w13
#    w21 w22 w23         (2 rows x 3 columns)
# Output: [y1, y2]       (2 outputs)

# If we prune the first row of W:
# W:
#    w21 w22 w23         (1 row x 3 columns)
# The W-to-output mapping is affected: y1 disappears and only y2 remains.
# The input-to-W mapping is untouched: x1, x2 and x3 are all still consumed.

# If we prune the third column of W:
# W:
#    w11 w12
#    w21 w22             (2 rows x 2 columns)
# The input-to-W mapping is affected: x3 is no longer consumed.
# The W-to-output mapping is untouched: y1 and y2 are both still produced.
```
This example shows:
- Multiplying the input [x1, x2, x3] by the weight matrix W gives the input-to-hidden mapping, and W's rows give the hidden-to-output mapping onto [y1, y2].
- Pruning the first row of W changes the W-to-output mapping (y1 disappears) while the input-to-W mapping is unchanged.
- Pruning the third column of W changes the input-to-W mapping (x3 is dropped) while the W-to-output mapping is unchanged.
- Because rows and columns index disjoint dimensions of W, the two pruning schemes never interfere: the input and output of a fully-connected layer can be pruned independently.