```mermaid
graph TB
    A[Input] --> B["block1.0: Conv2d + BN"]
    B --> D["_ElementWiseOp_1(GeluBackward0)"]
    D --> E["block1.3: Conv2d"]
    E --> F["block1.4: BatchNorm2d"]
    F --> G["_SplitOp_0([0, 10, 30])"]
    G --> H["block2_1: Conv2d + BN"]
    G --> I["block2_2: Conv2d + BN"]
    H --> J[Output1]
    I --> K[Output2]
```
```mermaid
graph TB
    A[Input] --> B["block1.0: Conv2d + BN"]
    B --> D["_ElementWiseOp_3(geluBackward)"]
    D --> E["_ElementWiseOp_2(conv2dBackward)"]
    E --> F["block1.3: Conv2d"]
    F --> G["block1.4: BatchNorm2d"]
    G --> H["_SplitOp_1([0, 10])"]
    G --> I["_SplitOp_6([0, 20])"]
    H --> K["_ElementWiseOp_0(conv2dBackward)"]
    K --> L["block2_1: Conv2d + BN"]
    I --> M["_ElementWiseOp_5(conv2dBackward)"]
    M --> N["block2_2: Conv2d + BN"]
    L --> O[Output1]
    N --> P[Output2]
```
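In the first graph (presumably the trace under stock PyTorch) the split appears as a single `_SplitOp_0` node whose boundary list `[0, 10, 30]` looks like the cumulative offsets of the split sizes `[10, 20]` used in `forward()`; in the second graph (presumably the trace under the oneflow mock) the same split is recorded as two separate nodes with partial offsets. A quick check of that offset arithmetic (the cumulative-offset interpretation is an assumption):

```py
import itertools

# torch.split(x, [c1, c2], dim=1) with in_dim = 10 -> sizes [10, 20]
sizes = [10, 20]
offsets = [0] + list(itertools.accumulate(sizes))
print(offsets)  # [0, 10, 30] -- matches the boundary list shown on _SplitOp_0
```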
The output grad_fn types of torch.split_with_sizes and torch.split are different: torch.split_with_sizes produces SplitWithSizesBackward, while torch.split produces SplitBackward. Details are in the table below.
| | `out = torch.split_with_sizes(x, [2, 3, 1])` | `out = torch.split(x, 2)` |
|---|---|---|
| oneflow output | `<view::narrow_backward at 0x55e999570db0>` | `<view::narrow_backward at 0x55e99b35de60>` |
| pytorch output | `<SplitWithSizesBackward0 object at 0x7f8242c3cd90>` | `<SplitBackward0 object at 0x7f8242c3cd90>` |
Demo code:
```py
import torch

x = torch.empty(6, requires_grad=True)

# split_with_sizes -> SplitWithSizesBackward0 under stock PyTorch
out = torch.split_with_sizes(x, [2, 3, 1])
out[0].sum().backward()
print(out[0].grad_fn)

# split -> SplitBackward0 under stock PyTorch
out = torch.split(x, 2)
print(out[0].grad_fn)
```
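These differing `grad_fn` names matter because a dependency tracer typically identifies structural ops such as split by inspecting the autograd graph. The sketch below is not Torch-Pruning's actual detection code; the helper `is_split_node` and the name matching are assumptions, but it illustrates why a tracer keyed to PyTorch's `SplitBackward*` names would not recognize oneflow's `narrow_backward` nodes:

```py
import torch

def is_split_node(grad_fn) -> bool:
    # grad_fn.name() returns the autograd node type, e.g. "SplitBackward0"
    # or "SplitWithSizesBackward0" under stock PyTorch.
    name = grad_fn.name() if grad_fn is not None else ""
    return "SplitBackward" in name or "SplitWithSizesBackward" in name

x = torch.empty(6, requires_grad=True)
out = torch.split(x, 2)
print(is_split_node(out[0].grad_fn))  # True under stock PyTorch;
                                      # a "narrow_backward" node would return False
```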
The figure above is from the original article: https://zhuanlan.zhihu.com/p/619146631?utm_id=0

Why can the input and output of a linear layer be pruned independently?
Logically, each layer $f_i$ is decomposed into an input $f^{-}_i$ and an output $f^{+}_i$. The two pruning schemes act on the columns and rows of the weight matrix $W$ respectively (with the PyTorch convention that rows index output features and columns index input features) and are independent of each other, so in the dependency graph the input $f^-$ of a fully connected layer (how the input enters $W$) and its output $f^+$ (how $W$ produces the output) are also independent, i.e. uncoupled. In other words, pruning columns of $W$ only changes how the input enters the layer, and pruning rows only changes which outputs are produced; neither affects the other. Therefore, the input and output of a fully connected layer can be pruned independently.
Figure 3. Grouping layers on the DepGraph via recursive propagation, starting from $f_4^+$. In this example, because of the divergent pruning schemes described above, there is no intra-layer dependency between the convolution's input $f_4^-$ and output $f_4^+$.
Logically, each layer $f_i$ is decomposed into an input $f^{-}_i$ and an output $f^{+}_i$. With this notation, a simple stacked network can be described as

$$f^{-}_1 \leftrightarrow f^{+}_1 \leftrightarrow f^{-}_2 \leftrightarrow f^{+}_2 \leftrightarrow \cdots \leftrightarrow f^{-}_L \leftrightarrow f^{+}_L$$

where the symbol $\leftrightarrow$ denotes a network connection.
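As a purely illustrative toy (this data structure is made up and is not DepGraph's internal representation), the decomposed stacked network can be viewed as a chain of paired nodes:

```py
# Toy representation: each layer f_i becomes two nodes ("f{i}-", "f{i}+"),
# connected within the layer and to the neighbouring layers.
L = 3
edges = []
for i in range(1, L + 1):
    edges.append((f"f{i}-", f"f{i}+"))          # intra-layer connection
    if i < L:
        edges.append((f"f{i}+", f"f{i + 1}-"))  # inter-layer connection
print(edges)
```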
Why the input and output of a linear (or convolution) layer can be pruned independently: as shown in Figure 3, the input and output of a convolution layer have different pruning schemes, i.e. pruning the input removes w[:, k, :, :] while pruning the output removes w[k, :, :, :], so $\mathrm{sch}(f^-_i) \neq \mathrm{sch}(f^+_i)$. In this case there is no dependency between the input and output of the convolution layer.
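To make the two schemes concrete, here is a small sketch (the layer sizes are arbitrary) that prunes channel k from a Conv2d weight in the two different ways and shows that they touch disjoint slices of the same tensor:

```py
import torch
import torch.nn as nn

conv = nn.Conv2d(4, 8, kernel_size=1)        # weight shape: (out=8, in=4, 1, 1)
w = conv.weight.data
k = 2

# Scheme of the output f^+: drop output channel k, i.e. remove w[k, :, :, :]
keep_out = [i for i in range(w.shape[0]) if i != k]
w_prune_out = w[keep_out]                    # shape (7, 4, 1, 1)

# Scheme of the input f^-: drop input channel k, i.e. remove w[:, k, :, :]
keep_in = [i for i in range(w.shape[1]) if i != k]
w_prune_in = w[:, keep_in]                   # shape (8, 3, 1, 1)

print(w_prune_out.shape, w_prune_in.shape)
```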
Here is a walk-through with a small example:
```py
# Prune the k-th row of matrix W
w[k, :]
# Prune the k-th column of matrix W
w[:, k]

# Example (originally generated with Claude)
# Input:  [x1, x2, x3]   (3 inputs)
# W:
#   w11 w12 w13
#   w21 w22 w23          (2 rows, 3 columns, y = W @ x)
# Output: [y1, y2]       (2 outputs)

# If we prune the first row of W:
# Input:  [x1, x2, x3]   (3 inputs)
# W:
#   w21 w22 w23          (1 row, 3 columns)
# Output: [y2]           (1 output)
# The mapping from W to the output changes (y1 is removed), but the way the
# inputs enter W is untouched: x1, x2 and x3 are all still consumed.

# If we prune the third column of W:
# Input:  [x1, x2]       (2 inputs)
# W:
#   w11 w12
#   w21 w22              (2 rows, 2 columns)
# Output: [y1, y2]       (2 outputs)
# The way the inputs enter W changes (x3 is no longer used), but the mapping
# from W to the output is untouched: y1 and y2 are both still produced.
```
This example shows:
- The input vector [x1, x2, x3] is multiplied by the weight matrix W to produce the output vector [y1, y2].
- Pruning the first row of W changes the mapping from W to the output (y1 is removed), while the way the inputs enter W is unchanged.
- Pruning the third column of W changes how the inputs enter W (x3 is no longer used), while the mapping from W to the output is unchanged.
- Row pruning and column pruning touch disjoint parts of W, which is why the input and output of a fully connected layer can be pruned independently.
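The comment-only example above can also be checked numerically. A minimal sketch (the values are arbitrary) showing that row pruning and column pruning of the same W do not interfere:

```py
import torch

W = torch.arange(6.0).reshape(2, 3)   # 2 outputs, 3 inputs: y = W @ x
x = torch.tensor([1.0, 2.0, 3.0])

# Prune output y1: drop row 0 of W. All inputs x1, x2, x3 are still consumed.
print(W[1:] @ x)                      # only y2 remains

# Prune input x3: drop column 2 of W. Both outputs y1, y2 are still produced.
print(W[:, :2] @ x[:2])               # y1, y2 computed without x3

# The two prunings act on disjoint index sets (rows vs. columns),
# so they can be applied independently and in any order.
print(W[1:, :2] @ x[:2])              # both prunings applied together
```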
Contents
- Problem log
- Resolution progress
The model used by the script:

```py
class Net(nn.Module):
    def __init__(self, in_dim):
        super().__init__()
        self.block1 = nn.Sequential(
            nn.Conv2d(in_dim, in_dim, 1),
            nn.BatchNorm2d(in_dim),
            nn.GELU(),
            nn.Conv2d(in_dim, in_dim*3, 1),
            nn.BatchNorm2d(in_dim*3)
        )
        self.block2_1 = nn.Sequential(
            nn.Conv2d(in_dim, in_dim, 1),
            nn.BatchNorm2d(in_dim)
        )
        self.block2_2 = nn.Sequential(
            nn.Conv2d(2*in_dim, in_dim, 1),
            nn.BatchNorm2d(in_dim)
        )

    def forward(self, x):
        x = self.block1(x)
        num_ch = x.shape[1]
        c1, c2 = self.block2_1[0].in_channels, self.block2_2[0].in_channels
        x1, x3 = torch.split(x, [c1, c2], dim=1)
        x1 = self.block2_1(x1)
        #x2 = self.block2_1(x2)
        x3 = self.block2_2(x3)
        return x1, x3
```

Error message
Environment
- Linux (host: oneflow27-root)
- Failing script commit: 0ce83f1 (HEAD -> master), HEAD@{0}: reset: moving to 0ce83f1
- oneflow mock_torch documentation: https://docs.oneflow.org/master/cookies/oneflow_torch.html
Launch command
```bash
eval $(python3 -m oneflow.mock_torch) && python test_split.py
```
File path: OneFlow-Pruning/tests/test_split.py
```py
import sys, os
sys.path.append(os.path.dirname(os.path.dirname(os.path.realpath(__file__))))

import torch
import torch_pruning as tp
import torch.nn as nn


class Net(nn.Module):
    def __init__(self, in_dim):
        super().__init__()
        self.block1 = nn.Sequential(
            nn.Conv2d(in_dim, in_dim, 1),
            nn.BatchNorm2d(in_dim),
            nn.GELU(),
            nn.Conv2d(in_dim, in_dim*3, 1),
            nn.BatchNorm2d(in_dim*3)
        )
        self.block2_1 = nn.Sequential(
            nn.Conv2d(in_dim, in_dim, 1),
            nn.BatchNorm2d(in_dim)
        )
        self.block2_2 = nn.Sequential(
            nn.Conv2d(2*in_dim, in_dim, 1),
            nn.BatchNorm2d(in_dim)
        )

    def forward(self, x):
        x = self.block1(x)
        num_ch = x.shape[1]
        c1, c2 = self.block2_1[0].in_channels, self.block2_2[0].in_channels
        x1, x3 = torch.split(x, [c1, c2], dim=1)
        x1 = self.block2_1(x1)
        #x2 = self.block2_1(x2)
        x3 = self.block2_2(x3)
        return x1, x3


def test_pruner():
    model = Net(10)
    print(model)

    # Global metrics
    example_inputs = torch.randn(1, 10, 7, 7)
    imp = tp.importance.RandomImportance()

    ignored_layers = []
    # DO NOT prune the final classifier!
    for m in model.modules():
        if isinstance(m, torch.nn.Linear) and m.out_features == 1000:
            ignored_layers.append(m)

    iterative_steps = 1
    pruner = tp.pruner.MagnitudePruner(
        model,
        example_inputs,
        importance=imp,
        iterative_steps=iterative_steps,
        ch_sparsity=0.5,  # remove 50% channels, ResNet18 = {64, 128, 256, 512} => ResNet18_Half = {32, 64, 128, 256}
        ignored_layers=ignored_layers,
    )

    for g in pruner.DG.get_all_groups():
        pass

    base_macs, base_nparams = tp.utils.count_ops_and_params(model, example_inputs)
    for i in range(iterative_steps):
        for g in pruner.step(interactive=True):
            print(g.details())
            g.prune()
        print(model)
        macs, nparams = tp.utils.count_ops_and_params(model, example_inputs)
        print([o.shape for o in model(example_inputs)])
        print(
            "  Iter %d/%d, Params: %.2f => %.2f"
            % (i + 1, iterative_steps, base_nparams, nparams)
        )
        print(
            "  Iter %d/%d, MACs: %.2f => %.2f"
            % (i + 1, iterative_steps, base_macs, macs)
        )
        # finetune your model here
        # finetune(model)
        # ...


if __name__ == '__main__':
    test_pruner()
```

From this error we found that the goal is to make `oneflow.mock_torch` compatible with https://github.com/VainF/Torch-Pruning with zero changes to that project's code.