In our tests, the outputs of RepVGG before and after conversion match to ten decimal places, and this has nothing to do with the model's FLOPs. This looks like a customized model (the final layer has 372 classes). Could you run a check like the one described in the first FAQ entry?
import numpy as np
import torch

# model, model_func, shape, and repvgg_model_convert come from my training code
x = torch.from_numpy(np.random.randn(1, *shape)).float()
y = model(x)
model_d = repvgg_model_convert(model, model_func, out_c=186*2, num_blocks=[4, 6, 16, 1], in_c=1)
y_d = model_d(x)
print('diff abs: max {},\n**2: {}'.format(abs(y - y_d).max(), ((y - y_d) ** 2).sum()))
Output: diff abs: max 6.67572021484375e-06, **2: 1.419987460948846e-09. This looks normal here, but after actual training, the exported model shows the large discrepancy I posted earlier. I haven't worked through the conversion details yet, so I don't want to jump to conclusions.
I observed two phenomena while implementing RepVGG. Initializing the weights with
def init_weights(modules):
    for m in modules:
        if isinstance(m, nn.Conv2d):
            nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            if m.bias is not None:
                nn.init.zeros_(m.bias)
        elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
            nn.init.constant_(m.weight, 1)
            nn.init.constant_(m.bias, 0)
causes a noticeable precision mismatch between the training-time and fused models, whereas the following initialization keeps the outputs consistent:
def _init_weights(self, gamma=0.01):
    for m in self.modules():
        if isinstance(m, nn.Conv2d):
            nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
        elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
            nn.init.constant_(m.weight, gamma)
            nn.init.constant_(m.bias, gamma)
Here is the test code:
def test_regvgg():
    model = RepVGGRecognizer()
    model.eval()
    print(model)

    data = torch.randn(1, 3, 224, 224)
    insert_repvgg_block(model)
    model.eval()
    train_outputs = model(data)[KEY_OUTPUT]
    print(model)

    fuse_repvgg_block(model)
    model.eval()
    eval_outputs = model(data)[KEY_OUTPUT]
    print(model)

    print(torch.sqrt(torch.sum((train_outputs - eval_outputs) ** 2)))
    print(torch.allclose(train_outputs, eval_outputs, atol=1e-8))
    assert torch.allclose(train_outputs, eval_outputs, atol=1e-8)
Hope this helps.
@zjykzj Thanks 🙏 I am sure the model was put into eval() before converting. The conversion seems to be just BN fusion plus padding the 1x1 kernel to 3x3; I'll find time to read through it again.
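For reference, a minimal sketch of that understanding (assuming each branch is a bias-free Conv2d followed by BatchNorm2d; the helper names below are illustrative, not the repo's exact API):

import torch
import torch.nn.functional as F

def fuse_conv_bn(conv_weight, bn):
    # Fold BN into the preceding conv:
    # w' = w * gamma / sqrt(var + eps),  b' = beta - mean * gamma / sqrt(var + eps)
    std = (bn.running_var + bn.eps).sqrt()
    t = (bn.weight / std).reshape(-1, 1, 1, 1)
    return conv_weight * t, bn.bias - bn.running_mean * bn.weight / std

def pad_1x1_to_3x3(kernel_1x1):
    # Zero-pad the 1x1 kernel to 3x3 so it can be summed with the 3x3 branch
    return F.pad(kernel_1x1, [1, 1, 1, 1])

The fused 3x3 branch, the padded 1x1 branch, and (where present) the identity branch expressed as a 1x1 conv are then summed into the weight and bias of a single 3x3 conv.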
In any case, any model containing BN or dropout should be put into eval() before inference; this has nothing to do with RepVGG. @zjykzj Could you give an example where initialization causes a large precision mismatch? I have tried all sorts of initializations, with weights of arbitrary magnitude, and in my tests the outputs were always identical.
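A minimal illustration, unrelated to RepVGG, of why a missing eval() looks like a conversion error: in train() mode BN normalizes each batch with its own statistics, while in eval() mode it uses the running statistics.

import torch
import torch.nn as nn

bn = nn.BatchNorm2d(8)                  # a fresh module starts in train() mode
x = torch.randn(2, 8, 4, 4)
y_train = bn(x)                         # normalized with batch statistics
bn.eval()
y_eval = bn(x)                          # normalized with running statistics
print((y_train - y_eval).abs().max())   # clearly nonzero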
The test code above runs on freshly initialized weights, and in that case there is no problem. Once trained weights are loaded, however, it fails.
My implementation and tests were done in my own repository (ZJCV/ZCls). The key code all comes from your implementation, so the overall logic is the same; I'll paste the exact model definitions below, and comparing them should make things clear.

The test is split into two parts. The test code is as follows (see test_repvgg_block.py):
# -*- coding: utf-8 -*-
"""
@date: 2021/2/1 下午8:01
@file: test_asymmetric_convolution_block.py
@author: zj
@description:
"""
import torch
import torch.nn as nn
from zcls.config.key_word import KEY_OUTPUT
from zcls.model.recognizers.repvgg_recognizer import RepVGGRecognizer
from zcls.model.layers.repvgg_block import RepVGGBlock
from zcls.model.conv_helper import insert_repvgg_block, fuse_repvgg_block
...
def test_conv_helper():
    in_channels = 32
    out_channels = 64
    dilation = 1

    # downsampling + grouped convolution
    kernel_size = 3
    stride = 2
    padding = 1
    groups = 8

    model = nn.Sequential(
        nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size,
                      stride=stride, padding=padding, dilation=dilation, groups=groups),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True)
        ),
        nn.BatchNorm2d(out_channels),
        nn.ReLU(inplace=True)
    )
    model.eval()
    print(model)

    data = torch.randn(1, in_channels, 56, 56)
    insert_repvgg_block(model)
    model.eval()
    train_outputs = model(data)
    print(model)

    fuse_repvgg_block(model)
    model.eval()
    eval_outputs = model(data)
    print(model)

    print(torch.sqrt(torch.sum((train_outputs - eval_outputs) ** 2)))
    print(torch.allclose(train_outputs, eval_outputs, atol=1e-8))
    assert torch.allclose(train_outputs, eval_outputs, atol=1e-8)


if __name__ == '__main__':
    test_repvgg_block()
    print('*' * 10)
    test_conv_helper()
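A side note on the assertion itself: atol=1e-8 asks for absolute agreement well below float32 precision, so it can fail even when the fusion is mathematically exact. A scale-aware check along these lines might be more robust (a sketch reusing the names from the test above):

# Compare the maximum absolute difference against the magnitude of the outputs
rel_err = (train_outputs - eval_outputs).abs().max() / eval_outputs.abs().max().clamp(min=1e-12)
print(rel_err)
assert rel_err < 1e-5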
Using the following initialization:
def init_weights(modules):
    for m in modules:
        if isinstance(m, nn.Conv2d):
            nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            if m.bias is not None:
                nn.init.zeros_(m.bias)
        elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
            nn.init.constant_(m.weight, 1)
            nn.init.constant_(m.bias, 0)
The test results are as follows:
############################################### The initially defined model
Sequential(
(0): Sequential(
(0): Conv2d(32, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=8)
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
)
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
)
#################################################### The model after inserting RepVGGBlock
Sequential(
(0): Sequential(
(0): RepVGGBlock(
(rbr_dense): Sequential(
(conv): Conv2d(32, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=8, bias=False)
(bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(rbr_1x1): Sequential(
(conv): Conv2d(32, 64, kernel_size=(1, 1), stride=(2, 2), groups=8, bias=False)
(bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): Identity()
(2): ReLU(inplace=True)
)
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
)
##################################################### The model after fusing RepVGGBlock
Sequential(
(0): Sequential(
(0): Conv2d(32, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=8)
(1): Identity()
(2): ReLU(inplace=True)
)
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
)
################################################## The result of torch.sqrt(torch.sum((train_outputs - eval_outputs) ** 2))
tensor(1.1523e-05, grad_fn=<SqrtBackward>)
################################################## Comparing the two outputs with torch.allclose(train_outputs, eval_outputs, atol=1e-8)
False
Traceback (most recent call last):
File "/home/zj/repos/ZCls/tests/test_model/test_layer/test_repvgg_block.py", line 149, in <module>
test_conv_helper()
File "/home/zj/repos/ZCls/tests/test_model/test_layer/test_repvgg_block.py", line 121, in test_conv_helper
assert torch.allclose(train_outputs, eval_outputs, atol=1e-8)
AssertionError
Process finished with exit code 1
Using the following initialization instead:
def _init_weights(self, gamma=0.01):
    for m in self.modules():
        if isinstance(m, nn.Conv2d):
            nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
        elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
            nn.init.constant_(m.weight, gamma)
            nn.init.constant_(m.bias, gamma)
The test results are as follows:
...
tensor(2.3963e-07, grad_fn=<SqrtBackward>)
True
The full-model test is similar to the block-level test above:
def test_regvgg():
    model = RepVGGRecognizer()
    model.eval()
    print(model)

    data = torch.randn(1, 3, 224, 224)
    insert_repvgg_block(model)
    model.eval()
    train_outputs = model(data)[KEY_OUTPUT]
    print(model)

    fuse_repvgg_block(model)
    model.eval()
    eval_outputs = model(data)[KEY_OUTPUT]
    print(model)

    print(torch.sqrt(torch.sum((train_outputs - eval_outputs) ** 2)))
    print(torch.allclose(train_outputs, eval_outputs, atol=1e-8))
    assert torch.allclose(train_outputs, eval_outputs, atol=1e-8)


if __name__ == '__main__':
    print('*' * 10)
    test_regvgg()
Using the following initialization:
def init_weights(modules):
    for m in modules:
        if isinstance(m, nn.Conv2d):
            nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            if m.bias is not None:
                nn.init.zeros_(m.bias)
        elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
            nn.init.constant_(m.weight, 1)
            nn.init.constant_(m.bias, 0)
The test results are as follows:
############################################ The original RepVGG model
RepVGGRecognizer(
(backbone): RepVGGBackbone(
(stage0): Sequential(
(0): Conv2d(3, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
(1): ReLU(inplace=True)
)
(stage1): Sequential(
(0): Sequential(
(0): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=4)
(1): ReLU(inplace=True)
)
(1): Sequential(
(0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace=True)
)
(2): Sequential(
(0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=4)
(1): ReLU(inplace=True)
)
(3): Sequential(
(0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace=True)
)
)
(stage2): Sequential(
(0): Sequential(
(0): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=4)
(1): ReLU(inplace=True)
)
(1): Sequential(
(0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace=True)
)
(2): Sequential(
(0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=4)
(1): ReLU(inplace=True)
)
(3): Sequential(
(0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace=True)
)
(4): Sequential(
(0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=4)
(1): ReLU(inplace=True)
)
(5): Sequential(
(0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace=True)
)
)
(stage3): Sequential(
(0): Sequential(
(0): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=4)
(1): ReLU(inplace=True)
)
(1): Sequential(
(0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace=True)
)
(2): Sequential(
(0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=4)
(1): ReLU(inplace=True)
)
(3): Sequential(
(0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace=True)
)
(4): Sequential(
(0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=4)
(1): ReLU(inplace=True)
)
(5): Sequential(
(0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace=True)
)
(6): Sequential(
(0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=4)
(1): ReLU(inplace=True)
)
(7): Sequential(
(0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace=True)
)
(8): Sequential(
(0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=4)
(1): ReLU(inplace=True)
)
(9): Sequential(
(0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace=True)
)
(10): Sequential(
(0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=4)
(1): ReLU(inplace=True)
)
(11): Sequential(
(0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace=True)
)
(12): Sequential(
(0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=4)
(1): ReLU(inplace=True)
)
(13): Sequential(
(0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace=True)
)
(14): Sequential(
(0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=4)
(1): ReLU(inplace=True)
)
(15): Sequential(
(0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace=True)
)
)
(stage4): Sequential(
(0): Sequential(
(0): Conv2d(512, 2048, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
(1): ReLU(inplace=True)
)
)
)
(head): GeneralHead2D(
(pool): AdaptiveAvgPool2d(output_size=(1, 1))
(dropout): Dropout(p=0.0, inplace=True)
(fc): Linear(in_features=2048, out_features=1000, bias=True)
)
)
############################################ The model after inserting RepVGGBlock
RepVGGRecognizer(
(backbone): RepVGGBackbone(
(stage0): Sequential(
(0): RepVGGBlock(
(rbr_dense): Sequential(
(conv): Conv2d(3, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(rbr_1x1): Sequential(
(conv): Conv2d(3, 64, kernel_size=(1, 1), stride=(2, 2), bias=False)
(bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): ReLU(inplace=True)
)
(stage1): Sequential(
(0): Sequential(
(0): RepVGGBlock(
(rbr_dense): Sequential(
(conv): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=4, bias=False)
(bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(rbr_1x1): Sequential(
(conv): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), groups=4, bias=False)
(bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): ReLU(inplace=True)
)
(1): Sequential(
(0): RepVGGBlock(
(rbr_identity): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(rbr_dense): Sequential(
(conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(rbr_1x1): Sequential(
(conv): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): ReLU(inplace=True)
)
(2): Sequential(
(0): RepVGGBlock(
(rbr_identity): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(rbr_dense): Sequential(
(conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=4, bias=False)
(bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(rbr_1x1): Sequential(
(conv): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1), groups=4, bias=False)
(bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): ReLU(inplace=True)
)
(3): Sequential(
(0): RepVGGBlock(
(rbr_identity): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(rbr_dense): Sequential(
(conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(rbr_1x1): Sequential(
(conv): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): ReLU(inplace=True)
)
)
(stage2): Sequential(
(0): Sequential(
(0): RepVGGBlock(
(rbr_dense): Sequential(
(conv): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=4, bias=False)
(bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(rbr_1x1): Sequential(
(conv): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), groups=4, bias=False)
(bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): ReLU(inplace=True)
)
(1): Sequential(
(0): RepVGGBlock(
(rbr_identity): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(rbr_dense): Sequential(
(conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(rbr_1x1): Sequential(
(conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): ReLU(inplace=True)
)
(2): Sequential(
(0): RepVGGBlock(
(rbr_identity): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(rbr_dense): Sequential(
(conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=4, bias=False)
(bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(rbr_1x1): Sequential(
(conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1), groups=4, bias=False)
(bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): ReLU(inplace=True)
)
(3): Sequential(
(0): RepVGGBlock(
(rbr_identity): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(rbr_dense): Sequential(
(conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(rbr_1x1): Sequential(
(conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): ReLU(inplace=True)
)
(4): Sequential(
(0): RepVGGBlock(
(rbr_identity): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(rbr_dense): Sequential(
(conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=4, bias=False)
(bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(rbr_1x1): Sequential(
(conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1), groups=4, bias=False)
(bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): ReLU(inplace=True)
)
(5): Sequential(
(0): RepVGGBlock(
(rbr_identity): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(rbr_dense): Sequential(
(conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(rbr_1x1): Sequential(
(conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): ReLU(inplace=True)
)
)
(stage3): Sequential(
(0): Sequential(
(0): RepVGGBlock(
(rbr_dense): Sequential(
(conv): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=4, bias=False)
(bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(rbr_1x1): Sequential(
(conv): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), groups=4, bias=False)
(bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): ReLU(inplace=True)
)
(1): Sequential(
(0): RepVGGBlock(
(rbr_identity): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(rbr_dense): Sequential(
(conv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(rbr_1x1): Sequential(
(conv): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): ReLU(inplace=True)
)
(2): Sequential(
(0): RepVGGBlock(
(rbr_identity): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(rbr_dense): Sequential(
(conv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=4, bias=False)
(bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(rbr_1x1): Sequential(
(conv): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1), groups=4, bias=False)
(bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): ReLU(inplace=True)
)
(3): Sequential(
(0): RepVGGBlock(
(rbr_identity): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(rbr_dense): Sequential(
(conv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(rbr_1x1): Sequential(
(conv): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): ReLU(inplace=True)
)
(4): Sequential(
(0): RepVGGBlock(
(rbr_identity): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(rbr_dense): Sequential(
(conv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=4, bias=False)
(bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(rbr_1x1): Sequential(
(conv): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1), groups=4, bias=False)
(bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): ReLU(inplace=True)
)
(5): Sequential(
(0): RepVGGBlock(
(rbr_identity): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(rbr_dense): Sequential(
(conv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(rbr_1x1): Sequential(
(conv): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): ReLU(inplace=True)
)
(6): Sequential(
(0): RepVGGBlock(
(rbr_identity): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(rbr_dense): Sequential(
(conv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=4, bias=False)
(bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(rbr_1x1): Sequential(
(conv): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1), groups=4, bias=False)
(bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): ReLU(inplace=True)
)
(7): Sequential(
(0): RepVGGBlock(
(rbr_identity): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(rbr_dense): Sequential(
(conv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(rbr_1x1): Sequential(
(conv): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): ReLU(inplace=True)
)
(8): Sequential(
(0): RepVGGBlock(
(rbr_identity): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(rbr_dense): Sequential(
(conv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=4, bias=False)
(bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(rbr_1x1): Sequential(
(conv): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1), groups=4, bias=False)
(bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): ReLU(inplace=True)
)
(9): Sequential(
(0): RepVGGBlock(
(rbr_identity): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(rbr_dense): Sequential(
(conv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(rbr_1x1): Sequential(
(conv): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): ReLU(inplace=True)
)
(10): Sequential(
(0): RepVGGBlock(
(rbr_identity): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(rbr_dense): Sequential(
(conv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=4, bias=False)
(bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(rbr_1x1): Sequential(
(conv): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1), groups=4, bias=False)
(bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): ReLU(inplace=True)
)
(11): Sequential(
(0): RepVGGBlock(
(rbr_identity): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(rbr_dense): Sequential(
(conv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(rbr_1x1): Sequential(
(conv): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): ReLU(inplace=True)
)
(12): Sequential(
(0): RepVGGBlock(
(rbr_identity): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(rbr_dense): Sequential(
(conv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=4, bias=False)
(bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(rbr_1x1): Sequential(
(conv): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1), groups=4, bias=False)
(bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): ReLU(inplace=True)
)
(13): Sequential(
(0): RepVGGBlock(
(rbr_identity): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(rbr_dense): Sequential(
(conv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(rbr_1x1): Sequential(
(conv): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): ReLU(inplace=True)
)
(14): Sequential(
(0): RepVGGBlock(
(rbr_identity): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(rbr_dense): Sequential(
(conv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=4, bias=False)
(bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(rbr_1x1): Sequential(
(conv): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1), groups=4, bias=False)
(bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): ReLU(inplace=True)
)
(15): Sequential(
(0): RepVGGBlock(
(rbr_identity): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(rbr_dense): Sequential(
(conv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(rbr_1x1): Sequential(
(conv): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): ReLU(inplace=True)
)
)
(stage4): Sequential(
(0): Sequential(
(0): RepVGGBlock(
(rbr_dense): Sequential(
(conv): Conv2d(512, 2048, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(rbr_1x1): Sequential(
(conv): Conv2d(512, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False)
(bn): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): ReLU(inplace=True)
)
)
)
(head): GeneralHead2D(
(pool): AdaptiveAvgPool2d(output_size=(1, 1))
(dropout): Dropout(p=0.0, inplace=True)
(fc): Linear(in_features=2048, out_features=1000, bias=True)
)
)
############################################## The model after fusing RepVGGBlock
RepVGGRecognizer(
(backbone): RepVGGBackbone(
(stage0): Sequential(
(0): Conv2d(3, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
(1): ReLU(inplace=True)
)
(stage1): Sequential(
(0): Sequential(
(0): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=4)
(1): ReLU(inplace=True)
)
(1): Sequential(
(0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace=True)
)
(2): Sequential(
(0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=4)
(1): ReLU(inplace=True)
)
(3): Sequential(
(0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace=True)
)
)
(stage2): Sequential(
(0): Sequential(
(0): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=4)
(1): ReLU(inplace=True)
)
(1): Sequential(
(0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace=True)
)
(2): Sequential(
(0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=4)
(1): ReLU(inplace=True)
)
(3): Sequential(
(0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace=True)
)
(4): Sequential(
(0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=4)
(1): ReLU(inplace=True)
)
(5): Sequential(
(0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace=True)
)
)
(stage3): Sequential(
(0): Sequential(
(0): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=4)
(1): ReLU(inplace=True)
)
(1): Sequential(
(0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace=True)
)
(2): Sequential(
(0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=4)
(1): ReLU(inplace=True)
)
(3): Sequential(
(0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace=True)
)
(4): Sequential(
(0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=4)
(1): ReLU(inplace=True)
)
(5): Sequential(
(0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace=True)
)
(6): Sequential(
(0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=4)
(1): ReLU(inplace=True)
)
(7): Sequential(
(0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace=True)
)
(8): Sequential(
(0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=4)
(1): ReLU(inplace=True)
)
(9): Sequential(
(0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace=True)
)
(10): Sequential(
(0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=4)
(1): ReLU(inplace=True)
)
(11): Sequential(
(0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace=True)
)
(12): Sequential(
(0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=4)
(1): ReLU(inplace=True)
)
(13): Sequential(
(0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace=True)
)
(14): Sequential(
(0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=4)
(1): ReLU(inplace=True)
)
(15): Sequential(
(0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace=True)
)
)
(stage4): Sequential(
(0): Sequential(
(0): Conv2d(512, 2048, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
(1): ReLU(inplace=True)
)
)
)
(head): GeneralHead2D(
(pool): AdaptiveAvgPool2d(output_size=(1, 1))
(dropout): Dropout(p=0.0, inplace=True)
(fc): Linear(in_features=2048, out_features=1000, bias=True)
)
)
tensor(0.0046, grad_fn=<SqrtBackward>) ################ the computed result now shows a sizable deviation
False
Traceback (most recent call last):
File "/home/zj/repos/ZCls/tests/test_model/test_layer/test_repvgg_block.py", line 151, in <module>
test_regvgg()
File "/home/zj/repos/ZCls/tests/test_model/test_layer/test_repvgg_block.py", line 142, in test_regvgg
assert torch.allclose(train_outputs, eval_outputs, atol=1e-8)
AssertionError
Using the following initialization instead:
def _init_weights(self, gamma=0.01):
    for m in self.modules():
        if isinstance(m, nn.Conv2d):
            nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
        elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
            nn.init.constant_(m.weight, gamma)
            nn.init.constant_(m.bias, gamma)
The test results are as follows:
...
################################ with this initialization the test passes
tensor(5.2068e-08, grad_fn=<SqrtBackward>)
True
The behavior matches what @MaeThird reported: the error is small for small models and large for large models.
In our experience, this level of floating-point error makes no observable difference on any task. As for the comment above that the second initialization produces a smaller error, that is only natural, since gamma went from 1 down to 0.01, which scales down the activations and with them the absolute error...
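As a rough illustration of how the absolute error scales with activation magnitude in float32 (about 7 significant digits, so two mathematically identical computations can differ by roughly value × 1e-7):

import torch

torch.manual_seed(0)
a, b = torch.randn(512, 512), torch.randn(512, 512)
y1 = (a + b) @ b                      # two mathematically identical computations,
y2 = a @ b + b @ b                    # accumulated in a different order
print((y1 - y2).abs().max())          # typically around 1e-5 to 1e-4: rounding, not a bug

# Scaling the inputs by 100 scales the products by 10^4 and the absolute error with them:
y1_big = (100 * a + 100 * b) @ (100 * b)
y2_big = (100 * a) @ (100 * b) + (100 * b) @ (100 * b)
print((y1_big - y2_big).abs().max())  # roughly 10^4 times larger; the relative error is unchanged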
Thanks for the great work. In my use, the accuracy loss on deployment is negligible for the small models I train (under 10 MFLOPs), but with a large model (about 2 GFLOPs) the outputs no longer match after conversion:

LOG:
deploy param: stage0.rbr_reparam.weight torch.Size([64, 1, 3, 3]) -0.048573527
deploy param: stage0.rbr_reparam.bias torch.Size([64]) 0.23182523
deploy param: stage1.0.rbr_reparam.weight torch.Size([128, 64, 3, 3]) -0.0054542203
deploy param: stage1.0.rbr_reparam.bias torch.Size([128]) 1.0140312
deploy param: stage1.1.rbr_reparam.weight torch.Size([128, 64, 3, 3]) 0.0006282824
deploy param: stage1.1.rbr_reparam.bias torch.Size([128]) 0.32761782
deploy param: stage1.2.rbr_reparam.weight torch.Size([128, 128, 3, 3]) 0.0023862773
deploy param: stage1.2.rbr_reparam.bias torch.Size([128]) 0.34976208
deploy param: stage1.3.rbr_reparam.weight torch.Size([128, 64, 3, 3]) -9.027165e-05
deploy param: stage1.3.rbr_reparam.bias torch.Size([128]) 0.0063683093
deploy param: stage2.0.rbr_reparam.weight torch.Size([256, 128, 3, 3]) -8.460902e-05
deploy param: stage2.0.rbr_reparam.bias torch.Size([256]) 0.11033552
deploy param: stage2.1.rbr_reparam.weight torch.Size([256, 128, 3, 3]) -0.00010023986
deploy param: stage2.1.rbr_reparam.bias torch.Size([256]) -0.15826604
deploy param: stage2.2.rbr_reparam.weight torch.Size([256, 256, 3, 3]) -5.3966836e-05
deploy param: stage2.2.rbr_reparam.bias torch.Size([256]) -0.15924689
deploy param: stage2.3.rbr_reparam.weight torch.Size([256, 128, 3, 3]) -6.7551824e-05
deploy param: stage2.3.rbr_reparam.bias torch.Size([256]) -0.37404576
deploy param: stage2.4.rbr_reparam.weight torch.Size([256, 256, 3, 3]) -0.00012947948
deploy param: stage2.4.rbr_reparam.bias torch.Size([256]) -0.6853457
deploy param: stage2.5.rbr_reparam.weight torch.Size([256, 128, 3, 3]) 7.473848e-05
deploy param: stage2.5.rbr_reparam.bias torch.Size([256]) -0.16874048
deploy param: stage3.0.rbr_reparam.weight torch.Size([512, 256, 3, 3]) -0.000433887
deploy param: stage3.0.rbr_reparam.bias torch.Size([512]) 0.18602118
deploy param: stage3.1.rbr_reparam.weight torch.Size([512, 256, 3, 3]) 0.00048246872
deploy param: stage3.1.rbr_reparam.bias torch.Size([512]) -0.7235512
deploy param: stage3.2.rbr_reparam.weight torch.Size([512, 512, 3, 3]) 0.00021061227
deploy param: stage3.2.rbr_reparam.bias torch.Size([512]) -0.5657553
deploy param: stage3.3.rbr_reparam.weight torch.Size([512, 256, 3, 3]) -0.00081703335
deploy param: stage3.3.rbr_reparam.bias torch.Size([512]) -0.37847003
deploy param: stage3.4.rbr_reparam.weight torch.Size([512, 512, 3, 3]) -0.00033185782
deploy param: stage3.4.rbr_reparam.bias torch.Size([512]) -0.57922906
deploy param: stage3.5.rbr_reparam.weight torch.Size([512, 256, 3, 3]) -0.0007206367
deploy param: stage3.5.rbr_reparam.bias torch.Size([512]) -0.56909364
deploy param: stage3.6.rbr_reparam.weight torch.Size([512, 512, 3, 3]) -0.0003344199
deploy param: stage3.6.rbr_reparam.bias torch.Size([512]) -0.5628111
deploy param: stage3.7.rbr_reparam.weight torch.Size([512, 256, 3, 3]) -0.00021987755
deploy param: stage3.7.rbr_reparam.bias torch.Size([512]) -0.34248477
deploy param: stage3.8.rbr_reparam.weight torch.Size([512, 512, 3, 3]) -0.00010127398
deploy param: stage3.8.rbr_reparam.bias torch.Size([512]) -0.5895205
deploy param: stage3.9.rbr_reparam.weight torch.Size([512, 256, 3, 3]) -0.0005824505
deploy param: stage3.9.rbr_reparam.bias torch.Size([512]) -0.37577158
deploy param: stage3.10.rbr_reparam.weight torch.Size([512, 512, 3, 3]) -0.00012262027
deploy param: stage3.10.rbr_reparam.bias torch.Size([512]) -0.6199002
deploy param: stage3.11.rbr_reparam.weight torch.Size([512, 256, 3, 3]) 1.503076e-06
deploy param: stage3.11.rbr_reparam.bias torch.Size([512]) -0.7054796
deploy param: stage3.12.rbr_reparam.weight torch.Size([512, 512, 3, 3]) 0.0006349176
deploy param: stage3.12.rbr_reparam.bias torch.Size([512]) -1.0350925
deploy param: stage3.13.rbr_reparam.weight torch.Size([512, 256, 3, 3]) 0.00037807773
deploy param: stage3.13.rbr_reparam.bias torch.Size([512]) -1.1399512
deploy param: stage3.14.rbr_reparam.weight torch.Size([512, 512, 3, 3]) 0.00025178236
deploy param: stage3.14.rbr_reparam.bias torch.Size([512]) -0.27695537
deploy param: stage3.15.rbr_reparam.weight torch.Size([512, 256, 3, 3]) 0.00074805244
deploy param: stage3.15.rbr_reparam.bias torch.Size([512]) -0.8776718
deploy param: stage4.0.rbr_reparam.weight torch.Size([1024, 512, 3, 3]) -0.00013951868
deploy param: stage4.0.rbr_reparam.bias torch.Size([1024]) 0.021552037
deploy param: linear.weight torch.Size([372, 1024]) 0.0051029953
deploy param: linear.bias torch.Size([372]) 0.17604762
The printing code:
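(The snippet itself was not included in the post; judging from the log format, it is presumably a loop like the following, where deploy_model stands for the converted model:)

import numpy as np

# Print name, shape, and mean value of every parameter of the converted model
for name, param in deploy_model.named_parameters():
    print('deploy param: ', name, param.size(), np.mean(param.detach().cpu().numpy()))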