TangYong1975 / deeplabv3-swintransformer


RuntimeError #1

Open 905390215 opened 2 years ago

905390215 commented 2 years ago

Hello, I downloaded and ran your code and it works very well; thank you very much. When I switch the model to swin_b, I get:

RuntimeError: Given groups=1, weight of size [48, 192, 1, 1], expected input[4, 256, 56, 56] to have 192 channels, but got 256 channels instead

How should I fix this?

TangYong1975 commented 2 years ago

I have a day job, so I'll try it on the weekend and get back to you. Feel free to look into it yourself in the meantime!

TangYong1975 commented 2 years ago

modeling.py, line 71: inplanes = 1024  # 768 for swin_t and swin_s, 1024 for swin_b, 1536 for swin_l
TangYong1975 commented 2 years ago

The reason is that Swin Transformer comes in different scales, so the feature output layers have different channel counts, as listed above. You can set different values depending on which variant you use.

TangYong1975 commented 2 years ago

inplanes = 1024  # 768 for swin_t and swin_s, 1024 for swin_b, 1536 for swin_l
low_level_planes = 256  # 192 for swin_t and swin_s, 256 for swin_b, 384 for swin_l
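
A minimal sketch of how these two values could be picked per backbone instead of editing them by hand (the dictionary and helper name below are illustrative, not part of the repository):

# Hypothetical lookup: head-input channel widths for each Swin variant.
SWIN_HEAD_CHANNELS = {
    'swin_t': (768, 192),   # (inplanes, low_level_planes)
    'swin_s': (768, 192),
    'swin_b': (1024, 256),
    'swin_l': (1536, 384),
}

def head_channels(backbone_name):
    # Return (inplanes, low_level_planes) for the chosen Swin backbone.
    return SWIN_HEAD_CHANNELS[backbone_name]

With something like this, switching to swin_b would only mean passing 'swin_b' instead of hand-editing modeling.py.
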
905390215 commented 2 years ago

Thank you, it runs now. Was my error caused by Swin needing a larger input because the network is deeper?

TangYong1975 commented 2 years ago

Swin Transformers of different types (scales) have different feature widths (channel counts), as in my reply above:

swin_t and swin_s: the deep (final-stage) feature has 768 channels, the low-level feature has 192
swin_b: 1024 and 256
swin_l: 1536 and 384

This is determined by the heads parameter in the definitions below: 24 corresponds to 768 and 192, 32 to 1024 and 256, and 48 to 1536 and 384.

def swin_t(num_classes=1000, hidden_dim=96, layers=(2, 2, 6, 2), heads=(3, 6, 12, 24), img_size=224, **kwargs): return SwinTransformer(hidden_dim=hidden_dim, layers=layers, heads=heads, num_classes=num_classes, **kwargs)

def swin_s(num_classes=1000, hidden_dim=96, layers=(2, 2, 18, 2), heads=(3, 6, 12, 24), img_size=224, **kwargs): return SwinTransformer(hidden_dim=hidden_dim, layers=layers, heads=heads, num_classes=num_classes, **kwargs)

def swin_b(num_classes=1000, hidden_dim=128, layers=(2, 2, 18, 2), heads=(4, 8, 16, 32), img_size=224, **kwargs): return SwinTransformer(hidden_dim=hidden_dim, layers=layers, heads=heads, num_classes=num_classes, **kwargs)

def swin_l(num_classes=1000, hidden_dim=192, layers=(2, 2, 18, 2), heads=(6, 12, 24, 48), img_size=224, **kwargs): return SwinTransformer(hidden_dim=hidden_dim, layers=layers, heads=heads, num_classes=num_classes, **kwargs)
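
A quick way to see where these channel counts come from: Swin doubles the feature width at every stage, so the per-stage channels are hidden_dim × (1, 2, 4, 8), and the DeepLab head reads the second stage (low level) and the last stage (high level). The few lines below just spell out that arithmetic; they are not code from the repository:

for name, hidden_dim in [('swin_t', 96), ('swin_s', 96), ('swin_b', 128), ('swin_l', 192)]:
    stages = [hidden_dim * 2 ** i for i in range(4)]  # e.g. 96 -> [96, 192, 384, 768]
    print(name, 'low_level_planes =', stages[1], 'inplanes =', stages[3])

Running this prints 192/768 for swin_t and swin_s, 256/1024 for swin_b, and 384/1536 for swin_l, matching the values given above.
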

905390215 commented 2 years ago

Okay, thank you. I understand now.

lemonc1014 commented 2 years ago

Hello author, when you have time could you write up the steps for training on a custom dataset with your code? Many thanks.

905390215 commented 2 years ago

I'm not the author, just the person who opened this issue. The author already explained above that the code was modified from https://github.com/VainF/DeepLabV3Plus-Pytorch; that project documents the dataset format, so just follow it.

TangYong1975 commented 2 years ago

https://github.com/VainF/DeepLabV3Plus-Pytorch

TangYong1975 commented 2 years ago

Yes, refer to the original author's training procedure; it is exactly the same.

lemonc1014 commented 2 years ago

Okay, I've got it running and am still tuning it. Thanks, author.

lemonc1014 commented 2 years ago

Thanks!

lemonc1014 commented 2 years ago

Hi, how do you load pretrained weights for the network? I only see pretrained models for the basic backbone. Also, is it only possible to load pretrained weights for the backbone?

lemonc1014 commented 2 years ago

Hi, how did training on your own dataset turn out?

TangYong1975 commented 2 years ago

Load a pretrained model like this:

model = network.deeplabv3plus_resnet50(num_classes=4, output_stride=16)

model.load_state_dict(torch.load('checkpoints/best_deeplabv3plus_resnet50_voc_os16.pth')["model_state"])
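
A small variant of the same call, in case the checkpoint was saved on a GPU machine and is loaded where CUDA is unavailable (the path and class count are just copied from the snippet above):

import torch
import network  # the repository's network package, as in the snippet above

model = network.deeplabv3plus_resnet50(num_classes=4, output_stride=16)
checkpoint = torch.load(
    'checkpoints/best_deeplabv3plus_resnet50_voc_os16.pth',
    map_location='cpu',  # remap GPU-saved tensors so the load also works without CUDA
)
model.load_state_dict(checkpoint["model_state"])
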

TangYong1975 commented 2 years ago

In main.py, around line 263, after the model is created.

lemonc1014 commented 2 years ago

Thanks for the reply, author, but there are no pretrained weights for the full network; only the backbone weights can be loaded. I found pretrained weights in Microsoft's Swin Transformer repository, i.e. the microsoft swin-transformer weights for the backbone used in your code, but about 30 parameters differ between the two, so they cannot be matched completely, for example:

backbone.patch_embed.norm.weight torch.Size([192]) torch.Size([1536])
backbone.patch_embed.norm.bias torch.Size([192]) torch.Size([1536])
backbone.layers.0.blocks.1.attn_mask torch.Size([256, 49, 49]) torch.Size([64, 49, 49])
backbone.layers.0.downsample.norm.weight torch.Size([768]) torch.Size([1536])
backbone.layers.0.downsample.norm.bias torch.Size([768]) torch.Size([1536])
backbone.layers.1.blocks.1.attn_mask torch.Size([64, 49, 49]) torch.Size([16, 49, 49])
backbone.layers.2.blocks.1.attn_mask torch.Size([16, 49, 49]) torch.Size([4, 49, 49])
backbone.layers.2.blocks.3.attn_mask torch.Size([16, 49, 49]) torch.Size([4, 49, 49])
backbone.layers.2.blocks.5.attn_mask torch.Size([16, 49, 49]) torch.Size([4, 49, 49])
backbone.layers.2.blocks.7.attn_mask torch.Size([16, 49, 49]) torch.Size([4, 49, 49])
backbone.layers.2.blocks.9.attn_mask torch.Size([16, 49, 49]) torch.Size([4, 49, 49])

I compared them myself but have not yet found where the difference comes from, so I want to ask: did you modify some of the parameters of the Microsoft Swin Transformer backbone?
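
For reference, a sketch of the kind of shape comparison described above, which also loads only the entries that do line up. The 'model' key and the 'backbone.' prefix are assumptions about how the Microsoft checkpoint and this fork name their tensors, not something confirmed in the thread:

import torch

def load_matching_backbone_weights(model, checkpoint_path):
    # model: the assembled DeepLab + Swin network; checkpoint_path: a Microsoft Swin checkpoint.
    state = torch.load(checkpoint_path, map_location='cpu')
    pretrained = state.get('model', state)  # assumed: official Swin checkpoints nest weights under 'model'

    model_state = model.state_dict()
    matched, skipped = {}, []
    for key, value in pretrained.items():
        full_key = 'backbone.' + key  # assumed prefix for backbone tensors in the full model
        if full_key in model_state and model_state[full_key].shape == value.shape:
            matched[full_key] = value
        else:
            skipped.append(key)

    model.load_state_dict(matched, strict=False)  # ignore everything that does not match
    print('loaded', len(matched), 'tensors, skipped', len(skipped))
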

TangYong1975 commented 2 years ago

swin-transformer comes in different sizes:

TangYong1975 commented 2 years ago

def swin_t(num_classes, hidden_dim=96, layers=(2, 2, 6, 2), heads=(3, 6, 12, 24), img_size=224, **kwargs): return SwinTransformer(embed_dim=hidden_dim, depths=layers, num_heads=heads, num_classes=num_classes, img_size=img_size, **kwargs)

def swin_s(num_classes, hidden_dim=96, layers=(2, 2, 18, 2), heads=(3, 6, 12, 24), img_size=224, **kwargs): return SwinTransformer(embed_dim=hidden_dim, depths=layers, num_heads=heads, num_classes=num_classes, img_size=img_size, **kwargs)

def swin_b(num_classes, hidden_dim=128, layers=(2, 2, 18, 2), heads=(4, 8, 16, 32), img_size=224, **kwargs): return SwinTransformer(embed_dim=hidden_dim, depths=layers, num_heads=heads, num_classes=num_classes, img_size=img_size, **kwargs)

def swin_l(num_classes, hidden_dim=192, layers=(2, 2, 18, 2), heads=(6, 12, 24, 48), img_size=224, **kwargs): return SwinTransformer(embed_dim=hidden_dim, depths=layers, num_heads=heads, num_classes=num_classes, img_size=img_size, **kwargs)

TangYong1975 commented 2 years ago

Your pretrained model has to be the same size, right? I haven't used it that way; I usually train my own model and load that myself.

lemonc1014 commented 2 years ago

Yes, the model I selected and the one I downloaded are both swin_l.

lemonc1014 commented 2 years ago

Then if the backbone starts without any initial weights, won't the results suffer?

TangYong1975 commented 2 years ago

The results depend on the training data and the network model; the initial values matter very little, at most you converge a bit faster.

lemonc1014 commented 2 years ago

Okay, thanks, author. When you train on your own dataset, are there any requirements on the image size, and how good are the final results?

TangYong1975 commented 2 years ago

For segmentation networks, training always uses the crop size, while testing uses the original image. If you have enough GPU memory, the images can be very large.

xiaodongdongaaaaaa commented 1 year ago

Hello, I downloaded and ran your code and it works very well; thank you very much. When I switch the model to swin_s, I get:

f"Input image size ({H}*{W}) doesn't match model ({self.img_size[0]}*{self.img_size[1]})."
AssertionError: Input image size (513*513) doesn't match model (448*448).

After I change the crop value to 448, I get:

C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\NLLLoss2d.cu:95: block: [0,0,0], thread: [95,0,0] Assertion t >= 0 && t < n_classes failed.
return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
RuntimeError: CUDA error: device-side assert triggered

How should I fix this?