HUSTSYJ / DA_dahazing

Domain Adaptation for Image Dehazing, CVPR2020
242 stars, 40 forks

Error loading the model during the third training step #21

Open Zhang-Zhiwang opened 3 years ago

Zhang-Zhiwang commented 3 years ago

RuntimeError: Error(s) in loading state_dict for ResnetGenerator_depth: Missing key(s) in state_dict: "modelfea.1.weight", "modelfea.2.weight", "modelfea.2.bias", "modelfea.2.running_mean", "modelfea.2.running_var", "modelfea.4.weight", "modelfea.5.weight", "modelfea.5.bias", "modelfea.5.running_mean", "modelfea.5.running_var", "modelfea.7.weight", "modelfea.8.weight", "modelfea.8.bias", "modelfea.8.running_mean", "modelfea.8.running_var", "modelfea.10.conv_block.1.weight", "modelfea.10.conv_block.2.weight", "modelfea.10.conv_block.2.bias", "modelfea.10.conv_block.2.running_mean", "modelfea.10.conv_block.2.running_var", "modelfea.10.conv_block.5.weight", "modelfea.10.conv_block.6.weight", "modelfea.10.conv_block.6.bias", "modelfea.10.conv_block.6.running_mean", "modelfea.10.conv_block.6.running_var", "modelfea.11.conv_block.1.weight", "modelfea.11.conv_block.2.weight", "modelfea.11.conv_block.2.bias", "modelfea.11.conv_block.2.running_mean", "modelfea.11.conv_block.2.running_var", "modelfea.11.conv_block.5.weight", "modelfea.11.conv_block.6.weight", "modelfea.11.conv_block.6.bias", "modelfea.11.conv_block.6.running_mean", "modelfea.11.conv_block.6.running_var", "modelfea.12.conv_block.1.weight", "modelfea.12.conv_block.2.weight", "modelfea.12.conv_block.2.bias", "modelfea.12.conv_block.2.running_mean", "modelfea.12.conv_block.2.running_var", "modelfea.12.conv_block.5.weight", "modelfea.12.conv_block.6.weight", "modelfea.12.conv_block.6.bias", "modelfea.12.conv_block.6.running_mean", "modelfea.12.conv_block.6.running_var", "modelfea.13.conv_block.1.weight", "modelfea.13.conv_block.2.weight", "modelfea.13.conv_block.2.bias", "modelfea.13.conv_block.2.running_mean", "modelfea.13.conv_block.2.running_var", "modelfea.13.conv_block.5.weight", "modelfea.13.conv_block.6.weight", "modelfea.13.conv_block.6.bias", "modelfea.13.conv_block.6.running_mean", "modelfea.13.conv_block.6.running_var", "modelfea.14.conv_block.1.weight", "modelfea.14.conv_block.2.weight", 
"modelfea.14.conv_block.2.bias", "modelfea.14.conv_block.2.running_mean", "modelfea.14.conv_block.2.running_var", "modelfea.14.conv_block.5.weight", "modelfea.14.conv_block.6.weight", "modelfea.14.conv_block.6.bias", "modelfea.14.conv_block.6.running_mean", "modelfea.14.conv_block.6.running_var", "modelfea.15.conv_block.1.weight", "modelfea.15.conv_block.2.weight", "modelfea.15.conv_block.2.bias", "modelfea.15.conv_block.2.running_mean", "modelfea.15.conv_block.2.running_var", "modelfea.15.conv_block.5.weight", "modelfea.15.conv_block.6.weight", "modelfea.15.conv_block.6.bias", "modelfea.15.conv_block.6.running_mean", "modelfea.15.conv_block.6.running_var", "modelfea.16.conv_block.1.weight", "modelfea.16.conv_block.2.weight", "modelfea.16.conv_block.2.bias", "modelfea.16.conv_block.2.running_mean", "modelfea.16.conv_block.2.running_var", "modelfea.16.conv_block.5.weight", "modelfea.16.conv_block.6.weight", "modelfea.16.conv_block.6.bias", "modelfea.16.conv_block.6.running_mean", "modelfea.16.conv_block.6.running_var", "modelfea.17.conv_block.1.weight", "modelfea.17.conv_block.2.weight", "modelfea.17.conv_block.2.bias", "modelfea.17.conv_block.2.running_mean", "modelfea.17.conv_block.2.running_var", "modelfea.17.conv_block.5.weight", "modelfea.17.conv_block.6.weight", "modelfea.17.conv_block.6.bias", "modelfea.17.conv_block.6.running_mean", "modelfea.17.conv_block.6.running_var", "modelfea.18.conv_block.1.weight", "modelfea.18.conv_block.2.weight", "modelfea.18.conv_block.2.bias", "modelfea.18.conv_block.2.running_mean", "modelfea.18.conv_block.2.running_var", "modelfea.18.conv_block.5.weight", "modelfea.18.conv_block.6.weight", "modelfea.18.conv_block.6.bias", "modelfea.18.conv_block.6.running_mean", "modelfea.18.conv_block.6.running_var", "modelfea.19.weight", "modelfea.20.weight", "modelfea.20.bias", "modelfea.20.running_mean", "modelfea.20.running_var", "modelfea.22.weight", "modelfea.23.weight", "modelfea.23.bias", "modelfea.23.running_mean", 
"modelfea.23.running_var", "SFT.condition_conv.0.weight", "SFT.condition_conv.0.bias", "SFT.condition_conv.2.weight", "SFT.condition_conv.2.bias", "SFT.condition_conv.4.weight", "SFT.condition_conv.4.bias", "SFT.scale_conv.0.weight", "SFT.scale_conv.0.bias", "SFT.scale_conv.2.weight", "SFT.scale_conv.2.bias", "SFT.sift_conv.0.weight", "SFT.sift_conv.0.bias", "SFT.sift_conv.2.weight", "SFT.sift_conv.2.bias", "model2.1.weight", "model2.1.bias".

Zhang-Zhiwang commented 3 years ago

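The second step is similar, but no keys are missing there. I looked up fixes online, which say to set strict to False when loading, but the result is then practically as if nothing had been loaded. Has anyone run into this problem, and how did you solve it?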
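A minimal sketch of why strict=False can look like it does nothing, assuming PyTorch is installed. `PlainNet` and `DepthNet` are hypothetical stand-ins for resnet_9blocks and resnet_9blocks_depth, not the repository's classes. With strict=False, `load_state_dict` skips the keys absent from the checkpoint and leaves those layers at their random initialization, which is probably what happens to the modelfea, SFT and model2 layers listed in the error above.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins: the real generators in the repository are much larger.
class PlainNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Conv2d(3, 3, kernel_size=1)

class DepthNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Conv2d(3, 3, kernel_size=1)
        # Extra branch that the PlainNet checkpoint knows nothing about.
        self.modelfea = nn.Conv2d(3, 3, kernel_size=1)

torch.manual_seed(0)
checkpoint = PlainNet().state_dict()
net = DepthNet()
before = net.modelfea.weight.detach().clone()

# strict=False suppresses the RuntimeError and reports the mismatches instead.
result = net.load_state_dict(checkpoint, strict=False)
missing = sorted(result.missing_keys)
unexpected = sorted(result.unexpected_keys)

# The extra branch keeps its random initialization; only shared layers are loaded.
extra_branch_untouched = torch.equal(before, net.modelfea.weight)
shared_layer_loaded = torch.equal(net.model.weight, checkpoint["model.weight"])

print("missing keys:", missing)
print("unexpected keys:", unexpected)
```

Checking `missing_keys` after a non-strict load shows how much of the network was actually initialized from the checkpoint.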

Zhang-Zhiwang commented 3 years ago

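Problem solved. When training CycleGAN, netG_B is resnet_9blocks, but in SDehazing the author passes which_model_netG_A, i.e. resnet_9blocks_depth, when initializing from the pretrained model. The two architectures differ, so keys are missing. I don't know why the author initializes with netG_A; in that case, shouldn't strict be set to False in the model-loading function? I tried that, though, and R2S then has hardly any effect. My fix is to change which_model_netG_A to which_model_netG_B in define_G(); the subsequent forward has to be changed as well. However, in the final joint-training step the author still uses resnet_9blocks_depth to load the resnet_9blocks netG_B. I don't know why it is done this way and hope someone knowledgeable can explain.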
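One way to confirm this kind of architecture mismatch before loading is to compare key sets. The sketch below uses plain Python lists in place of real state_dicts. The helper names `diff_state_dict_keys` and `top_level_modules` are hypothetical, the "missing" keys are a few copied from the error message above, and `shared.weight` is an invented placeholder for whatever the two networks have in common.

```python
def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Return keys the model expects but the checkpoint lacks, and vice versa."""
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    return {
        "missing": sorted(model_keys - checkpoint_keys),
        "unexpected": sorted(checkpoint_keys - model_keys),
    }

def top_level_modules(keys):
    """Group parameter names by their first component, e.g. 'SFT.x.y' -> 'SFT'."""
    return sorted({key.split(".", 1)[0] for key in keys})

# Illustrative data only: a handful of keys from the error message plus one
# invented shared key. A real check would use the full key lists.
model_keys = [
    "shared.weight",
    "modelfea.1.weight",
    "SFT.condition_conv.0.weight",
    "model2.1.weight",
    "model2.1.bias",
]
checkpoint_keys = ["shared.weight"]

report = diff_state_dict_keys(model_keys, checkpoint_keys)
missing_modules = top_level_modules(report["missing"])
print("submodules absent from the checkpoint:", missing_modules)
```

With PyTorch, the two lists would presumably come from `model.state_dict().keys()` and the keys of the dictionary returned by `torch.load` on the checkpoint file. If whole submodules are missing, the checkpoint was most likely saved from a different architecture.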

xiaowei-chi commented 3 years ago

+1, I ran into the same problem.

ghost commented 3 years ago

netG_B

When training CycleGAN, did you run into G_A's loss failing to converge?

Zhang-Zhiwang commented 3 years ago

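For me the losses of both generators fluctuate up and down. The loss was lowest in the first few epochs and actually grew larger afterwards.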

ghost commented 3 years ago

This is what I got from training. Is this normal?

Zhang-Zhiwang commented 3 years ago

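I don't know. I'm trying to do deraining, but the images lose too much detail during the domain shift and come out hopelessly blurry. I'm tuning hyperparameters like crazy right now, and the dataset may need to change as well.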

Zhang-Zhiwang commented 3 years ago

Regarding the fluctuating loss, you can take a look at this article; GAN loss may have little to do with image quality. https://www.jianshu.com/p/914052bec9bc?utm_campaign

buptlj commented 3 years ago

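Going by the architecture described in the paper, the change Zhang-Zhiwang describes is the right one. In the paper, the R-to-S translation does not use depth information, so when training SDehazing the initialization should use which_model_netG_B and forward has to change too; only then does it match the paper. What is unclear is whether the author's final results were obtained by training as described in the paper or as written in the code. After making this change, can you reproduce the paper's results?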

vvvvvvvvvvvvvvvvvvvvvvv commented 2 years ago

Does your training run normally? Why does mine show the following?

Traceback (most recent call last):
  File "train.py", line 5, in <module>
    from util.visualizer import Visualizer
  File "/content/drive/MyDrive/pytorch_test/DA_dahazing/util/visualizer.py", line 6, in <module>
    from . import html
  File "/content/drive/MyDrive/pytorch_test/DA_dahazing/util/html.py", line 1, in <module>
    import dominate
ModuleNotFoundError: No module named 'dominate'

Am I missing a step? Thanks.
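The traceback says the `dominate` package, which util/html.py imports, is not installed in that environment; it is available on PyPI, so `pip install dominate` should normally resolve it. As a small sketch, the check below lists which third-party modules are missing before train.py is started. The helper name `find_missing_modules` and the module list passed to it are assumptions for illustration, not the repository's actual requirements.

```python
import importlib.util

def find_missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Assumed list for illustration: extend it with whatever train.py imports.
required = ["dominate"]
not_installed = find_missing_modules(required)

if not_installed:
    print("Install these first, e.g. with pip:", ", ".join(not_installed))
else:
    print("All listed modules are importable.")
```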

hello-trouble commented 2 years ago

[image] Going by the paper, the code here should select "without GAN Loss", shouldn't it?