Error on the CWRU dataset

yang301301 commented 2 years ago

When I run the 0 → 2 transfer task on the CWRU dataset, training fails once it reaches the middle epoch:

    Traceback (most recent call last):
      File "E:\UDTL-master\train_advanced.py", line 88, in <module>
        trainer.train()
      File "E:\UDTL-master\utils\train_utils_combines.py", line 286, in train
        distance_loss = self.distance_loss([features.narrow(0, 0, labels.size(0)),
      File "E:\UDTL-master\loss\JAN.py", line 44, in JAN
        loss = torch.mean(XX + YY - XY - YX)
    RuntimeError: The size of tensor a (20) must match the size of tensor b (64) at non-singleton dimension 1

The 1 → 2 transfer task does not raise this error. The parameters were set as follows:

    04-09 11:14:17 model_name: resnet_mix_features_1d
    04-09 11:14:17 data_name: CWRU
    04-09 11:14:17 data_dir: E:\Data\CWRU
    04-09 11:14:17 transfer_task: [[0], [2]]
    04-09 11:14:17 normlizetype: mean-std
    04-09 11:14:17 cuda_device: 0
    04-09 11:14:17 checkpoint_dir: ./checkpoint
    04-09 11:14:17 pretrained: False
    04-09 11:14:17 batch_size: 64
    04-09 11:14:17 num_workers: 0
    04-09 11:14:17 bottleneck: True
    04-09 11:14:17 bottleneck_num: 256
    04-09 11:14:17 last_batch: False
    04-09 11:14:17 distance_metric: True
    04-09 11:14:17 distance_loss: JMMD
    04-09 11:14:17 trade_off_distance: Step
    04-09 11:14:17 lam_distance: 1
    04-09 11:14:17 domain_adversarial: True
    04-09 11:14:17 adversarial_loss: CDA
    04-09 11:14:17 hidden_size: 1024
    04-09 11:14:17 trade_off_adversarial: Step
    04-09 11:14:17 lam_adversarial: 1
    04-09 11:14:17 opt: adam
    04-09 11:14:17 lr: 0.001
    04-09 11:14:17 momentum: 0.9
    04-09 11:14:17 weight_decay: 1e-05
    04-09 11:14:17 lr_scheduler: step
    04-09 11:14:17 gamma: 0.1
    04-09 11:14:17 steps: 150, 250
    04-09 11:14:17 middle_epoch: 5
    04-09 11:14:17 max_epoch: 300
    04-09 11:14:17 print_step: 50
    04-09 11:14:17 using 1 gpus

Professor Zhao, could you help me with this when you have time? Thank you very much.

ZhaoZhibin commented 2 years ago

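Judging from the error, you are most likely using the distance loss and the adversarial loss at the same time. When we compute the distance loss we drop the last batch, because otherwise the dimensions do not match and the distance cannot be computed; the adversarial loss does not require dropping the last batch. The two settings may therefore be conflicting here, so please check that carefully.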
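To make the batch-size mismatch concrete, here is a minimal pure-Python sketch of the arithmetic behind a ragged final batch. It is not code from the UDTL repository: the helper names `batch_sizes` and `mismatched_steps` and the sample counts are hypothetical, chosen only so that the source domain leaves a remainder of 20, as in the error message. It assumes the source and target loaders are iterated in lockstep, one batch from each per step.

```python
def batch_sizes(num_samples, batch_size, drop_last):
    """Return the size of every batch a loader would yield in one epoch."""
    full, remainder = divmod(num_samples, batch_size)
    sizes = [batch_size] * full
    if remainder and not drop_last:
        sizes.append(remainder)  # ragged final batch
    return sizes


def mismatched_steps(source_n, target_n, batch_size, drop_last):
    """List (step, source_size, target_size) for steps whose sizes differ."""
    source = batch_sizes(source_n, batch_size, drop_last)
    target = batch_sizes(target_n, batch_size, drop_last)
    return [(i, s, t) for i, (s, t) in enumerate(zip(source, target)) if s != t]


# Hypothetical sample counts: the source domain leaves a remainder of 20.
SOURCE_N, TARGET_N, BATCH = 3 * 64 + 20, 5 * 64, 64

# Keeping the last batch pairs a 20-sample source batch with a 64-sample target batch.
print(mismatched_steps(SOURCE_N, TARGET_N, BATCH, drop_last=False))  # [(3, 20, 64)]

# Dropping the last batch leaves only full, equally sized batches.
print(mismatched_steps(SOURCE_N, TARGET_N, BATCH, drop_last=True))   # []
```

In PyTorch, `torch.utils.data.DataLoader` exposes this behavior through its `drop_last` argument. The posted log shows `last_batch: False`; if that option is what controls dropping the final partial batch in this code base (an assumption worth verifying in the data-loading code), enabling it may be the fix the reply is pointing at.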

yang301301 commented 2 years ago

OK, thank you for your help, Professor Zhao.

seamoonlight-YBY commented 1 year ago

Thanks a lot!

ZhaoZhibin / UDTL

Error on the CWRU dataset #10