Training error - Githubissues

dnty commented 3 years ago

Hello, I am trying to reproduce your code. It builds without problems, but running python tools/train.py --config configs/kd_faster_rcnn/voc_stu_faster_rcnn_r50_FGFI.py fails with the error below. Is mmdet missing some modified module?

2021-09-09 16:41:56,230 - mmdet - INFO - workflow: [('train', 1)], max: 4 epochs
Traceback (most recent call last):
  File "tools/train.py", line 176, in <module>
    main()
  File "tools/train.py", line 172, in main
    meta=meta)
  File "/home/dgfs/zl/DeFeat/mmdet/apis/train.py", line 476, in train_detector
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/home/dgfs/anaconda3/envs/defeat/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/dgfs/anaconda3/envs/defeat/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
    self.run_iter(data_batch, train_mode=True, **kwargs)
  File "/home/dgfs/anaconda3/envs/defeat/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 27, in run_iter
    self.model, data_batch, train_mode=train_mode, **kwargs)
  File "/home/dgfs/zl/DeFeat/mmdet/apis/train.py", line 79, in batch_processor
    losses = model(**data)
  File "/home/dgfs/anaconda3/envs/defeat/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/dgfs/anaconda3/envs/defeat/lib/python3.7/site-packages/mmcv/parallel/data_parallel.py", line 42, in forward
    return super().forward(*inputs, **kwargs)
  File "/home/dgfs/anaconda3/envs/defeat/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 159, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/dgfs/anaconda3/envs/defeat/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/dgfs/zl/DeFeat/mmdet/core/fp16/decorators.py", line 49, in new_func
    return old_func(*args, **kwargs)
  File "/home/dgfs/zl/DeFeat/mmdet/models/detectors/base.py", line 148, in forward
    return self.forward_train(img, img_metas, **kwargs)
  File "/home/dgfs/zl/DeFeat/mmdet/models/detectors/two_stage_kd.py", line 185, in forward_train
    if 'mask-neck-one' in kd_cfg.type:
AttributeError: 'NoneType' object has no attribute 'type'

ggjy commented 3 years ago

For KD training, the command should be python tools/train_kd.py
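As a rough illustration of why the plain train.py path fails: the traceback above shows `kd_cfg` being `None` when `'mask-neck-one' in kd_cfg.type` is evaluated, which suggests the KD settings only reach the detector through the KD entry point. The sketch below is not the repository's code; the function names and the `SimpleNamespace` config are hypothetical stand-ins that reproduce the failure and show one possible guard.

```python
from types import SimpleNamespace


def uses_mask_neck(kd_cfg):
    """Mimic the failing check from the traceback (hypothetical helper)."""
    return 'mask-neck-one' in kd_cfg.type


def uses_mask_neck_guarded(kd_cfg):
    """Same check, but fail early with a clearer message when kd_cfg is missing."""
    if kd_cfg is None:
        raise ValueError(
            'kd_cfg is None; KD configs are expected to be launched '
            'with tools/train_kd.py')
    return 'mask-neck-one' in kd_cfg.type


# Hypothetical KD config object; the real one comes from the config file.
kd_cfg = SimpleNamespace(type='mask-neck-one')
print(uses_mask_neck(kd_cfg))

try:
    uses_mask_neck(None)  # no KD config was passed in
except AttributeError as err:
    print('AttributeError:', err)
```

A guard of this kind would only turn the `AttributeError` into a more readable message; the actual fix remains running the KD script.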

dnty commented 3 years ago

Thank you! But training with train_kd.py raises a different error: UnboundLocalError: local variable 'neck_feat_t' referenced before assignment. One more question: which file is kd_cfg in two_stage_kd.py imported from?

ma3252788 commented 3 years ago

I am hitting the same error. Any help would be appreciated, thanks!

ma3252788 commented 3 years ago

2021-09-09 23:32:05,896 - mmdet - INFO - workflow: [('train', 1)], max: 4 epochs
kd_decay rate:  1.0
Traceback (most recent call last):
  File "/home/ubuntu/bigdisk/part1/DeFeat.pytorch/tools/train_kd.py", line 204, in <module>
    main()
  File "/home/ubuntu/bigdisk/part1/DeFeat.pytorch/tools/train_kd.py", line 200, in main
    meta=meta)
  File "/home/ubuntu/bigdisk/part1/DeFeat.pytorch/mmdet/apis/train.py", line 570, in train_detector_kd
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs, kd_cfg=cfg.model.hint_adapt)
  File "/home/ubuntu/bigdisk/part1/DeFeat.pytorch/mmcv/runner/runner_kd.py", line 405, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/ubuntu/bigdisk/part1/DeFeat.pytorch/mmcv/runner/runner_kd.py", line 304, in train
    self.model, self.model_t, data_batch, train_mode=True, kd_warm=kd_warm, kd_decay=kd_decay, epoch=self._epoch, **kwargs)
  File "/home/ubuntu/bigdisk/part1/DeFeat.pytorch/mmdet/apis/train.py", line 328, in batch_processor_kd
    losskd_neck = losskd_neck + (torch.pow(neck_feat_adapt - neck_feat_t[i], 2) * 
UnboundLocalError: local variable 'neck_feat_t' referenced before assignment

@ggjy Thanks a lot!!

ma3252788 commented 3 years ago

This seems to be caused by the config-t argument not being set. After setting it, training works.

dnty commented 3 years ago

Yes, exactly. config-t was not set. I have training running now as well.
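The mechanism behind the `UnboundLocalError` can be sketched in a few lines. This is a toy stand-in, not the code in mmdet/apis/train.py: it assumes that `neck_feat_t` is only assigned inside a branch that runs when a teacher is available, so without a teacher config the later read fails. The function name and arguments are hypothetical.

```python
def kd_neck_loss(student_feats, teacher_feats=None):
    """Toy feature-imitation loss over per-level scalar 'features'."""
    if teacher_feats is not None:
        # Only bound when a teacher is available.
        neck_feat_t = teacher_feats
    loss = 0.0
    for i, feat_s in enumerate(student_feats):
        # Raises UnboundLocalError when the branch above was skipped.
        loss += (feat_s - neck_feat_t[i]) ** 2
    return loss


print(kd_neck_loss([1.0, 2.0], [0.5, 1.0]))

try:
    kd_neck_loss([1.0, 2.0])  # no teacher features supplied
except UnboundLocalError as err:
    print('UnboundLocalError:', err)
```

Checking for the teacher up front and raising a descriptive error would make a missing teacher config easier to diagnose than the bare `UnboundLocalError`.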

dingdingzuo commented 3 years ago

How do I set this config-t? @dnty
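As a generic illustration of how a training script can expose and enforce such an option, here is a minimal argparse sketch. The flag spelling `--config-t` is an assumption based on the "config-t" name used in this thread, and the file names are placeholders; the real option names should be checked in tools/train_kd.py.

```python
import argparse


def parse_args(argv=None):
    """Parse launcher options (hypothetical flags, for illustration only)."""
    parser = argparse.ArgumentParser(description='KD training launcher (sketch)')
    parser.add_argument('--config', required=True, help='student config file')
    # Assumed flag name, taken from the "config-t" mentioned in the thread.
    parser.add_argument('--config-t', required=True, help='teacher config file')
    return parser.parse_args(argv)


# Placeholder paths; argparse stores "--config-t" under the attribute config_t.
args = parse_args(['--config', 'student.py', '--config-t', 'teacher.py'])
print(args.config, args.config_t)
```

Marking the teacher config as required makes the launcher exit with a usage message when it is omitted, rather than failing later inside the loss computation.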

ggjy / DeFeat.pytorch

Training error #10