Sharpiless / yolov5-distillation-5.0

yolov5 5.0 version knowledge distillation, yolov5l >> yolov5s
GNU General Public License v3.0
153 stars, 27 forks

distiltrain raises a dimension error; where did this go wrong? #8

Open Noisercontrollers opened 2 years ago

Noisercontrollers commented 2 years ago

Run parameters:
train_distillation: weights=weights/yolov5s.pt, t_weights=weights/yolov5m.pt, dist_loss=l2, temperature=20, distill=True, cfg=models/yolov5s.yaml, data=data/voc.yaml, hyp=data/hyps/hyp.scratch-low.yaml, epochs=100, batch_size=32, imgsz=512, rect=False, resume=False, nosave=False, noval=False, noautoanchor=False, evolve=None, bucket=, cache=None, image_weights=False, device=1, multi_scale=False, single_cls=False, optimizer=SGD, sync_bn=False, workers=8, project=runs/train, name=distill, exist_ok=False, quad=False, cos_lr=False, label_smoothing=0.0, patience=100, freeze=[0], save_period=-1, local_rank=-1, entity=None, upload_dataset=False, bbox_interval=-1, artifact_alias=latest

Error output:
Traceback (most recent call last):
  File "/home/chen/yolov5-distillation-5.0/train_distillation.py", line 663, in <module>
    main(opt)
  File "/home/chen/yolov5-distillation-5.0/train_distillation.py", line 560, in main
    train(opt.hyp, opt, device, callbacks)
  File "/home/chen/yolov5-distillation-5.0/train_distillation.py", line 346, in train
    dloss = compute_distillation_output_loss(
  File "/home/chen/yolov5-distillation-5.0/utils/loss.py", line 51, in compute_distillation_output_loss
    t_lcls += torch.mean(DclsLoss(pi[..., 5:], t_pi[..., 5:]) * c_obj_scale)
  File "/opt/anaconda3/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/anaconda3/envs/py38/lib/python3.8/site-packages/torch/nn/modules/loss.py", line 528, in forward
    return F.mse_loss(input, target, reduction=self.reduction)
  File "/opt/anaconda3/envs/py38/lib/python3.8/site-packages/torch/nn/functional.py", line 2928, in mse_loss
    expanded_input, expanded_target = torch.broadcast_tensors(input, target)
  File "/opt/anaconda3/envs/py38/lib/python3.8/site-packages/torch/functional.py", line 74, in broadcast_tensors
    return _VF.broadcast_tensors(tensors)  # type: ignore
RuntimeError: The size of tensor a (20) must match the size of tensor b (80) at non-singleton dimension 4
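
The RuntimeError above is a broadcasting failure: an elementwise loss needs the two shapes to agree in every dimension unless one of the sizes is 1. The following is a minimal pure-Python sketch of that rule, not PyTorch's own code. The helper name `broadcast_shape` is made up for illustration, and the batch, anchor, and grid sizes are placeholders; only the last dimension (20 vs. 80) comes from the error message.

```python
def broadcast_shape(a, b):
    """Return the broadcast result shape of two shapes, or raise ValueError."""
    ndim = max(len(a), len(b))
    # Left-pad the shorter shape with 1s so both have the same rank.
    a = (1,) * (ndim - len(a)) + tuple(a)
    b = (1,) * (ndim - len(b)) + tuple(b)
    out = []
    for dim, (x, y) in enumerate(zip(a, b)):
        if x == y or y == 1:
            out.append(x)
        elif x == 1:
            out.append(y)
        else:
            raise ValueError(
                f"size {x} must match size {y} at non-singleton dimension {dim}"
            )
    return tuple(out)


# Placeholder shapes: only the trailing 20 and 80 are taken from the error.
student_cls = (8, 3, 80, 80, 20)
teacher_cls = (8, 3, 80, 80, 80)

try:
    broadcast_shape(student_cls, teacher_cls)
except ValueError as err:
    print(err)
```

A size of 1 in the last dimension would broadcast, which is why the message says "non-singleton": 20 and 80 are both larger than 1 and differ, so no broadcast is possible.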

wuzuowuyou commented 2 years ago

FileNotFoundError: [Errno 2] No such file or directory: 'yolov5l.pt'

Do you have this model file? Could you send me a copy? 837007389@qq.com

Noisercontrollers commented 2 years ago

FileNotFoundError: [Errno 2] No such file or directory: 'yolov5l.pt'

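Do you have this model file? Could you send me a copy? 837007389@qq.com

Do you need the yolov5l.pt for yolov5 5.0? It is available in the yolov5 author's releases. I found the links for you, so you can download it yourself. Both were trained on the COCO dataset; l is 640x640 and l6 is 1280x1280.
https://github.com/ultralytics/yolov5/releases/download/v5.0/yolov5l.pt
https://github.com/ultralytics/yolov5/releases/download/v5.0/yolov5l6.pt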

wuzuowuyou commented 2 years ago

Thanks a lot!

Now I get another error:

Traceback (most recent call last):
  File "train.py", line 674, in <module>
    train(hyp, opt, device, tb_writer)
  File "train.py", line 247, in train
    check_anchors(model)
TypeError: check_anchors() missing 1 required positional argument: 'model'

Did you solve this one?

wuzuowuyou commented 2 years ago

You are all great, it really works now. To summarize:

train.py: on line 247, change check_anchors(model) to check_anchors(dataset, model=model, thr=hyp['anchor_t'], imgsz=imgsz)

utils/loss.py: change line 35 to b_obj_scale = t_obj_scale.unsqueeze(-1).repeat(1, 1, 1, 1, 2), and change line 37 to if reg_norm is None
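
The check_anchors TypeError can be reproduced without YOLOv5. The sketch below uses a stand-in function, not the repository's real check_anchors. Its parameter list is an assumption inferred from the corrected call, and the default values and placeholder objects are made up for illustration.

```python
def check_anchors(dataset, model, thr=4.0, imgsz=640):
    """Stand-in with the parameter list implied by the corrected call; does no real work."""
    return {"dataset": dataset, "model": model, "thr": thr, "imgsz": imgsz}


hyp = {"anchor_t": 4.0}  # placeholder hyperparameters
dataset, model, imgsz = "dataset", "model", 640  # placeholder objects

# Old call: the single positional argument binds to `dataset`, so `model` is missing.
try:
    check_anchors(model)
except TypeError as err:
    print(err)

# Corrected call: the dataset is passed first and the rest by keyword.
result = check_anchors(dataset, model=model, thr=hyp["anchor_t"], imgsz=imgsz)
```

The old call fails because its only argument fills the first parameter, which leaves `model` unbound; the message names the missing parameter, not the one that was passed.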

I fixed that by following the summary above, but then hit another error, the same one as yours. Have you solved it?

Image sizes 640 train, 640 test
Using 8 dataloader workers
Logging results to runs/train/exp10
Starting training for 50 epochs...
Distillation loss type: l2

     Epoch   gpu_mem       box       obj       cls     total   distill    labels  img_size
  0%|          | 0/2069 [00:00<?, ?it/s]
/media/algo/data_1/software/anconda_install/envs/pytorch1.7.0_general/lib/python3.7/site-packages/torch/nn/modules/loss.py:446: UserWarning: Using a target size (torch.Size([8, 3, 80, 80, 80])) that is different to the input size (torch.Size([8, 3, 80, 80, 20])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
  return F.mse_loss(input, target, reduction=self.reduction)
  0%|                                                                                                                                                     | 0/2069 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 675, in <module>
    train(hyp, opt, device, tb_writer)
  File "train.py", line 377, in train
    pred, t_pred, model, dist_loss, opt.temperature, reg_norm)
  File "/media/algo/data_1/project_others/distill/yolov5-distillation-5.0/yolov5-distillation-5.0-main/utils/loss.py", line 58, in compute_distillation_output_loss
    t_pi[..., 5:]) * c_obj_scale)
  File "/media/algo/data_1/software/anconda_install/envs/pytorch1.7.0_general/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/media/algo/data_1/software/anconda_install/envs/pytorch1.7.0_general/lib/python3.7/site-packages/torch/nn/modules/loss.py", line 446, in forward
    return F.mse_loss(input, target, reduction=self.reduction)
  File "/media/algo/data_1/software/anconda_install/envs/pytorch1.7.0_general/lib/python3.7/site-packages/torch/nn/functional.py", line 2659, in mse_loss
    expanded_input, expanded_target = torch.broadcast_tensors(input, target)
  File "/media/algo/data_1/software/anconda_install/envs/pytorch1.7.0_general/lib/python3.7/site-packages/torch/functional.py", line 71, in broadcast_tensors
    return _VF.broadcast_tensors(tensors)  # type: ignore
RuntimeError: The size of tensor a (20) must match the size of tensor b (80) at non-singleton dimension 4

wuzuowuyou commented 2 years ago

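One side has VOC's 20 classes and the other has COCO's 80, so the class outputs of the student (20) and the teacher (80) cannot be compared.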

wuzuowuyou commented 2 years ago

Solved. Download the VOC-trained .pt from here and it works: https://github.com/ultralytics/yolov5/releases/yolov5m-VOC.pt
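
A guard along the following lines could catch this mismatch before training starts. It is only a sketch: the function names are hypothetical and not part of this repository, and it assumes the class counts have already been read from the student's data config and from the teacher checkpoint.

```python
def head_channels(num_classes):
    """Channels per anchor in a YOLOv5 detection output: 4 box + 1 objectness + classes."""
    return 5 + num_classes


def check_distillation_classes(student_nc, teacher_nc):
    """Raise early if student and teacher predict a different number of classes."""
    if student_nc != teacher_nc:
        raise ValueError(
            f"student predicts {student_nc} classes but teacher predicts {teacher_nc}; "
            "use teacher weights trained on the same dataset as the student"
        )
    return head_channels(student_nc)


VOC_CLASSES, COCO_CLASSES = 20, 80

# A matching pair passes and returns the per-anchor output width.
print(check_distillation_classes(VOC_CLASSES, VOC_CLASSES))

# The mismatch from this thread is rejected with a readable message.
try:
    check_distillation_classes(VOC_CLASSES, COCO_CLASSES)
except ValueError as err:
    print(err)
```

Failing here gives a message that names both class counts, instead of a broadcasting error raised deep inside the loss computation.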