chengyangfu / retinamask

RetinaMask
MIT License

RuntimeError: The size of tensor a (0) must match the size of tensor b (225603) at non-singleton dimension 0 #27

Open jpainam opened 3 years ago

jpainam commented 3 years ago

🐛 Bug

Hi, thanks for sharing. I'm training on a custom dataset using:

python tools/train_net.py --config-file ./configs/retina/retinanet_mask_R-50-FPN_2x_adjust_std011_ms.yaml SOLVER.IMS_PER_BATCH 2 SOLVER.MAX_ITER 180000 SOLVER.STEPS "(90000, 120000)"

But after a few iterations, I get this error:

2021-01-19 12:42:44,720 maskrcnn_benchmark.trainer INFO: eta: 1 day, 5:42:19  iter: 902  loss: 1.8773 (2.2307)  loss_retina_cls: 0.4378 (0.6734)  loss_retina_reg: 1.0318 (1.1414)  loss_mask: 0.3184 (0.4159)  time: 0.3497 (0.5971)  data: 0.0093 (0.3164)  lr: 0.005000  max mem: 3196
2021-01-19 12:42:45,114 maskrcnn_benchmark.trainer INFO: eta: 1 day, 5:41:38  iter: 903  loss: 1.7931 (2.2301)  loss_retina_cls: 0.4374 (0.6731)  loss_retina_reg: 1.0279 (1.1412)  loss_mask: 0.3223 (0.4158)  time: 0.3535 (0.5969)  data: 0.0093 (0.3160)  lr: 0.005000  max mem: 3196
Traceback (most recent call last):
  File "tools/train_net.py", line 171, in <module>
    main()
  File "tools/train_net.py", line 164, in main
    model = train(cfg, args.local_rank, args.distributed)
  File "tools/train_net.py", line 73, in train
    arguments,
  File "/home/eldad/retinamask-master/maskrcnn_benchmark/engine/trainer.py", line 65, in do_train
    loss_dict = model(images, targets)
  File "/home/eldad/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/eldad/retinamask-master/maskrcnn_benchmark/modeling/detector/retinanet.py", line 61, in forward
    (anchors, detections), detector_losses = self.rpn(images, rpn_features, targets)
  File "/home/eldad/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/eldad/retinamask-master/maskrcnn_benchmark/modeling/rpn/retinanet.py", line 150, in forward
    return self._forward_train(anchors, box_cls, box_regression, targets)
  File "/home/eldad/retinamask-master/maskrcnn_benchmark/modeling/rpn/retinanet.py", line 157, in _forward_train
    anchors, box_cls, box_regression, targets
  File "/home/eldad/retinamask-master/maskrcnn_benchmark/modeling/rpn/retinanet_loss.py", line 108, in __call__
    labels, regression_targets = self.prepare_targets(anchors, targets)
  File "/home/eldad/retinamask-master/maskrcnn_benchmark/modeling/rpn/retinanet_loss.py", line 87, in prepare_targets
    matched_targets.bbox, anchors_per_image.bbox
  File "/home/eldad/retinamask-master/maskrcnn_benchmark/modeling/box_coder.py", line 44, in encode
    targets_dx = wx * (gt_ctr_x - ex_ctr_x) / ex_widths
RuntimeError: The size of tensor a (0) must match the size of tensor b (225603) at non-singleton dimension 0
(base) eldad@x580-05:~/retinamask-master$

With maskrcnn I get no errors, but with retina there is a size mismatch error. Any idea why I get this error?

Thank you
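
A possible lead, offered as a guess rather than a confirmed diagnosis: the traceback ends in box_coder.encode with a tensor of size 0 being combined with one of size 225603. One plausible reading is that a training image reached the loss with no ground-truth boxes, so the matched targets were empty while the anchors were not. The sketch below is a hypothetical helper (find_images_without_boxes is not part of retinamask or maskrcnn_benchmark) that scans a COCO-style annotation dict for such images. It assumes boxes are stored as [x, y, width, height] and that crowd annotations are ignored during training; both are assumptions about the custom dataset.

```python
def find_images_without_boxes(coco):
    """Return sorted ids of images that have no usable ground-truth box.

    Assumption: `coco` is a COCO-style dict with "images" and "annotations"
    lists, and each annotation stores "bbox" as [x, y, width, height].
    """
    usable_image_ids = set()
    for ann in coco.get("annotations", []):
        _, _, width, height = ann["bbox"]
        # Assumption: degenerate boxes and crowd annotations do not count
        # as usable training targets.
        if width > 0 and height > 0 and not ann.get("iscrowd", 0):
            usable_image_ids.add(ann["image_id"])
    return sorted(
        img["id"] for img in coco["images"] if img["id"] not in usable_image_ids
    )


# Tiny made-up example: image 1 has a valid box, image 2 has only a
# zero-width box, and image 3 has no annotations at all.
example = {
    "images": [{"id": 1}, {"id": 2}, {"id": 3}],
    "annotations": [
        {"image_id": 1, "bbox": [10, 20, 30, 40], "iscrowd": 0},
        {"image_id": 2, "bbox": [5, 5, 0, 12], "iscrowd": 0},
    ],
}

suspect_ids = find_images_without_boxes(example)
print(suspect_ids)
```

If a check like this reports any image ids on the real annotation file, removing or fixing those images before training would be one way to test whether empty targets are what triggers the error.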