Hi, thanks for sharing.
I'm training on a custom dataset using:

```shell
python tools/train_net.py --config-file ./configs/retina/retinanet_mask_R-50-FPN_2x_adjust_std011_ms.yaml SOLVER.IMS_PER_BATCH 2 SOLVER.MAX_ITER 180000 SOLVER.STEPS "(90000, 120000)"
```

But after a few iterations, I get this error:
```
2021-01-19 12:42:44,720 maskrcnn_benchmark.trainer INFO: eta: 1 day, 5:42:19 iter: 902 loss: 1.8773 (2.2307) loss_retina_cls: 0.4378 (0.6734) loss_retina_reg: 1.0318 (1.1414) loss_mask: 0.3184 (0.4159) time: 0.3497 (0.5971) data: 0.0093 (0.3164) lr: 0.005000 max mem: 3196
2021-01-19 12:42:45,114 maskrcnn_benchmark.trainer INFO: eta: 1 day, 5:41:38 iter: 903 loss: 1.7931 (2.2301) loss_retina_cls: 0.4374 (0.6731) loss_retina_reg: 1.0279 (1.1412) loss_mask: 0.3223 (0.4158) time: 0.3535 (0.5969) data: 0.0093 (0.3160) lr: 0.005000 max mem: 3196
Traceback (most recent call last):
  File "tools/train_net.py", line 171, in <module>
    main()
  File "tools/train_net.py", line 164, in main
    model = train(cfg, args.local_rank, args.distributed)
  File "tools/train_net.py", line 73, in train
    arguments,
  File "/home/eldad/retinamask-master/maskrcnn_benchmark/engine/trainer.py", line 65, in do_train
    loss_dict = model(images, targets)
  File "/home/eldad/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/eldad/retinamask-master/maskrcnn_benchmark/modeling/detector/retinanet.py", line 61, in forward
    (anchors, detections), detector_losses = self.rpn(images, rpn_features, targets)
  File "/home/eldad/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/eldad/retinamask-master/maskrcnn_benchmark/modeling/rpn/retinanet.py", line 150, in forward
    return self._forward_train(anchors, box_cls, box_regression, targets)
  File "/home/eldad/retinamask-master/maskrcnn_benchmark/modeling/rpn/retinanet.py", line 157, in _forward_train
    anchors, box_cls, box_regression, targets
  File "/home/eldad/retinamask-master/maskrcnn_benchmark/modeling/rpn/retinanet_loss.py", line 108, in __call__
    labels, regression_targets = self.prepare_targets(anchors, targets)
  File "/home/eldad/retinamask-master/maskrcnn_benchmark/modeling/rpn/retinanet_loss.py", line 87, in prepare_targets
    matched_targets.bbox, anchors_per_image.bbox
  File "/home/eldad/retinamask-master/maskrcnn_benchmark/modeling/box_coder.py", line 44, in encode
    targets_dx = wx * (gt_ctr_x - ex_ctr_x) / ex_widths
RuntimeError: The size of tensor a (0) must match the size of tensor b (225603) at non-singleton dimension 0
(base) eldad@x580-05:~/retinamask-master$
```
Using the Mask R-CNN config I get no errors, but with the Retina config I get this size-mismatch error. Any idea why this happens? Thank you.
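In case it helps with debugging: the size-0 tensor in `BoxCoder.encode` looks like zero matched ground-truth boxes against all 225603 anchors, which can happen if some images in the custom dataset have no annotations. A minimal sketch I'm using to check for that, assuming COCO-style annotation files (the `find_empty_images` helper is my own, not part of the repo):

```python
import json

def find_empty_images(ann_file):
    """Return ids of images that have no annotations.

    An image with zero ground-truth boxes can produce an empty
    matched-targets tensor in the RetinaNet loss, which then fails
    to broadcast against the per-image anchors in BoxCoder.encode.
    """
    with open(ann_file) as f:
        coco = json.load(f)
    # image ids that appear in at least one annotation
    annotated = {a["image_id"] for a in coco["annotations"]}
    return [img["id"] for img in coco["images"] if img["id"] not in annotated]
```

If this returns any ids, filtering those images out of the training set (or out of the annotation file) might avoid the crash.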