IDEA-Research / detrex

detrex is a research platform for DETR-based object detection, segmentation, pose estimation and other visual recognition tasks.
https://detrex.readthedocs.io/en/latest/
Apache License 2.0

[Bug] FocusDetr report min size error #308

Open icicle4 opened 9 months ago

icicle4 commented 9 months ago

I ran python tools/train_net.py --config-file projects/focus_detr/configs/focus_detr_resnet/focus_detr_r101_4scale_24ep.py --num-gpus 8 with train.init_checkpoint = detectron2://ImageNetPretrained/torchvision/R-50.pkl.

It fails with the error below. My dataset is the default COCO dataset.

Traceback (most recent call last):
  File "tools/train_net.py", line 313, in <module>
    args=(args,),
  File "/root/detrex/detectron2/detectron2/engine/launch.py", line 79, in launch
    daemon=False,
  File "/root/miniconda3/envs/detrex/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/root/miniconda3/envs/detrex/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
    while not context.join():
  File "/root/miniconda3/envs/detrex/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 150, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 4 terminated with the following error:
Traceback (most recent call last):
  File "/root/miniconda3/envs/detrex/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/root/detrex/detectron2/detectron2/engine/launch.py", line 126, in _distributed_worker
    main_func(*args)
  File "/root/detrex/tools/train_net.py", line 302, in main
    do_train(args, cfg)
  File "/root/detrex/tools/train_net.py", line 275, in do_train
    trainer.train(start_iter, cfg.train.max_iter)
  File "/root/detrex/detectron2/detectron2/engine/train_loop.py", line 149, in train
    self.run_step()
  File "/root/detrex/tools/train_net.py", line 101, in run_step
    loss_dict = self.model(data)
  File "/root/miniconda3/envs/detrex/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/miniconda3/envs/detrex/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 886, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/root/miniconda3/envs/detrex/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/detrex/projects/focus_detr/modeling/focus_detr.py", line 269, in forward
    loss_dict = self.criterion(output, targets, dn_meta)
  File "/root/miniconda3/envs/detrex/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/detrex/projects/focus_detr/modeling/dn_criterion.py", line 43, in forward
    losses = super(FOCUS_DETRCriterion, self).forward(outputs, targets)
  File "/root/detrex/projects/focus_detr/modeling/two_stage_criterion.py", line 87, in forward
    class_targets = self.target_layer(outputs['srcs'], batch_boxes, batch_classes)
  File "/root/miniconda3/envs/detrex/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/detrex/projects/focus_detr/modeling/foreground_supervision.py", line 75, in forward
    self.limit_range[level])
  File "/root/detrex/projects/focus_detr/modeling/foreground_supervision.py", line 129, in _gen_level_targets
    areas_min_ind = torch.min(areas, dim=-1)[1]  # [batch_size,h*w]
IndexError: min(): Expected reduction dim 2 to have non-zero size.
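
For anyone trying to reproduce this outside of training: the message says the last dimension of areas has size zero. A plausible reading, which is an assumption and not confirmed by the maintainers, is that this dimension indexes ground-truth boxes and that the per-GPU batch contained no boxes at all. The sketch below triggers the same kind of failure on a dummy tensor and shows one possible guard. The helper name safe_min_index and the -1 sentinel are hypothetical and are not part of detrex.

```python
import torch


def safe_min_index(areas: torch.Tensor) -> torch.Tensor:
    """Return the index of the smallest area along the last dim.

    Hypothetical helper: returns -1 everywhere when the last dim is empty,
    instead of letting torch.min raise.
    """
    if areas.shape[-1] == 0:
        # No boxes to choose from, so mark every location as "no match".
        return areas.new_full(areas.shape[:-1], -1, dtype=torch.long)
    return torch.min(areas, dim=-1)[1]


# Dummy stand-in for `areas`: 2 images, 6 feature locations, 0 boxes (assumed layout).
empty_areas = torch.zeros(2, 6, 0)

try:
    torch.min(empty_areas, dim=-1)
    raised = False
except (IndexError, RuntimeError):
    # Reducing over a zero-size dimension is rejected by torch.min.
    raised = True

guarded = safe_min_index(empty_areas)  # shape [2, 6], filled with -1
```

Whether a sentinel is the right fix depends on how the downstream code in foreground_supervision.py consumes these indices, so treat this as a diagnostic aid and not as a patch.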

baojunqi commented 6 months ago

Same problem here. Have you solved it?

emotionee commented 6 months ago

I have also run into this problem. It seems to be related to the batch size setting. Have you found a solution?

baojunqi commented 6 months ago

> I have also run into this problem. It seems to be related to the batch size setting. Have you found a solution?

How big is your batch size? I trained my own dataset on DETR with a batch size of 16 and it worked well. However, when I tried to train Focus-DETR with a batch size of 8 on 4 A4000 GPUs, it failed.
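
To make the batch-size suspicion concrete, here is a rough back-of-the-envelope sketch. It assumes the batch size of 8 is the total across GPUs, and that the crash happens when every image on one GPU ends up with no ground-truth boxes. Both are assumptions, the function names are hypothetical, and the 1% empty fraction is a made-up figure for illustration, not a measurement.

```python
def per_gpu_batch(total_batch_size: int, num_gpus: int) -> int:
    """Images each GPU sees per step when the total batch is split evenly."""
    if total_batch_size % num_gpus != 0:
        raise ValueError("total_batch_size must be divisible by num_gpus")
    return total_batch_size // num_gpus


def prob_all_empty(empty_fraction: float, images_per_gpu: int) -> float:
    """Chance that all images on one GPU have no boxes, assuming independence."""
    return empty_fraction ** images_per_gpu


images_per_gpu = per_gpu_batch(8, 4)  # 8 images over 4 GPUs
# Illustrative only: suppose 1% of samples have no boxes after augmentation.
risk_per_step = prob_all_empty(0.01, images_per_gpu)
```

Under these assumptions a smaller per-GPU batch makes an all-empty batch more likely on any given step, which would fit reports that the failure shows up with small batches.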

SmalWhite commented 4 months ago

Have you solved the problem? I am running into the same error.