The Shape of gt_sub_bboxes and gt_obj_bboxes didn't Match

lijingzhu1 commented 2 years ago

Hi,

I am trying to reproduce QPIC using mmhoidet, but I have run into two issues. First, the shapes of sub_bbox_targets and pos_gt_sub_bboxes_targets do not match in qpic_head.py. The error is below:

Traceback (most recent call last):
  File "tools/hoi_train.py", line 195, in <module>
    main()
  File "tools/hoi_train.py", line 191, in main
    meta=meta)
  File "/users/PCS0256/lijing/mmdetection/mmdet/apis/train.py", line 208, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], kwargs)
  File "/users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
    self.run_iter(data_batch, train_mode=True, kwargs)
  File "/users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 30, in run_iter
    kwargs)
  File "/users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/mmcv/parallel/data_parallel.py", line 75, in train_step
    return self.module.train_step(inputs[0], kwargs[0])
  File "/users/PCS0256/lijing/mmdetection/mmdet/models/detectors/basehoidetector.py", line 249, in train_step
    losses = self(data)
  File "/users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(input, kwargs)
  File "/users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 110, in new_func
    return old_func(*args, kwargs)
  File "/users/PCS0256/lijing/mmdetection/mmdet/models/detectors/basehoidetector.py", line 183, in forward
    return self.forward_train(img, img_metas, kwargs)
  File "/users/PCS0256/lijing/mmdetection/mmdet/models/detectors/qpic.py", line 64, in forward_train
    gt_obj_labels, gt_verb_labels, kwargs)
  File "/users/PCS0256/lijing/mmdetection/mmdet/models/hoi_heads/qpic_head.py", line 213, in forward_train
    return self.loss(loss_inputs)
  File "/users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 198, in new_func
    return old_func(args, kwargs)
  File "/users/PCS0256/lijing/mmdetection/mmdet/models/hoi_heads/qpic_head.py", line 320, in loss
    img_metas_list)
  File "/users/PCS0256/lijing/mmdetection/mmdet/core/utils/misc.py", line 30, in multi_apply
    return tuple(map(list, zip(map_results)))
  File "/users/PCS0256/lijing/mmdetection/mmdet/models/hoi_heads/qpic_head.py", line 382, in loss_single
    img_metas)
  File "/users/PCS0256/lijing/mmdetection/mmdet/models/hoi_heads/qpic_head.py", line 510, in get_targets
    gt_sub_bboxes_list, gt_obj_bboxes_list, gt_obj_labels_list, gt_verb_labels_list, img_metas)
  File "/users/PCS0256/lijing/mmdetection/mmdet/core/utils/misc.py", line 30, in multi_apply
    return tuple(map(list, zip(map_results)))
  File "/users/PCS0256/lijing/mmdetection/mmdet/models/hoi_heads/qpic_head.py", line 607, in _get_target_single
    sub_bbox_targets[pos_inds] = pos_gt_sub_bboxes_targets
RuntimeError: shape mismatch: value tensor of shape [2, 4] cannot be broadcast to indexing result of shape [0, 4]
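One way to narrow this down might be to check, just before the failing assignment, that the number of positive query indices equals the number of matched ground-truth rows. The sketch below is a hypothetical debugging helper (the name check_pos_assignment is an assumption, not part of mmhoidet). It relies only on len(), so it should accept plain lists as well as tensors. In the traceback above the indexing result has 0 rows while the value has 2, which is the case it flags.

```python
def check_pos_assignment(pos_inds, pos_gt_boxes, name="sub_bbox_targets"):
    """Raise a readable error when an indexed assignment would not line up.

    Hypothetical helper: `pos_inds` are the indices of queries matched to a
    ground-truth pair, `pos_gt_boxes` are the boxes to write into those rows.
    Both arguments only need to support len().
    """
    n_inds = len(pos_inds)
    n_boxes = len(pos_gt_boxes)
    if n_inds != n_boxes:
        raise ValueError(
            f"{name}: {n_inds} positive indices but {n_boxes} ground-truth rows"
        )
    return n_inds


# Plain lists standing in for tensors: zero selected rows but two
# ground-truth boxes to assign, as in the reported shape mismatch.
try:
    check_pos_assignment([], [[0, 0, 1, 1], [0, 0, 2, 2]])
except ValueError as err:
    print(err)
```

Calling it right before the line sub_bbox_targets[pos_inds] = pos_gt_sub_bboxes_targets would turn the broadcast error into a message that names both counts, which may make it easier to see which side is wrong.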

The second issue is that gt_sub_bboxes and gt_obj_bboxes have different lengths for some images, which is strange: I checked trainval_hico.json, and the ground truth for the subject, object, and HOI category should always come together as a triple. The error is below:

../aten/src/ATen/native/cuda/IndexKernel.cu:91: operator(): block: [0,0,0], thread: [4,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:91: operator(): block: [0,0,0], thread: [5,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:91: operator(): block: [0,0,0], thread: [6,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:91: operator(): block: [0,0,0], thread: [7,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
Traceback (most recent call last):
  File "tools/hoi_train.py", line 195, in <module>
    main()
  File "tools/hoi_train.py", line 191, in main
    meta=meta)
  File "/users/PCS0256/lijing/mmdetection/mmdet/apis/train.py", line 208, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], kwargs)
  File "/users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
    self.run_iter(data_batch, train_mode=True, kwargs)
  File "/users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 30, in run_iter
    kwargs)
  File "/users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/mmcv/parallel/data_parallel.py", line 75, in train_step
    return self.module.train_step(inputs[0], kwargs[0])
  File "/users/PCS0256/lijing/mmdetection/mmdet/models/detectors/basehoidetector.py", line 249, in train_step
    losses = self(data)
  File "/users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(input, kwargs)
  File "/users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 110, in new_func
    return old_func(*args, kwargs)
  File "/users/PCS0256/lijing/mmdetection/mmdet/models/detectors/basehoidetector.py", line 183, in forward
    return self.forward_train(img, img_metas, kwargs)
  File "/users/PCS0256/lijing/mmdetection/mmdet/models/detectors/qpic.py", line 64, in forward_train
    gt_obj_labels, gt_verb_labels, kwargs)
  File "/users/PCS0256/lijing/mmdetection/mmdet/models/hoi_heads/qpic_head.py", line 213, in forward_train
    return self.loss(loss_inputs)
  File "/users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 198, in new_func
    return old_func(args, kwargs)
  File "/users/PCS0256/lijing/mmdetection/mmdet/models/hoi_heads/qpic_head.py", line 320, in loss
    img_metas_list)
  File "/users/PCS0256/lijing/mmdetection/mmdet/core/utils/misc.py", line 30, in multi_apply
    return tuple(map(list, zip(map_results)))
  File "/users/PCS0256/lijing/mmdetection/mmdet/models/hoi_heads/qpic_head.py", line 382, in loss_single
    img_metas)
  File "/users/PCS0256/lijing/mmdetection/mmdet/models/hoi_heads/qpic_head.py", line 510, in get_targets
    gt_sub_bboxes_list, gt_obj_bboxes_list, gt_obj_labels_list, gt_verb_labels_list, img_metas)
  File "/users/PCS0256/lijing/mmdetection/mmdet/core/utils/misc.py", line 30, in multi_apply
    return tuple(map(list, zip(map_results)))
  File "/users/PCS0256/lijing/mmdetection/mmdet/models/hoi_heads/qpic_head.py", line 574, in _get_target_single
    gt_sub_bboxes, gt_obj_bboxes)
  File "/users/PCS0256/lijing/mmdetection/mmdet/core/hoi/samplers/pseudo_sampler.py", line 47, in sample
    assign_result, gt_flags)
  File "/users/PCS0256/lijing/mmdetection/mmdet/core/hoi/samplers/sampling_result.py", line 48, in __init__
    self.pos_gt_sub_bboxes = gt_sub_bboxes[self.pos_assigned_gt_inds.long(), :]
RuntimeError: CUDA error: device-side assert triggered
terminate called after throwing an instance of 'c10::CUDAError'
  what(): CUDA error: device-side assert triggered
Exception raised from create_event_internal at ../c10/cuda/CUDACachingAllocator.cpp:1230 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x2ac2b2a807d2 in /users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: + 0x2319e (0x2ac2b279e19e in /users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::cuda::CUDACachingAllocator::raw_delete(void) + 0x22d (0x2ac2b279fd3d in /users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #3: + 0x300f48 (0x2ac25f073f48 in /users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #4: c10::TensorImpl::release_resources() + 0x175 (0x2ac2b2a69005 in /users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #5: + 0x1ed619 (0x2ac25ef60619 in /users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #6: + 0x4e4ec8 (0x2ac25f257ec8 in /users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #7: THPVariable_subclass_dealloc(_object) + 0x299 (0x2ac25f2581c9 in /users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/torch/lib/libtorch_python.so)

frame #25: __libc_start_main + 0xf5 (0x2ac25654e555 in /lib64/libc.so.6)
/var/spool/slurmd/job13247525/slurm_script: line 19: 233669 Aborted (core dumped) CUDA_LAUNCH_BLOCKING=1 python tools/hoi_train.py configs/qpic/qpic_r50_150e_hico.py

Both errors are raised in qpic_head.py, but I suspect the root cause is in data loading. I trained the model after running the data_convetr file mentioned in INSTALL.MD. I also noticed comments such as #TODO: unfinished in qpic_head.py. Do you have a finished version of this repository? It would be great if you could update or share it. Thanks!
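Since the out-of-bounds index points at the ground-truth pairing, a quick offline check of the converted annotation file may help rule the data in or out. The sketch below assumes a QPIC-style schema that this thread does not confirm: each image entry has an "annotations" list of boxes and a "hoi_annotation" list whose items carry "subject_id" and "object_id" indices into "annotations". The function names find_bad_pairs and check_file are hypothetical, and the keys may need adjusting to the actual file.

```python
import json


def find_bad_pairs(images):
    """Return (image_index, hoi_index, reason) for every suspicious HOI entry.

    Assumed schema (not confirmed by this thread): each image dict has an
    "annotations" list of boxes and a "hoi_annotation" list whose items hold
    "subject_id" and "object_id" indices into "annotations".
    """
    problems = []
    for img_idx, img in enumerate(images):
        num_boxes = len(img.get("annotations", []))
        for hoi_idx, hoi in enumerate(img.get("hoi_annotation", [])):
            for key in ("subject_id", "object_id"):
                idx = hoi.get(key)
                # An index is valid only if it is an int within the box list.
                if not isinstance(idx, int) or not 0 <= idx < num_boxes:
                    problems.append(
                        (img_idx, hoi_idx, f"{key}={idx!r} out of range")
                    )
    return problems


def check_file(path):
    """Load an annotation file and report entries flagged by find_bad_pairs."""
    with open(path) as f:
        return find_bad_pairs(json.load(f))
```

For example, check_file("trainval_hico.json") (the path is a placeholder for wherever the converted file lives) returns an empty list when every pair indexes a valid box. Any entries it does return would be candidates for the images where the subject and object lists end up with different lengths.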

KainingYing commented 2 years ago

@lijingzhu1 Could you share the command you ran?

lijingzhu1 commented 2 years ago

> @lijingzhu1 Could you share the command you ran?

CUDA_LAUNCH_BLOCKING=1 python tools/hoi_train.py configs/qpic/qpic_r50_150e_hico.py

I added CUDA_LAUNCH_BLOCKING=1, but that should not affect the results.

KainingYing commented 2 years ago

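I have uploaded a release; please check whether it works for you. BTW, this repository has not been maintained for a long time. I was able to run it before, but the repository is incomplete and may contain some bugs, so please treat it as a reference only.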
