Open lijingzhu1 opened 2 years ago
@lijingzhu1 Could you share the command you ran?
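CUDA_LAUNCH_BLOCKING=1 python tools/hoi_train.py configs/qpic/qpic_r50_150e_hico.py
I added CUDA_LAUNCH_BLOCKING=1, but that should not affect the results.
I have uploaded a release; please check whether it works for you. BTW, this repository has not been maintained for a long time. I was able to run it before, but the repository is incomplete and may contain some bugs, so please treat it as a reference only.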
Hi,
I would like to reproduce QPIC using mmhoidet, but I ran into two issues. First, the shapes of sub_bbox_targets and pos_gt_sub_bboxes_targets do not match in qpic_head.py. Below is the error:
Traceback (most recent call last):
File "tools/hoi_train.py", line 195, in
main()
File "tools/hoi_train.py", line 191, in main
meta=meta)
File "/users/PCS0256/lijing/mmdetection/mmdet/apis/train.py", line 208, in train_detector
runner.run(data_loaders, cfg.workflow)
File "/users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
epoch_runner(data_loaders[i], kwargs)
File "/users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
self.run_iter(data_batch, train_mode=True, kwargs)
File "/users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 30, in run_iter
kwargs)
File "/users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/mmcv/parallel/data_parallel.py", line 75, in train_step
return self.module.train_step(inputs[0], kwargs[0])
File "/users/PCS0256/lijing/mmdetection/mmdet/models/detectors/basehoidetector.py", line 249, in train_step
losses = self(data)
File "/users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(input, kwargs)
File "/users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 110, in new_func
return old_func(*args, kwargs)
File "/users/PCS0256/lijing/mmdetection/mmdet/models/detectors/basehoidetector.py", line 183, in forward
return self.forward_train(img, img_metas, kwargs)
File "/users/PCS0256/lijing/mmdetection/mmdet/models/detectors/qpic.py", line 64, in forward_train
gt_obj_labels, gt_verb_labels, kwargs)
File "/users/PCS0256/lijing/mmdetection/mmdet/models/hoi_heads/qpic_head.py", line 213, in forward_train
return self.loss(loss_inputs)
File "/users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 198, in new_func
return old_func(args, kwargs)
File "/users/PCS0256/lijing/mmdetection/mmdet/models/hoi_heads/qpic_head.py", line 320, in loss
img_metas_list)
File "/users/PCS0256/lijing/mmdetection/mmdet/core/utils/misc.py", line 30, in multi_apply
return tuple(map(list, zip(map_results)))
File "/users/PCS0256/lijing/mmdetection/mmdet/models/hoi_heads/qpic_head.py", line 382, in loss_single
img_metas)
File "/users/PCS0256/lijing/mmdetection/mmdet/models/hoi_heads/qpic_head.py", line 510, in get_targets
gt_sub_bboxes_list, gt_obj_bboxes_list, gt_obj_labels_list, gt_verb_labels_list, img_metas)
File "/users/PCS0256/lijing/mmdetection/mmdet/core/utils/misc.py", line 30, in multi_apply
return tuple(map(list, zip(map_results)))
File "/users/PCS0256/lijing/mmdetection/mmdet/models/hoi_heads/qpic_head.py", line 607, in _get_target_single
sub_bbox_targets[pos_inds] = pos_gt_sub_bboxes_targets
RuntimeError: shape mismatch: value tensor of shape [2, 4] cannot be broadcast to indexing result of shape [0, 4]
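For readers unfamiliar with this error: the message means the value on the right-hand side has 2 rows while the indexed slice on the left-hand side has 0 rows, and a dimension of size 2 cannot be stretched to size 0. Below is a minimal pure-Python sketch of that rule. The helper name can_broadcast_to is hypothetical (it is not part of PyTorch or mmhoidet), and it simplifies the real rule by ignoring extra leading dimensions of size 1 on the value.

```python
def can_broadcast_to(value_shape, target_shape):
    """Return True if a value of value_shape can be broadcast to target_shape.

    Simplified rule: shapes are compared from the trailing dimension, and
    each value dimension must either equal the target dimension or be 1.
    """
    if len(value_shape) > len(target_shape):
        return False
    for v, t in zip(reversed(value_shape), reversed(target_shape)):
        if v != t and v != 1:
            return False
    return True


# The case from the traceback: 2 target rows, but 0 positive indices.
print(can_broadcast_to((2, 4), (0, 4)))  # False
# A matching case: the same number of rows on both sides.
print(can_broadcast_to((2, 4), (2, 4)))  # True
```

In other words, pos_inds selected no queries here, while pos_gt_sub_bboxes_targets still held two boxes, so the two are out of sync before the assignment on line 607.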
The second issue is that gt_sub_bboxes and gt_obj_bboxes have different lengths for some images, which is odd: I checked trainval_hico.json, and every ground-truth annotation should be a (subject, object, HOI category) triplet, so the two should always be paired. Below is the error:
../aten/src/ATen/native/cuda/IndexKernel.cu:91: operator(): block: [0,0,0], thread: [4,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:91: operator(): block: [0,0,0], thread: [5,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:91: operator(): block: [0,0,0], thread: [6,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:91: operator(): block: [0,0,0], thread: [7,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
Traceback (most recent call last):
File "tools/hoi_train.py", line 195, in
main()
File "tools/hoi_train.py", line 191, in main
meta=meta)
File "/users/PCS0256/lijing/mmdetection/mmdet/apis/train.py", line 208, in train_detector
runner.run(data_loaders, cfg.workflow)
File "/users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
epoch_runner(data_loaders[i], kwargs)
File "/users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
self.run_iter(data_batch, train_mode=True, kwargs)
File "/users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 30, in run_iter
kwargs)
File "/users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/mmcv/parallel/data_parallel.py", line 75, in train_step
return self.module.train_step(inputs[0], kwargs[0])
File "/users/PCS0256/lijing/mmdetection/mmdet/models/detectors/basehoidetector.py", line 249, in train_step
losses = self(data)
File "/users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(input, kwargs)
File "/users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 110, in new_func
return old_func(*args, kwargs)
File "/users/PCS0256/lijing/mmdetection/mmdet/models/detectors/basehoidetector.py", line 183, in forward
return self.forward_train(img, img_metas, kwargs)
File "/users/PCS0256/lijing/mmdetection/mmdet/models/detectors/qpic.py", line 64, in forward_train
gt_obj_labels, gt_verb_labels, kwargs)
File "/users/PCS0256/lijing/mmdetection/mmdet/models/hoi_heads/qpic_head.py", line 213, in forward_train
return self.loss(loss_inputs)
File "/users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 198, in new_func
return old_func(args, kwargs)
File "/users/PCS0256/lijing/mmdetection/mmdet/models/hoi_heads/qpic_head.py", line 320, in loss
img_metas_list)
File "/users/PCS0256/lijing/mmdetection/mmdet/core/utils/misc.py", line 30, in multi_apply
return tuple(map(list, zip(map_results)))
File "/users/PCS0256/lijing/mmdetection/mmdet/models/hoi_heads/qpic_head.py", line 382, in loss_single
img_metas)
File "/users/PCS0256/lijing/mmdetection/mmdet/models/hoi_heads/qpic_head.py", line 510, in get_targets
gt_sub_bboxes_list, gt_obj_bboxes_list, gt_obj_labels_list, gt_verb_labels_list, img_metas)
File "/users/PCS0256/lijing/mmdetection/mmdet/core/utils/misc.py", line 30, in multi_apply
return tuple(map(list, zip(map_results)))
File "/users/PCS0256/lijing/mmdetection/mmdet/models/hoi_heads/qpic_head.py", line 574, in _get_target_single
gt_sub_bboxes, gt_obj_bboxes)
File "/users/PCS0256/lijing/mmdetection/mmdet/core/hoi/samplers/pseudo_sampler.py", line 47, in sample
assign_result, gt_flags)
File "/users/PCS0256/lijing/mmdetection/mmdet/core/hoi/samplers/sampling_result.py", line 48, in init
self.pos_gt_sub_bboxes = gt_sub_bboxes[self.pos_assigned_gt_inds.long(), :]
RuntimeError: CUDA error: device-side assert triggered
terminate called after throwing an instance of 'c10::CUDAError'
what(): CUDA error: device-side assert triggered
Exception raised from create_event_internal at ../c10/cuda/CUDACachingAllocator.cpp:1230 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x2ac2b2a807d2 in /users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: + 0x2319e (0x2ac2b279e19e in /users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::cuda::CUDACachingAllocator::raw_delete(void) + 0x22d (0x2ac2b279fd3d in /users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #3: + 0x300f48 (0x2ac25f073f48 in /users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #4: c10::TensorImpl::release_resources() + 0x175 (0x2ac2b2a69005 in /users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #5: + 0x1ed619 (0x2ac25ef60619 in /users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #6: + 0x4e4ec8 (0x2ac25f257ec8 in /users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #7: THPVariable_subclass_dealloc(_object ) + 0x299 (0x2ac25f2581c9 in /users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
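Since the "index out of bounds" assertion fires while indexing gt_sub_bboxes with the assigned ground-truth indices, one thing worth ruling out is an annotation whose subject or object index points past the end of that image's box list. Here is a rough sketch of such a check in plain Python. The field names (annotations, hoi_annotation, subject_id, object_id) are my assumption about the layout of trainval_hico.json, and find_bad_hoi_records is a hypothetical helper, not something from mmhoidet; adjust both to the actual file.

```python
def find_bad_hoi_records(records):
    """Return (file_name, reason) pairs for HOI entries that index outside the box list.

    Assumed record layout (not confirmed against the real file):
      {"file_name": str,
       "annotations": [{"bbox": [...], "category_id": int}, ...],
       "hoi_annotation": [{"subject_id": int, "object_id": int, ...}, ...]}
    """
    problems = []
    for rec in records:
        num_boxes = len(rec.get("annotations", []))
        for k, hoi in enumerate(rec.get("hoi_annotation", [])):
            for role in ("subject_id", "object_id"):
                idx = hoi.get(role)
                # A valid index is an integer in [0, num_boxes).
                if not isinstance(idx, int) or not 0 <= idx < num_boxes:
                    problems.append(
                        (rec.get("file_name"),
                         f"hoi {k}: {role}={idx!r}, but only {num_boxes} boxes")
                    )
    return problems


# Tiny hand-made example; in practice `records` would come from
# json.load() on the annotation file.
sample = [
    {"file_name": "ok.jpg",
     "annotations": [{"bbox": [0, 0, 10, 10], "category_id": 1},
                     {"bbox": [5, 5, 20, 20], "category_id": 18}],
     "hoi_annotation": [{"subject_id": 0, "object_id": 1, "category_id": 77}]},
    {"file_name": "bad.jpg",
     "annotations": [{"bbox": [0, 0, 10, 10], "category_id": 1}],
     "hoi_annotation": [{"subject_id": 0, "object_id": 2, "category_id": 5}]},
]
problems = find_bad_hoi_records(sample)
for file_name, reason in problems:
    print(file_name, reason)
```

If this reports nothing on the real file, the annotations themselves are probably fine and the mismatch is more likely introduced later, for example in the data pipeline or in the assigner.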