varagantis opened this issue 1 year ago
This is a problem with the COCO annotation format used by pycocotools: COCO assigns category IDs from 1 to N, while Deformable DETR expects class labels from 0 to N-1, so the out-of-range IDs trigger the assert. You can try adding "tgt_ids = torch.sub(tgt_ids, 1, alpha=1, out=None)" (i.e. shifting every label down by one) in matcher.py.
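For context, here is a minimal sketch of what that change does. The targets below are hypothetical stand-ins for the per-image dicts the matcher actually receives; the variable names tgt_ids and targets follow the repo's models/matcher.py, so verify the exact placement against your copy:

```python
import torch

# Hypothetical stand-in for the per-image targets that Deformable DETR's
# HungarianMatcher receives: each dict carries a "labels" tensor holding
# COCO-style category IDs, which run from 1 to N.
targets = [
    {"labels": torch.tensor([1, 3, 5])},
    {"labels": torch.tensor([2, 2])},
]

# As in models/matcher.py: concatenate the target class IDs across the batch.
tgt_ids = torch.cat([v["labels"] for v in targets])

# The suggested fix: shift 1..N down to 0..N-1 so the IDs can safely index
# the model's N-way class logits. This is equivalent to tgt_ids - 1.
tgt_ids = torch.sub(tgt_ids, 1, alpha=1, out=None)
print(tgt_ids)  # tensor([0, 2, 4, 1, 1])
```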
Hello, I am facing the following error when trying to train the model on a custom dataset with 5 classes. I know this error is most likely caused by a mismatch in num_classes, but I am not sure what the effective fix is:

Traceback (most recent call last):
  File "main.py", line 326, in <module>
    main(args)
  File "main.py", line 275, in main
    train_stats = train_one_epoch(
  File "/home/vsrikar/engine.py", line 43, in train_one_epoch
    loss_dict = criterion(outputs, targets)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/vsrikar/models/deformable_detr.py", line 342, in forward
    indices = self.matcher(outputs_without_aux, targets)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/vsrikar/models/matcher.py", line 87, in forward
    cost_giou = -generalized_box_iou(box_cxcywh_to_xyxy(out_bbox),
  File "/home/vsrikar/util/box_ops.py", line 59, in generalized_box_iou
    assert (boxes1[:, 2:] >= boxes1[:, :2]).all()
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
terminate called after throwing an instance of 'c10::CUDAError'
what(): CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Exception raised from create_event_internal at ../c10/cuda/CUDACachingAllocator.cpp:1387 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x3e (0x7f89f1df01ee in /usr/local/lib/python3.8/dist-packages/torch/lib/libc10.so)
frame #1: + 0x26e61 (0x7f89f1e6ae61 in /usr/local/lib/python3.8/dist-packages/torch/lib/libc10_cuda.so)
frame #2: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x257 (0x7f89f1e6fdb7 in /usr/local/lib/python3.8/dist-packages/torch/lib/libc10_cuda.so)
frame #3: + 0x466858 (0x7f89f641c858 in /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_python.so)
frame #4: c10::TensorImpl::release_resources() + 0x175 (0x7f89f1dd77a5 in /usr/local/lib/python3.8/dist-packages/torch/lib/libc10.so)
frame #5: + 0x362735 (0x7f89f6318735 in /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_python.so)
frame #6: + 0x67c6c8 (0x7f89f66326c8 in /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_python.so)
frame #7: THPVariable_subclass_dealloc(_object*) + 0x2d5 (0x7f89f6632a95 in /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_python.so)
frame #8: python() [0x5d1908]
frame #9: python() [0x5a978d]
frame #10: python() [0x5ecd90]
frame #11: python() [0x5447b8]
frame #12: python() [0x54480a]
frame #13: python() [0x54480a]
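As a debugging aid (not part of the repo), one way to confirm that out-of-range class IDs are the cause is to scan the dataset's labels on the CPU before training, where the failure is explicit instead of a deferred device-side assert. A minimal sketch, assuming a dataset that yields (image, target) pairs whose target dict has a "labels" tensor, as the repo's loaders do; check_label_range is a hypothetical helper:

```python
import torch

def check_label_range(dataset, num_classes):
    # Hypothetical helper: report every sample whose class IDs fall outside
    # [0, num_classes - 1], i.e. the range the matcher's cost lookup expects.
    bad = []
    for idx in range(len(dataset)):
        _, target = dataset[idx]
        labels = target["labels"]
        if labels.numel() > 0 and (labels.min() < 0 or labels.max() >= num_classes):
            bad.append((idx, labels.tolist()))
    return bad

# Usage (dataset construction depends on your setup):
# problems = check_label_range(train_dataset, num_classes=5)
# for idx, labels in problems[:10]:
#     print(f"sample {idx} has out-of-range labels: {labels}")
```

If this reports labels equal to num_classes (or any ID >= num_classes), the 1-to-N versus 0-to-N-1 offset described above is the likely culprit.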