csuhan / ReDet

Official code of the paper "ReDet: A Rotation-Equivariant Detector for Aerial Object Detection" (CVPR 2021)
https://redet.csuhan.com
Apache License 2.0
389 stars 79 forks

CUDA error: an illegal memory access was encountered in roi_align backward function #86

Open iamstupidd opened 3 years ago

iamstupidd commented 3 years ago

Thanks for your work, it's very helpful.

Environment (conda): mmcv 0.2.16, CUDA 11.1, torch 1.8.0, RTX 3080. Dataset: FAIR1M, for oriented bounding box (OBB) detection.

When config.py uses roi_layer=dict(type='RoIAlign', out_size=7, sample_num=2), I can only train on two GPUs. With four GPUs, training fails with the error " THCudaCheck FAIL file=ReDet/mmdet/ops/roi_align/src/roi_ane=292 error=700 : an illegal memory access was encountered ". With roi_layer=dict(type='RoIPool', out_size=7), all four GPUs work fine, which is strange. So I am sure there is a bug left in roi_align_kernel.cu, and I am debugging it now.

Any ideas? Thanks.
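
One way to narrow down an error like this: CUDA kernels launch asynchronously, so the line reported for an illegal memory access is not always the call that caused it. Setting the CUDA_LAUNCH_BLOCKING environment variable to 1 makes launches synchronous, so the error surfaces at the failing call. A minimal sketch, assuming it runs at the very top of the training entry point; the helper name is hypothetical and not part of ReDet or mmdet:

```python
import os


def enable_blocking_cuda_launches(env=os.environ):
    """Request synchronous CUDA kernel launches for this process.

    Hypothetical helper, not part of ReDet. It must run before the first
    CUDA call (safest: before importing torch), otherwise it has no effect.
    """
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env["CUDA_LAUNCH_BLOCKING"]


enable_blocking_cuda_launches()
# Import torch and start training only after this point.
```

Training runs slower with this setting, so it is only worth enabling while debugging.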

iamstupidd commented 3 years ago

What's more, there is a problem with the validation mAP evaluation: it is always zero. Do you have the same problem? If so, I have fixed it by changing some files in mmdet/core/evaluation/.

csuhan commented 3 years ago

Line 292: https://github.com/csuhan/ReDet/blob/0b9addf3c2734659fd6ffc7824f2e659fde4419c/mmdet/ops/riroi_align/src/riroi_align_kernel.cu#L292 Please check the annotations first and make sure all bboxes have valid values (especially the angle field).
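
A check like this could be sketched as follows. This is only an illustration: it assumes each box is stored as (cx, cy, w, h, angle) with the angle in radians, and the function name, box layout, and default angle range are assumptions, not ReDet's actual annotation format.

```python
import math


def find_invalid_boxes(boxes, angle_range=(-math.pi, math.pi)):
    """Return (index, reason) for every box that fails a basic sanity check.

    Assumes each box is (cx, cy, w, h, angle) with angle in radians; this
    layout and the default angle range are assumptions, not ReDet's format.
    """
    problems = []
    lo, hi = angle_range
    for i, box in enumerate(boxes):
        if len(box) != 5:
            problems.append((i, "expected 5 values"))
            continue
        if not all(math.isfinite(v) for v in box):
            problems.append((i, "NaN or inf value"))
            continue
        cx, cy, w, h, angle = box
        if w <= 0 or h <= 0:
            problems.append((i, "non-positive width or height"))
        elif not lo <= angle <= hi:
            problems.append((i, "angle out of range"))
    return problems


boxes = [
    (100.0, 50.0, 20.0, 10.0, 0.3),           # valid
    (100.0, 50.0, 0.0, 10.0, 0.3),            # zero width
    (100.0, 50.0, 20.0, 10.0, float("nan")),  # NaN angle
    (100.0, 50.0, 20.0, 10.0, 7.0),           # angle outside [-pi, pi]
]
print(find_invalid_boxes(boxes))
```

Running it over every annotation file before training lists the offending boxes by index, so they can be fixed or dropped.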

csuhan commented 3 years ago

I have not run into this bug yet. Can you share your modification?