chengdazhi / Deformable-Convolution-V2-PyTorch

Deformable ConvNets V2 (DCNv2) in PyTorch
MIT License

RuntimeError: Backward is not reentrant #16

Closed · leon532 closed this 5 years ago

leon532 commented 5 years ago

It raises "RuntimeError: Backward is not reentrant" when I run test.py.

torch.Size([2, 128, 128, 128]) torch.Size([2, 128, 128, 128]) torch.Size([20, 32, 7, 7]) torch.Size([20, 32, 7, 7]) torch.Size([20, 32, 7, 7])
checking
dconv im2col_step forward passed with 0.0
tensor(0., device='cuda:0', grad_fn=)
dconv im2col_step backward passed with 7.450580596923828e-09 = 7.450580596923828e-09+0.0+0.0+0.0
mdconv im2col_step forward passed with 0.0
tensor(0., device='cuda:0', grad_fn=)
mdconv im2col_step backward passed with 3.725290298461914e-09
0.971507, 1.943014
0.971507, 1.943014
tensor(0., device='cuda:0')
dconv zero offset passed with 1.1920928955078125e-07
dconv zero offset identify passed with 0.0
tensor(0., device='cuda:0')
mdconv zero offset passed with 1.7881393432617188e-07
mdconv zero offset identify passed with 0.0
check_gradient_conv: True
Traceback (most recent call last):
  File "test.py", line 624, in <module>
    check_gradient_dconv()
  File "test.py", line 400, in check_gradient_dconv
    eps=1e-3, atol=1e-3, rtol=1e-2, raise_exception=True))
  File "/data/yli18/miniconda3/envs/pytorch-1.0/lib/python3.6/site-packages/torch/autograd/gradcheck.py", line 208, in gradcheck
    return fail_test('Backward is not reentrant, i.e., running backward with same '
  File "/data/yli18/miniconda3/envs/pytorch-1.0/lib/python3.6/site-packages/torch/autograd/gradcheck.py", line 185, in fail_test
    raise RuntimeError(msg)
RuntimeError: Backward is not reentrant, i.e., running backward with same input and grad_output multiple times gives different values, although analytical gradient matches numerical gradient

Is this a serious problem, and how can I resolve it? Thanks for your time and suggestions.

xvjiarui commented 5 years ago

Sorry for the late reply. https://github.com/chengdazhi/Deformable-Convolution-V2-PyTorch/blob/1b5851abd404dc71f02ae0110af3540b6877309e/test.py#L627-L631 As the comment there explains, this won't affect performance, so feel free to ignore it. Please reopen this issue if you find a solution. Thx
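For reference, newer PyTorch releases expose a `nondet_tol` argument on `gradcheck` that tolerates exactly this kind of nondeterministic backward pass. Below is a minimal sketch, assuming a PyTorch version that has `nondet_tol` (the PyTorch 1.0 install in the traceback above does not); the tolerances mirror those used in test.py, and the plain `Conv2d` is only a stand-in for the deformable-conv op:

```python
import torch
from torch.autograd import gradcheck

# A plain Conv2d stands in for the deformable-conv op; gradcheck expects
# double precision for reliable numerical gradients.
conv = torch.nn.Conv2d(2, 2, 3, padding=1).double()
x = torch.randn(1, 2, 8, 8, dtype=torch.double, requires_grad=True)

# nondet_tol (not available in PyTorch 1.0) tolerates small run-to-run
# backward differences, such as those produced by atomicAdd accumulation
# order in CUDA kernels.
print(gradcheck(lambda t: conv(t), (x,),
                eps=1e-3, atol=1e-3, rtol=1e-2,  # tolerances from test.py
                nondet_tol=1e-5))
```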

GreenTeaHua commented 5 years ago

thx

JiancongWang commented 5 years ago

I checked the code, and numerically it should be correct. Using CUDA 9.0 on a GTX 1080, I verified that the error is very small, on the order of 1e-16. The error is raised when calculating the gradient for the input. The code uses atomicAdd for that, since one cannot know in advance how many times each pixel gets used, and the order in which the atomicAdd calls execute is nondeterministic. Different accumulation orders produce results that differ up to floating-point rounding error. The gradients for the bias/weight/offset do not use atomicAdd and thus have no such issue.
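A minimal sketch of the underlying effect, independent of the repo's CUDA kernel: floating-point addition is not associative, so accumulating the same gradient contributions in two different orders, as two different atomicAdd schedules would, gives results that agree only up to rounding:

```python
import torch

torch.manual_seed(0)
vals = torch.rand(10000, dtype=torch.float32)

# Accumulate the same terms in two different orders, mimicking two
# different atomicAdd schedules across CUDA threads.
s_fwd = torch.zeros((), dtype=torch.float32)
for v in vals:
    s_fwd += v

s_rev = torch.zeros((), dtype=torch.float32)
for v in vals.flip(0):
    s_rev += v

# The two sums typically differ in the low-order bits: both are
# numerically correct, but not bit-identical.
print(s_fwd.item(), s_rev.item(), (s_fwd - s_rev).item())
```

gradcheck's reentrancy test compares repeated backward runs bit for bit, so these last-bit differences trip it even though the analytical gradient matches the numerical one.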

JiancongWang commented 5 years ago

That's only one possible cause of the issue, though. I will double-check and see if there is any other problem; for now I don't see any.

heartInsert commented 5 years ago

> I checked the code, and numerically it should be correct. Using CUDA 9.0 on a GTX 1080, I verified that the error is very small, on the order of 1e-16. The error is raised when calculating the gradient for the input. The code uses atomicAdd for that, since one cannot know in advance how many times each pixel gets used, and the order in which the atomicAdd calls execute is nondeterministic. Different accumulation orders produce results that differ up to floating-point rounding error. The gradients for the bias/weight/offset do not use atomicAdd and thus have no such issue.

You mean I can ignore this exception?