Open yeyuanzheng177 opened 4 years ago
I get the same error as you. Have you worked it out?
This adds apex support with opt level O1, but I got the following error when running it:
RuntimeError: Function _DCNv2Backward returned an invalid gradient at index 1 - expected type torch.cuda.HalfTensor but got torch.cuda.FloatTensor
The problem is solved. Besides following the code of https://github.com/lbin/DCNv2, I added the following three lines before the return statement
in the _backward function of _DCNv2 in dcn_v2.py:
grad_input = grad_input.half()
grad_offset = grad_offset.half()
grad_mask = grad_mask.half()
Did you modify any other code besides dcn_v2.py? I made those changes but still get the same error as @yeyuanzheng177.
Me too. Does anyone have an update?
In dcn_v2.py, "_backend.dcn_v2_forward" and "_backend.dcn_v2_backward" only accept float32 input. So if you use mixed precision (Apex/amp), you should convert the float16 inputs to float32 before calling them, and convert the final output from float32 back to float16. For an example, see https://github.com/jasonkena/yolact/tree/amp/external/DCNv2
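For anyone landing here, below is a minimal sketch of that cast-in, cast-out pattern, assuming the float32-only kernels are wrapped in a custom autograd Function. `fp32_only_forward` and `fp32_only_backward` are hypothetical stand-ins for "_backend.dcn_v2_forward" and "_backend.dcn_v2_backward" (here they only multiply two tensors), and `CastingOp` is an illustrative name, not the code from either linked repository.

```python
import torch
from torch.autograd import Function


def fp32_only_forward(x, weight):
    # Hypothetical stand-in for _backend.dcn_v2_forward: rejects non-float32 input.
    if x.dtype != torch.float32 or weight.dtype != torch.float32:
        raise RuntimeError("expected scalar type Float")
    return x * weight


def fp32_only_backward(x, weight, grad_output):
    # Hypothetical stand-in for _backend.dcn_v2_backward: also float32 only.
    for t in (x, weight, grad_output):
        if t.dtype != torch.float32:
            raise RuntimeError("expected scalar type Float")
    return grad_output * weight, grad_output * x


class CastingOp(Function):
    @staticmethod
    def forward(ctx, x, weight):
        # Save the original tensors so backward knows each input's dtype.
        ctx.save_for_backward(x, weight)
        # Cast up to float32 for the kernel, then cast the result back down.
        out = fp32_only_forward(x.float(), weight.float())
        return out.to(x.dtype)

    @staticmethod
    def backward(ctx, grad_output):
        x, weight = ctx.saved_tensors
        grad_x, grad_w = fp32_only_backward(
            x.float(), weight.float(), grad_output.float()
        )
        # Each gradient must match the dtype of the input it belongs to.
        return grad_x.to(x.dtype), grad_w.to(weight.dtype)


if __name__ == "__main__":
    x = torch.ones(2, 3, dtype=torch.float16, requires_grad=True)
    w = torch.full((2, 3), 2.0, dtype=torch.float16, requires_grad=True)
    out = CastingOp.apply(x, w)
    out.backward(torch.ones_like(out))
    print(out.dtype, x.grad.dtype, w.grad.dtype)
```

Casting each gradient back with `.to(tensor.dtype)` instead of an unconditional `.half()` is a choice made for this sketch: it keeps every gradient matched to its own input when some inputs are float16 and others stay float32, which may be why the three `.half()` lines alone did not work for everyone.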
@steven22tom Thank you for your hint, it works!
RuntimeError: expected scalar type Float but found Half (data at /usr/local/lib/python3.5/dist-packages/torch/include/ATen/core/TensorMethods.h:1386)
frame #0: std::function<std::string ()>::operator()() const + 0x11 (0x7f9f2dbdd441 in /usr/local/lib/python3.5/dist-packages/torch/lib/libc10.so)
frame #1: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x2a (0x7f9f2dbdcd7a in /usr/local/lib/python3.5/dist-packages/torch/lib/libc10.so)
frame #2: float* at::Tensor::data() const + 0xcf (0x7f9f1c69fa2f in /home/yyz/bigdisk/CenterNet-master0/src/lib/models/networks/DCNv2/_ext.cpython-35m-x86_64-linux-gnu.so)
frame #3: dcn_v2_cuda_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, int, int, int, int, int, int, int, int, int) + 0xbc0 (0x7f9f1c6a4b50 in /home/yyz/bigdisk/CenterNet-master0/src/lib/models/networks/DCNv2/_ext.cpython-35m-x86_64-linux-gnu.so)
frame #4: dcn_v2_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, int, int, int, int, int, int, int, int, int) + 0x8b (0x7f9f1c689c0b in /home/yyz/bigdisk/CenterNet-master0/src/lib/models/networks/DCNv2/_ext.cpython-35m-x86_64-linux-gnu.so)
frame #5: + 0x1f91c (0x7f9f1c69591c in /home/yyz/bigdisk/CenterNet-master0/src/lib/models/networks/DCNv2/_ext.cpython-35m-x86_64-linux-gnu.so)
frame #6: + 0x1f99e (0x7f9f1c69599e in /home/yyz/bigdisk/CenterNet-master0/src/lib/models/networks/DCNv2/_ext.cpython-35m-x86_64-linux-gnu.so)
frame #7: + 0x1cdc0 (0x7f9f1c692dc0 in /home/yyz/bigdisk/CenterNet-master0/src/lib/models/networks/DCNv2/_ext.cpython-35m-x86_64-linux-gnu.so)