CharlesShang / DCNv2

Deformable Convolutional Networks v2 with Pytorch
BSD 3-Clause "New" or "Revised" License

failed in inference on non-default gpu #16

Open ThreeChen opened 5 years ago

ThreeChen commented 5 years ago

The CenterNet project uses this DCNv2 (the PyTorch 0.4 version). I compiled DCNv2 (the PyTorch 1.0 version) and used it in the CenterNet project, and it also works fine. Everything goes well as long as I only run it on GPU 0.

But a strange bug happens when I try to run model inference on a different GPU. Say I wrote code like this:

```python
...
torch.cuda.synchronize()
model = centernet_model.to(1)  # not on gpu 0
x = x.to(1)
y = model(x)
torch.cuda.synchronize()
...
```

First, there is an error saying "illegal memory access" at `torch.cuda.synchronize()`.

If I remove all the synchronize calls, the code runs as usual for about 10 images, then suddenly fails with a CUDA runtime error:

"GPU program failed to execute at /pytorch/aten/src/THC/THCBlas.cu:416"

Since the author of CenterNet developed his project in a PyTorch 0.4 environment, I decided to switch to another machine, set up the full PyTorch 0.4 environment, and compile the PyTorch 0.4 version of DCNv2.

Still, everything works fine except when I try to run inference on a non-default GPU. This time, though, the error message is "argument not on same gpu".

The error message comes from dcn_v2_cuda.c:20.

This problem is really driving me mad, and I honestly don't know where the bug is: is it here (DCNv2), or is it in the CenterNet project?
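For reference, this symptom pattern (works on GPU 0, "illegal memory access" or "argument not on same gpu" on any other device) usually means a custom CUDA extension is launching kernels and allocating buffers on PyTorch's current/default device rather than on the device of its input tensors. A minimal sketch of the workaround I would expect to help, assuming the model and input should live on GPU 1 (`centernet_model` and `x` are the same placeholders as in the snippet above):

```python
import torch

device = torch.device("cuda:1")  # the non-default GPU

# Make GPU 1 the current device so the DCNv2 extension's kernel launches and
# temporary buffers end up on the same GPU as the model and the inputs.
torch.cuda.set_device(device)

model = centernet_model.to(device)
x = x.to(device)

with torch.no_grad():
    y = model(x)
torch.cuda.synchronize()  # synchronizes the current device (GPU 1)
```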

ThreeChen commented 5 years ago

I also opened this issue in CenterNet: https://github.com/xingyizhou/CenterNet/issues/42

ThreeChen commented 5 years ago

Issue solved by the author of CenterNet here: https://github.com/xingyizhou/CenterNet/issues/42

I believe this error has something to do with PyTorch CUDA extensions and the PyTorch default GPU.
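For anyone hitting the same thing: a common workaround for extensions that are effectively hard-wired to the default GPU is to mask the visible devices so the GPU you want becomes cuda:0 inside the process. This is a generic sketch and not necessarily the exact fix applied in the linked issue; `centernet_model` and `x` are placeholders:

```python
import os

# Expose only physical GPU 1 to this process; inside the process it appears
# as cuda:0, so the extension's implicit default GPU matches the tensors.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import torch  # import (and CUDA init) after masking the devices

print(torch.cuda.current_device())  # prints 0, but it is physically GPU 1

model = centernet_model.cuda()  # placeholder: the CenterNet model as before
x = x.cuda()
y = model(x)
```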