chensong1995 / HybridPose

HybridPose: 6D Object Pose Estimation under Hybrid Representation (CVPR 2020)
MIT License
"PTX JIT compilation failed" during test phase #21

Open blo85 opened 4 years ago

blo85 commented 4 years ago

Hello,

First of all, thank you for sharing your work.

I am writing because I have managed to deploy and train HybridPose on my local machine, but when I launch it via remote PyCharm on a shared GPU server, I get the following message during the test phase (the training phase runs without problems):

GPUassert: a PTX JIT compilation failed ./src/ransac_voting_kernel.cu 83

The code is the same on both machines, and the Anaconda environment on the server is a copy of the one on my local machine (created from a .yml file).

The only difference I see is the NVIDIA driver version: v440.33.01 on my local machine and v418.87.00 on the remote machine. As I understand it, that should not cause a problem.

Any idea where the problem might be?

Thank you very much!

chensong1995 commented 4 years ago

Hello blo85,

Did you recompile dynamic libraries after moving the code to the remote server?

blo85 commented 4 years ago

Thank you for your response @chensong1995

Yes, I've recompiled the libraries on the server. There is one thing I thought was unimportant that may matter after all. When I run:

$ python setup.py build_ext --inplace

I receive a warning(?) about nvcc that I don't get on my local machine:

which: no nvcc in ("here my PATH env variable values")
running build_ext
copying build/lib.linux-x86_64-3.7/ransac_voting.cpython-37m-x86_64-linux-gnu.so ->

But as you can see, the process continues and I don't get any errors. nvcc is not on the PATH of my local machine either, so I didn't give the warning much importance. Could the problem come from there?

Thank you!

BR

chensong1995 commented 4 years ago

Hello BR,

The problem likely comes from there. nvcc is usually located in the bin directory of the CUDA installation path (e.g. /opt/cuda-10.0/bin on our department server). Please consider adding that directory to your $PATH.

I hope this helps.
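A quick way to confirm that nvcc is actually visible before rebuilding is sketched below. It is only an illustration, not part of HybridPose: the helper names `find_nvcc` and `with_cuda_bin` are hypothetical, and /opt/cuda-10.0/bin is simply the example path from the comment above, so substitute the CUDA installation path of your own server.

```python
import os
import shutil


def find_nvcc(path=None):
    """Return the full path to nvcc if it is on the search path, else None.

    With path=None, shutil.which searches the current PATH.
    """
    return shutil.which("nvcc", path=path)


def with_cuda_bin(cuda_bin, env=None):
    """Return a copy of env (default: os.environ) with cuda_bin first on PATH."""
    env = dict(os.environ if env is None else env)
    old_path = env.get("PATH", "")
    env["PATH"] = cuda_bin + (os.pathsep + old_path if old_path else "")
    return env


if __name__ == "__main__":
    # Example path taken from the thread; this is an assumption about your server.
    cuda_bin = "/opt/cuda-10.0/bin"
    env = with_cuda_bin(cuda_bin)
    print("nvcc on current PATH:", find_nvcc())
    print("nvcc with CUDA bin prepended:", find_nvcc(path=env["PATH"]))
```

If the second lookup finds nvcc, the returned `env` can be passed to `subprocess.run(["python", "setup.py", "build_ext", "--inplace"], env=env)` so the build sees the same compiler.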

blo85 commented 4 years ago

Thanks again @chensong1995

I have added the CUDA bin directory that contains nvcc to my PATH environment variable, and I no longer get the "error" message when I compile the ransac library.

I then recompiled both libraries just in case, but when I launch the training again, I still get the same error during the test phase:

GPUassert: a PTX JIT compilation failed ./src/ransac_voting_kernel.cu 83

Ideas?

Thank you

chensong1995 commented 4 years ago

My best guess is that the CUDA version or NVIDIA driver used at compile time differs from the one used at test time.
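One way to test this guess is to compare the toolkit release reported by nvcc with the minimum driver version that release requires. The sketch below is an illustration only: the helper names are hypothetical, the table covers just three CUDA 10.x releases with the minimum Linux driver versions listed in NVIDIA's CUDA release notes, and the thread never states which toolkit version either environment uses, so the "release 10.2" string in the example is purely an assumption.

```python
import re

# Minimum Linux driver version per CUDA toolkit release, as listed in
# NVIDIA's CUDA release notes. Deliberately incomplete: 10.x only.
MIN_DRIVER = {
    (10, 0): (410, 48),
    (10, 1): (418, 39),
    (10, 2): (440, 33),
}


def parse_nvcc_release(nvcc_output):
    """Extract (major, minor) from the text printed by `nvcc --version`."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def parse_driver(version):
    """Turn a driver string such as '418.87.00' into (418, 87)."""
    parts = version.strip().split(".")
    return tuple(int(p) for p in parts[:2])


def driver_supports(toolkit, driver):
    """Return True/False, or None if the toolkit release is not in the table."""
    required = MIN_DRIVER.get(toolkit)
    if required is None:
        return None
    return driver >= required


if __name__ == "__main__":
    # Hypothetical nvcc output; the real toolkit version is not given in the thread.
    toolkit = parse_nvcc_release("Cuda compilation tools, release 10.2, V10.2.89")
    # Driver versions reported earlier in the thread.
    for name, version in [("local", "440.33.01"), ("remote", "418.87.00")]:
        print(name, version, "supports", toolkit, ":",
              driver_supports(toolkit, parse_driver(version)))
```

On a real machine the two inputs would come from `nvcc --version` and `nvidia-smi --query-gpu=driver_version --format=csv,noheader`. A `False` result would be consistent with the guess above, but it would not by itself prove that this is the cause of the PTX JIT failure.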