Kernel Panic using Cuda 10.2

vade commented 4 years ago

Hello - firstly, thanks for this and your great documentation. Much appreciated.

I'm using Ubuntu 18.04 LTS, CUDA 10.2, NVIDIA 440 drivers, and a single Titan X.

I've followed the README, installed the dependencies in a virtual environment, compiled the extensions, and am able to run the demo. However, after a few seconds the demo crashes and kernel panics the entire system.

I've attempted to edit both extensions' NVCC flags, as per the helpful note in the documentation, but to no avail.

    '-gencode', 'arch=compute_52,code=sm_52',
    '-gencode', 'arch=compute_60,code=sm_60',
    '-gencode', 'arch=compute_61,code=sm_61',
    '-gencode', 'arch=compute_70,code=sm_70',
    '-gencode', 'arch=compute_75,code=sm_75',
    '-gencode', 'arch=compute_75,code=compute_75',

However, that also kernel panics the machine.
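For reference, a flag list like the one above can be generated rather than edited by hand in each extension. This is only a minimal sketch: the helper name `gencode_flags` is hypothetical and not part of this repository, and it assumes each compute capability is given as a `(major, minor)` tuple.

```python
def gencode_flags(capabilities, ptx_for_newest=True):
    """Build nvcc -gencode arguments for (major, minor) compute capabilities.

    Hypothetical helper for illustration; not part of the DAIN code base.
    """
    caps = sorted(set(capabilities))
    flags = []
    for major, minor in caps:
        arch = f"{major}{minor}"
        # Native machine code (SASS) for this exact architecture.
        flags += ["-gencode", f"arch=compute_{arch},code=sm_{arch}"]
    if ptx_for_newest and caps:
        major, minor = caps[-1]
        arch = f"{major}{minor}"
        # PTX for the newest architecture, so newer GPUs can JIT-compile it.
        flags += ["-gencode", f"arch=compute_{arch},code=compute_{arch}"]
    return flags


if __name__ == "__main__":
    print(gencode_flags([(5, 2), (6, 1)]))
```

Building only for the architecture of the installed GPU also shortens compile time, though it is not expected to change runtime behavior on its own.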

I am able to monitor GPU memory usage right before the crash and can see PyTorch allocating GPU memory, but it appears to go to the max, and then the system dies.
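Since the machine dies before anything can be inspected, one option is to log memory usage to disk and force each sample out of the write cache so the last readings survive the panic. The following is a rough sketch, assuming `nvidia-smi` is on the `PATH`; the function names and the log file name `gpu_mem.log` are made up for illustration.

```python
import os
import subprocess

# One CSV sample per second: timestamp, used MiB, total MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=timestamp,memory.used,memory.total",
    "--format=csv,noheader,nounits",
    "-l", "1",
]


def parse_sample(line):
    """Split one CSV line into (timestamp, used_mib, total_mib)."""
    timestamp, used, total = [field.strip() for field in line.split(",")]
    return timestamp, int(used), int(total)


def log_until_crash(path="gpu_mem.log"):
    """Append samples to `path`, syncing to disk after every line."""
    with open(path, "a") as log:
        proc = subprocess.Popen(QUERY, stdout=subprocess.PIPE, text=True)
        for line in proc.stdout:
            log.write(line)
            log.flush()
            # Ask the OS to write the data to disk before the next sample.
            os.fsync(log.fileno())
```

Calling `log_until_crash()` in a second terminal before starting the demo should leave a log whose last lines show how close usage got to the card's total memory; `parse_sample` can then be used to read those lines back after a reboot.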

Are there other specific hardware requirements for this code base?

vade commented 4 years ago

Also, I'm aware I can't expect you to help resolve a remote kernel panic; I'm just looking for any other info to guide my debugging.

NexonSU commented 4 years ago

It seems like a problem with your combination of kernel + nvidia-driver-440 + cuda-10.2. I have a similar system:

    Ubuntu 18.04.3 (kernel 5.3.0-28)
    cuda 10.2.89-1
    nvidia-driver-440 440.33.01-0ubuntu1

And everything is fine.

vade commented 4 years ago

Thanks, that's great to know. I'll see if I can find any other issues.

attashe commented 4 years ago

Hi, I have a similar problem:

    error in correlation_forward_cuda_kernel: no kernel image is available for execution on the device
    Traceback (most recent call last):
      File "demo_MiddleBury_slowmotion.py", line 126, in
        y_s,offset,filter = model(torch.stack((X0, X1),dim = 0))

PyTorch = 1.3.1, NVIDIA GPU = Tesla V100, CUDA version = 10.2
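A "no kernel image is available" error generally means the compiled extension contains no code the GPU can run. The Tesla V100 has compute capability 7.0, so the build needs either `sm_70` machine code or PTX for 7.0 or older. Below is a small sketch for checking a `-gencode` list against a device; the function name `covers` is hypothetical, and `POSTED_FLAGS` simply repeats the list posted earlier in this thread, which may differ from what a given build actually used.

```python
import re

# The flag list quoted earlier in this thread.
POSTED_FLAGS = [
    "-gencode", "arch=compute_52,code=sm_52",
    "-gencode", "arch=compute_60,code=sm_60",
    "-gencode", "arch=compute_61,code=sm_61",
    "-gencode", "arch=compute_70,code=sm_70",
    "-gencode", "arch=compute_75,code=sm_75",
    "-gencode", "arch=compute_75,code=compute_75",
]

PATTERN = re.compile(r"arch=compute_(\d+)(\d),code=(sm|compute)_(\d+)(\d)")


def covers(flags, capability):
    """Return True if `flags` produce code runnable on `capability`.

    capability is a (major, minor) tuple, e.g. (7, 0) for a V100.
    """
    major, minor = capability
    for flag in flags:
        match = PATTERN.fullmatch(flag)
        if match is None:
            continue  # skips the bare "-gencode" entries
        kind = match.group(3)
        code = (int(match.group(4)), int(match.group(5)))
        # Machine code runs on the same major version with equal or higher minor.
        if kind == "sm" and code[0] == major and code[1] <= minor:
            return True
        # PTX can be JIT-compiled for any equal or newer architecture.
        if kind == "compute" and code <= (major, minor):
            return True
    return False
```

With PyTorch installed, the capability of the current device can be read with `torch.cuda.get_device_capability()` and passed in as the second argument. If the check fails for the flags a build used, the extensions need to be recompiled with a matching entry added.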

MarlNox commented 4 years ago

https://github.com/baowenbo/DAIN/issues/44#issuecomment-589483416

Here you go. Just follow the Colab posted in the comments and modify it according to my comment.

baowenbo / DAIN

Kernel Panic using Cuda 10.2 #43