Segmentation Fault when running ScanNet example

facebookresearch / SparseConvNet

Submanifold sparse convolutional networks

https://github.com/facebookresearch/SparseConvNet


Segmentation Fault when running ScanNet example #209

Open rcffc opened 3 years ago

rcffc commented 3 years ago

I am running the code in a Docker container based on the pytorch/pytorch:1.8.0-cuda11.1-cudnn8-devel image. torch.cuda.is_available() returns True. The segmentation fault occurs during the first epoch.

Running gdb --args python /app/src/examples/ScanNet/unet.py reveals:

Thread 55 "python" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f771e8d0700 (LWP 837)]
InputLayer_ForwardPass<float> () at /app/src/sparseconvnet/SCN/CPU/IOLayers.cpp:25
25              out_f[plane] += multiplier * in_f[plane];

You can try running my fork.

rcffc commented 3 years ago

I wonder why this piece of CPU code is executed although CUDA is available.

rcffc commented 3 years ago

The error is triggered by line 61 in unet.py: predictions=unet(batch['x'])

rcffc commented 3 years ago

I also tried a CUDA 10.0 image in which I installed PyTorch 1.3 and Python 3.6 (I could not use 3.3 because that version conflicts with PyTorch): same issue. The same happens with CUDA 10.1.

rcffc commented 3 years ago

The problem is that python setup.py develop produces different output when run within the Dockerfile than when run within the interactive shell: https://www.diffchecker.com/lCtW0rUo

If I run the command in the Docker shell, the CUDA extension is built and the code works. Still, I cannot wrap my head around why.
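
One possible explanation, not confirmed in this thread: steps executed during docker build usually run without GPU access, while an interactive container started with GPU support can see the device. If the setup script chooses between a CPU-only and a CUDA build based on whether a GPU is visible at build time (an assumption worth checking in setup.py), the Dockerfile build would produce only the CPU extension. The sketch below is a small stdlib-only probe that can be run both as a RUN step in the Dockerfile and in the interactive shell to compare the two environments. The function names and the checks it performs are illustrative, not part of SparseConvNet.

```python
import os
import shutil


def probe_gpu_build_environment():
    """Collect hints about whether a GPU is visible to the current process."""
    return {
        # nvidia-smi is typically provided by the NVIDIA container runtime
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
        # device nodes are usually present only when the container has GPU access
        "nvidia_device_node": os.path.exists("/dev/nvidia0"),
        # CUDA toolkit location, if the image sets one of these variables
        "cuda_home": os.environ.get("CUDA_HOME") or os.environ.get("CUDA_PATH"),
    }


def likely_extension_kind(env):
    """Guess which build a GPU-gated setup script would pick (assumption)."""
    gpu_visible = env["nvidia_smi_on_path"] or env["nvidia_device_node"]
    return "cuda" if gpu_visible else "cpu-only"


if __name__ == "__main__":
    env = probe_gpu_build_environment()
    for key, value in env.items():
        print(f"{key}: {value}")
    print("likely build:", likely_extension_kind(env))
```

If the probe reports no visible GPU during docker build but a visible GPU in the interactive shell, that would be consistent with the differing build output linked above.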