facebookresearch / maskrcnn-benchmark

Fast, modular reference implementation of Instance Segmentation and Object Detection algorithms in PyTorch.

Docker Runtime Error: Not Compiled with GPU support #167

Open · archdyn opened this issue 5 years ago

archdyn commented 5 years ago

❓ Questions and Help

Hello,

I have a strange problem with the Docker image. When I build the image following the instructions in INSTALL.md and then try training on the coco2014 dataset with the command below, I get RuntimeError: Not compiled with GPU support (nms at ./maskrcnn_benchmark/csrc/nms.h:22).

nvidia-docker run --shm-size=8gb -v /home/archdyn/Datasets/coco:/maskrcnn-benchmark/datasets/coco maskrcnn-benchmark python /maskrcnn-benchmark/tools/train_net.py --config-file "/maskrcnn-benchmark/configs/e2e_mask_rcnn_R_50_FPN_1x.yaml" SOLVER.IMS_PER_BATCH 1 SOLVER.BASE_LR 0.0025 SOLVER.MAX_ITER 720000 SOLVER.STEPS "(480000, 640000)" TEST.IMS_PER_BATCH 1

But when I change the Dockerfile to comment out the line python setup.py build develop (just before WORKDIR /maskrcnn-benchmark) and then run python setup.py build develop inside the built Docker container, I can train without problems.

My Environment when running the Docker Container:

2018-11-17 20:03:13,889 maskrcnn_benchmark INFO: Collecting env info (might take some time)
2018-11-17 20:03:15,634 maskrcnn_benchmark INFO: 
PyTorch version: 1.0.0.dev20181116
Is debug build: No
CUDA used to build PyTorch: 9.0.176

OS: Ubuntu 16.04.5 LTS
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609
CMake version: version 3.5.1

Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: 9.0.176
GPU models and configuration: GPU 0: GeForce GTX 850M
Nvidia driver version: 410.73
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.7.4.1
/usr/lib/x86_64-linux-gnu/libcudnn_static_v7.a

Versions of relevant libraries:
[pip] Could not collect
[conda] pytorch-nightly           1.0.0.dev20181116 py3.6_cuda9.0.176_cudnn7.1.2_0    pytorch
        Pillow (5.3.0)

Does anybody know why this happens?

denis-sumin commented 5 years ago

The nvidia (or any other) runtime is not available during the build stage, which is why torch.cuda.is_available() always returns False during docker build (see https://github.com/NVIDIA/nvidia-docker/issues/595, for example).
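
Concretely, the CUDA kernels (nms included) only get compiled when CUDA looks usable at build time. A minimal sketch of that kind of gating in a setup.py (illustrative only, not a verbatim copy of this repo's setup.py):

```python
# Illustrative sketch of the build-time gating (not a verbatim copy of
# maskrcnn-benchmark's setup.py): the CUDA sources are only compiled when
# a GPU runtime *and* the CUDA toolkit are visible at build time.
import glob
import os

import torch
from torch.utils.cpp_extension import CUDA_HOME, CppExtension, CUDAExtension


def get_extensions():
    sources = glob.glob(os.path.join("maskrcnn_benchmark", "csrc", "*.cpp"))
    source_cuda = glob.glob(os.path.join("maskrcnn_benchmark", "csrc", "cuda", "*.cu"))

    extension = CppExtension
    define_macros = []

    # During "docker build" there is no GPU runtime, so this is False and the
    # .cu kernels (nms among them) are silently left out of the build.
    if torch.cuda.is_available() and CUDA_HOME is not None:
        extension = CUDAExtension
        sources += source_cuda
        define_macros += [("WITH_CUDA", None)]

    return [extension("maskrcnn_benchmark._C", sources, define_macros=define_macros)]
```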

So, the proposed workaround (FORCE_CUDA) is the correct way to handle it.
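
The flag essentially turns that check into an OR with an environment variable, so the CUDA sources get compiled even though no GPU is visible during docker build. Roughly (a sketch of the idea, not the exact diff from the PR):

```python
# Sketch of the FORCE_CUDA idea (not the exact diff from the PR): an
# environment variable overrides the runtime check, so the CUDA kernels are
# compiled even though no GPU is visible during "docker build".
import os

import torch
from torch.utils.cpp_extension import CUDA_HOME

force_cuda = os.getenv("FORCE_CUDA", "0") == "1"

if (torch.cuda.is_available() and CUDA_HOME is not None) or force_cuda:
    # build with CUDAExtension and the WITH_CUDA macro, as in the sketch above
    pass
```

With that in place, the Dockerfile only needs the variable set (for example ENV FORCE_CUDA=1) before the python setup.py build develop step.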

obendidi commented 5 years ago

@denis-sumin @miguelvr @fmassa I've opened a PR with the FORCE_CUDA flag as an option

denis-sumin commented 5 years ago

By the way, I've tested that this workaround works fine.

miguelvr commented 5 years ago

@fmassa close this?

IssamLaradji commented 5 years ago

@miguelvr I don't think this is solved yet. I still need workarounds to get it running on the GPU inside Docker.

kangyisheng123456 commented 4 years ago

Hello @archdyn, I ran into the same problem as you: RuntimeError: Not compiled with GPU support (nms at /algo_code/maskrcnn_benchmark/csrc/nms.h:22). By the way, I use docker instead of nvidia-docker. I would like to ask, how did you solve it?

JaledMC commented 4 years ago

@IssamLaradji

This is an old thread, but for anybody who runs into this problem (GPU installation of the repo inside Docker) and for whom FORCE_CUDA doesn't work, maybe this issue can help. With those changes (and by avoiding the latest pytorch and torchvision versions), I made a Dockerfile that works.

Happy coding!