archdyn opened this issue 5 years ago
cc @miguelvr do you know what this might be?
@archdyn can you paste your build command?
Ahhh, I see that your GPU is a GTX 850M... I'm not sure if the pytorch 1.0 wheel is compatible with that GPU. @fmassa might be able to confirm that
It should be fine I think.
Also, pytorch installation seems to be fine, but the error appears when he tries to install maskrcnn-benchmark
Well, now I'm running the exact same docker file as before and I'm getting another runtime error:
Traceback (most recent call last):
  File "tools/train_net.py", line 16, in <module>
    from maskrcnn_benchmark.engine.inference import inference
  File "/maskrcnn-benchmark/maskrcnn_benchmark/engine/inference.py", line 20, in <module>
    from maskrcnn_benchmark.structures.boxlist_ops import boxlist_iou
  File "/maskrcnn-benchmark/maskrcnn_benchmark/structures/boxlist_ops.py", line 6, in <module>
    from maskrcnn_benchmark.layers import nms as _box_nms
  File "/maskrcnn-benchmark/maskrcnn_benchmark/layers/__init__.py", line 8, in <module>
    from .nms import nms
  File "/maskrcnn-benchmark/maskrcnn_benchmark/layers/nms.py", line 3, in <module>
    from maskrcnn_benchmark import _C
ImportError: /maskrcnn-benchmark/maskrcnn_benchmark/_C.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN2at4cuda6detail17CUDAStream_streamEP19CUDAStreamInternals
The dockerfile installs the libraries from their respective GitHub masters... Maybe something changed since?
Nevermind this, I was mapping the csrc/ folder by mistake...
It is working fine for me now...
@archdyn what is your local CUDA version?
Please run
nvcc --version
and check if it matches the version that appears in the docker... These have to match
I have a second PC at my disposal where this problem also occurs. On the second PC I have an Nvidia V100. The output of the second PC is below.
nvcc -V:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85
nvidia-smi:
Mon Nov 19 13:22:50 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.26 Driver Version: 396.26 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla V100-PCIE... Off | 00000000:60:00.0 Off | Off |
| N/A 32C P0 37W / 250W | 0MiB / 16160MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
command to build docker image:
nvidia-docker build -t maskrcnn-benchmark --build-arg CUDA=9.1 --build-arg CUDNN=7 docker/
PS: The argument order of the build command in INSTALL.md seems to be wrong. In INSTALL.md the order is the following, but it gives an error:
nvidia-docker build -t --build-arg CUDA=9.2 --build-arg CUDNN=7 maskrcnn-benchmark docker/
When building maskrcnn-benchmark the following Warning occurs but the build process continues and succeeds: No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
But when I then train with the following command, I get the Runtime Error:
nvidia-docker run --shm-size=8gb -v /home/archdyn/Datasets/coco:/maskrcnn-benchmark/datasets/coco maskrcnn-benchmark python /maskrcnn-benchmark/tools/train_net.py --config-file "/maskrcnn-benchmark/configs/e2e_mask_rcnn_R_50_FPN_1x.yaml" SOLVER.IMS_PER_BATCH 1 SOLVER.BASE_LR 0.0025 SOLVER.MAX_ITER 720000 SOLVER.STEPS "(480000, 640000)" TEST.IMS_PER_BATCH 1
The only way to train and prevent the Runtime Error is to modify the Dockerfile and build it like:
ARG CUDA="9.0"
ARG CUDNN="7"
FROM nvidia/cuda:${CUDA}-cudnn${CUDNN}-devel-ubuntu16.04
RUN echo 'debconf debconf/frontend select Noninteractive' | debconf-set-selections
# install basics
RUN apt-get update -y \
&& apt-get install -y apt-utils git curl ca-certificates bzip2 cmake tree htop bmon iotop g++
# Install Miniconda
RUN curl -so /miniconda.sh https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh \
&& chmod +x /miniconda.sh \
&& /miniconda.sh -b -p /miniconda \
&& rm /miniconda.sh
ENV PATH=/miniconda/bin:$PATH
# Create a Python 3.6 environment
RUN /miniconda/bin/conda install -y conda-build \
&& /miniconda/bin/conda create -y --name py36 python=3.6.7 \
&& /miniconda/bin/conda clean -ya
ENV CONDA_DEFAULT_ENV=py36
ENV CONDA_PREFIX=/miniconda/envs/$CONDA_DEFAULT_ENV
ENV PATH=$CONDA_PREFIX/bin:$PATH
ENV CONDA_AUTO_UPDATE_CONDA=false
RUN conda install -y ipython
RUN pip install ninja yacs cython matplotlib
# Install PyTorch 1.0 Nightly
RUN conda install -y pytorch-nightly -c pytorch && conda clean -ya
# Install TorchVision master
RUN git clone https://github.com/pytorch/vision.git \
&& cd vision \
&& python setup.py install
# install pycocotools
RUN git clone https://github.com/cocodataset/cocoapi.git \
&& cd cocoapi/PythonAPI \
&& python setup.py build_ext install
# install PyTorch Detection
RUN git clone https://github.com/facebookresearch/maskrcnn-benchmark.git
WORKDIR /maskrcnn-benchmark
Then after the build I have to go inside the docker container:
nvidia-docker run --rm -it maskrcnn-benchmark bash
And inside the docker container I build maskrcnn-benchmark without problems:
python setup.py build develop
I then have to commit this modified docker container so that I have a Docker Image that can always be started:
docker commit [Container ID] maskrcnn-benchmark:working
After all these steps I can train without problems with:
nvidia-docker run --shm-size=8gb -v /home/archdyn/Datasets/coco:/maskrcnn-benchmark/datasets/coco maskrcnn-benchmark:working python /maskrcnn-benchmark/tools/train_net.py --config-file "/maskrcnn-benchmark/configs/e2e_mask_rcnn_R_50_FPN_1x.yaml" SOLVER.IMS_PER_BATCH 1 SOLVER.BASE_LR 0.0025 SOLVER.MAX_ITER 720000 SOLVER.STEPS "(480000, 640000)" TEST.IMS_PER_BATCH 1
the only possible difference that I see is that you are using CUDA 9.1... For some reason nvidia-docker is not detecting your CUDA installation...
is CUDA_HOME set?
Setting CUDA_HOME locally and inside the Dockerfile didn't help.
Maybe the Runtime Error occurs because my local CUDA is 9.1 and the Docker CUDA is 9.1, but conda install pytorch-nightly -c pytorch is using CUDA 9.0. It seems that the pytorch installation with conda comes with its own CUDA.
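If it helps narrow that down, a quick check (a minimal sketch, run with the container's python) shows which CUDA version the installed binary actually bundles:
import torch
print(torch.version.cuda)         # CUDA version the installed PyTorch binary was built against
print(torch.cuda.is_available())  # whether a CUDA runtime/GPU is visible right now
If the first print does not match the toolkit in the image, the conda binary is indeed shipping its own CUDA, as suspected above.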
https://discuss.pytorch.org/t/is-it-possible-to-link-conda-binary-to-cuda-in-non-standard-location/6863 https://discuss.pytorch.org/t/pytorch-and-cuda-9-1/13126/14
It seems the build_anaconda.sh script for pytorch supports the versions 8.0, 9.0 and 9.1: https://github.com/pytorch/pytorch/blob/master/scripts/build_anaconda.sh
I will try tomorrow to change the line conda install pytorch-nightly -c pytorch to conda install pytorch-nightly cuda91 -c pytorch and see what happens.
Edit: I changed the pytorch installation today to conda install pytorch-nightly cuda91 -c pytorch, so my local CUDA is 9.1, the Docker CUDA is 9.1, and the pytorch installation should have CUDA 9.1 as well. I also set CUDA_HOME locally and inside the Docker Container. I am still getting the Runtime Error and I don't know why nvidia-docker doesn't find my CUDA Runtime.
Since it works when I build maskrcnn-benchmark inside my Docker Container and then commit this Docker Container, it doesn't matter that much.
hi @archdyn
I met the same problem as you did: RuntimeError: Not compiled with GPU support (nms at /algo_code/maskrcnn_benchmark/csrc/nms.h:22). How did you solve it?
Btw, I use docker instead of nvidia-docker.
thanks.
@zimenglan-sysu-512 the dockerfile that we provide only works with nvidia-docker, and won't work with docker.
hi @fmassa
I use nvidia-docker to build the image and then use nvidia-docker to run it, and it still encounters this problem. I found that when I run the command sudo python3.6 setup.py build develop, torch.cuda.is_available() is False.
Do you have any idea how to solve it?
thanks
You could try what worked for me. Take the following Dockerfile and build it with nvidia-docker. The Dockerfile has just 2 lines removed under install pytorch detection.
ARG CUDA="9.0"
ARG CUDNN="7"
FROM nvidia/cuda:${CUDA}-cudnn${CUDNN}-devel-ubuntu16.04
RUN echo 'debconf debconf/frontend select Noninteractive' | debconf-set-selections
# install basics
RUN apt-get update -y \
&& apt-get install -y apt-utils git curl ca-certificates bzip2 cmake tree htop bmon iotop g++
# Install Miniconda
RUN curl -so /miniconda.sh https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh \
&& chmod +x /miniconda.sh \
&& /miniconda.sh -b -p /miniconda \
&& rm /miniconda.sh
ENV PATH=/miniconda/bin:$PATH
# Create a Python 3.6 environment
RUN /miniconda/bin/conda install -y conda-build \
&& /miniconda/bin/conda create -y --name py36 python=3.6.7 \
&& /miniconda/bin/conda clean -ya
ENV CONDA_DEFAULT_ENV=py36
ENV CONDA_PREFIX=/miniconda/envs/$CONDA_DEFAULT_ENV
ENV PATH=$CONDA_PREFIX/bin:$PATH
ENV CONDA_AUTO_UPDATE_CONDA=false
RUN conda install -y ipython
RUN pip install ninja yacs cython matplotlib
# Install PyTorch 1.0 Nightly
RUN conda install -y pytorch-nightly -c pytorch && conda clean -ya
# Install TorchVision master
RUN git clone https://github.com/pytorch/vision.git \
&& cd vision \
&& python setup.py install
# install pycocotools
RUN git clone https://github.com/cocodataset/cocoapi.git \
&& cd cocoapi/PythonAPI \
&& python setup.py build_ext install
# install PyTorch Detection
RUN git clone https://github.com/facebookresearch/maskrcnn-benchmark.git
WORKDIR /maskrcnn-benchmark
After that go into the docker container with the following command:
nvidia-docker run --rm -it maskrcnn-benchmark bash
Now inside the docker container execute this command:
python setup.py build develop
Now you have to exit the docker container with CTRL + p and after that CTRL + q. This should get you out of the docker container without stopping it. Alternatively you could just open a new console.
From the console now execute this:
docker commit <CONTAINER ID> maskrcnn-benchmark
After all this you should have a working docker image without this Runtime Error. At least for me this worked.
@archdyn that's super weird because you are just installing maskrcnn later...
@fmassa I noticed this line in the dockerfile:
echo 'debconf debconf/frontend select Noninteractive' | debconf-set-selections
I didn't add it... Can you explain its purpose to me?
@archdyn @zimenglan-sysu-512 please make sure you have the correct version of docker-ce and nvidia docker installed: https://github.com/NVIDIA/nvidia-docker/wiki/Frequently-Asked-Questions#which-docker-packages-are-supported
hi @archdyn
I find that if I go into the built image and then run python3.6 setup.py build develop, it successfully builds maskrcnn-benchmark. It is so weird.
hi @miguelvr i use cuda-9.2 + cudnn-v7.4 + nvidia-driver-396, and use nvidia-docker-18.09
I think this is only happening to people who are using versions of CUDA different from the defaults in the dockerfile... I wasn't able to test with those as I don't have them installed on my machine.
Anyway, can you run this
nvidia-docker run -it nvidia/cuda:${CUDA}-cudnn${CUDNN}-devel-ubuntu16.04 nvidia-smi
(replace ${CUDA} and ${CUDNN} with the build args that you used)
and check if it returns all your GPU information
hi @miguelvr I ran the command; the output is below:
sudo nvidia-docker run -it nvidia/cuda:9.2-cudnn7-devel-ubuntu16.04 nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.37 Driver Version: 396.37 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX TIT... Off | 00000000:04:00.0 Off | N/A |
| 22% 45C P0 72W / 250W | 0MiB / 12212MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX TIT... Off | 00000000:05:00.0 Off | N/A |
| 22% 53C P0 77W / 250W | 0MiB / 12212MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 GeForce GTX TIT... Off | 00000000:08:00.0 Off | N/A |
| 22% 53C P0 71W / 250W | 0MiB / 12212MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 GeForce GTX TIT... Off | 00000000:09:00.0 Off | N/A |
| 22% 53C P0 75W / 250W | 0MiB / 12212MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 4 GeForce GTX TIT... Off | 00000000:84:00.0 Off | N/A |
| 22% 51C P0 74W / 250W | 0MiB / 12212MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 5 GeForce GTX TIT... Off | 00000000:85:00.0 Off | N/A |
| 22% 49C P0 73W / 250W | 0MiB / 12212MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 6 GeForce GTX TIT... Off | 00000000:88:00.0 Off | N/A |
| 22% 47C P0 72W / 250W | 0MiB / 12212MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 7 GeForce GTX TIT... Off | 00000000:89:00.0 Off | N/A |
| 22% 46C P0 67W / 250W | 0MiB / 12212MiB | 2% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
@miguelvr those lines were added in https://github.com/facebookresearch/maskrcnn-benchmark/pull/165. Maybe @keineahnung2345 can add a bit more detail on them.
@miguelvr This line is used to avoid the warning given by debconf. Ref: https://github.com/phusion/baseimage-docker/issues/58.
hi @archdyn @fmassa @miguelvr
When I use nvidia-docker to build the image, torch.cuda.is_available() returns False, and then after building, when I use nvidia-docker to run the image, torch.cuda.is_available() returns True. It is so weird.
With the help of a friend debugging it, we finally found that changing the line in setup.py as below:
# if torch.cuda.is_available() and CUDA_HOME is not None:
if CUDA_HOME is not None:
solves the problem "RuntimeError: Not compiled with GPU support (nms at /algo_code/maskrcnn_benchmark/csrc/nms.h:22)" when running the image, although I have no idea why torch.cuda.is_available() returns False at build time.
thanks.
I have no idea either... but if I remove the if torch.cuda.is_available(), I'm afraid that compilation could fail if the user has for some reason installed a few things from CUDA in their system (but not nvcc).
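One way to express that concern (a hypothetical sketch, not something that exists in the repo) would be to gate the CUDA build on the presence of an actual nvcc binary instead of on torch.cuda.is_available():
import os
from torch.utils.cpp_extension import CUDA_HOME
def cuda_toolkit_available():
    # True only if a full toolkit (including nvcc) is installed,
    # regardless of whether a GPU is visible at build time
    return CUDA_HOME is not None and os.path.exists(os.path.join(CUDA_HOME, "bin", "nvcc"))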
Hi, any updates on a potential fix? For now my workaround is to build the project each time at run-time:
nvidia-docker run --ipc=host \
    -v /home/archdyn/Datasets/coco:/maskrcnn-benchmark/datasets/coco \
    --rm -it maskrcnn-benchmark:latest \
    bash -c "python setup.py build develop && \
             python tools/train_net.py --config-file config.yaml"
@bendidi a better workaround would be to build the image, run a container with bash and install the packages. Then leave the container without exiting it, and create an image from that container with docker commit
I agree with you @miguelvr, but what I'm trying to make is a Dockerfile that I can deploy automatically on multiple machines to start training/inference automatically, without having to do the docker commit manipulation manually each time. Another possible workaround would be to:
Add an option to python setup.py build develop to force the CUDA build even if torch.cuda.is_available() returns False, or maybe an env var like FORCE_CUDA=1 python setup.py build develop.
Will this actually solve the problem for you? Maybe you could try applying a patch to the repo after cloning it in your Dockerfile as a temporary solution?
The weird part is that it works for CUDA 9.0 but not for CUDA 9.1 or 9.2
@miguelvr it doesn't work for me with cuda 9.0, tested with:
FROM nvidia/cuda:9.0-cudnn7-devel-ubuntu16.04
RUN echo 'debconf debconf/frontend select Noninteractive' | debconf-set-selections
# install basics
RUN apt-get update -y \
&& apt-get install -y apt-utils build-essential \
git curl ca-certificates \
bzip2 cmake tree htop \
bmon iotop g++ libglib2.0-0
# Install Miniconda
RUN curl -so /miniconda.sh https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh \
&& chmod +x /miniconda.sh \
&& /miniconda.sh -b -p /miniconda \
&& rm /miniconda.sh
ENV PATH=/miniconda/bin:$PATH
# Create a Python 3.6 environment
RUN /miniconda/bin/conda install -y conda-build \
&& /miniconda/bin/conda create -y --name py36 python=3.6.7 pip \
&& /miniconda/bin/conda clean -ya
ENV CONDA_DEFAULT_ENV=py36
ENV CONDA_PREFIX=/miniconda/envs/$CONDA_DEFAULT_ENV
ENV PATH=$CONDA_PREFIX/bin:$PATH
ENV CONDA_AUTO_UPDATE_CONDA=false
# Install PyTorch 1.0 Nightly
RUN pip install ninja yacs matplotlib opencv-python==3.2.0.6
RUN pip install pyyaml Cython==0.29.1 numpy==1.15
RUN pip install pycocotools==2.0.0
RUN conda install -y pytorch-nightly=1.0.0.dev20181127 -c pytorch && conda clean -ya
RUN python -c "import torch; print(torch.cuda.is_available())"
and the result is:
Step 24/24 : RUN python -c "import torch; print(torch.cuda.is_available())"
---> Running in b8a5b5f49b01
False
@fmassa yes I'll try that , thanks !
@bendidi the CUDA version on your machine must match the CUDA version in the docker. If you don't have CUDA 9.0 on the machine hosting the container, it won't work.
A temporary solution, as @fmassa said:
# install PyTorch Detection
RUN git clone https://github.com/facebookresearch/maskrcnn-benchmark.git \
&& cd maskrcnn-benchmark \
&& sed -i -e 's/torch.cuda.is_available()/True/g' setup.py \
&& python setup.py build develop \
&& sed -i -e 's/True/torch.cuda.is_available()/g' setup.py
@bendidi @zimenglan-sysu-512 could you try replacing the pytorch installation line in the Dockerfile with this:
RUN conda install -y pytorch-nightly cuda92 -c pytorch \
&& conda clean -ya
Use the CUDA version corresponding to your host machine (CUDA 9.2 in this example).
hi @miguelvr
I did use cuda92. It still runs into this problem.
You could try what worked for me. Take the Dockerfile from the comment above (the one with just the 2 lines removed under install PyTorch Detection) and build it with nvidia-docker. After that go into the docker container with nvidia-docker run --rm -it maskrcnn-benchmark bash, run python setup.py build develop inside it, leave the container with CTRL + p followed by CTRL + q (or just open a new console), and from the console execute docker commit <CONTAINER ID> maskrcnn-benchmark.
After all this you should have a working docker image without this Runtime Error. At least for me this worked.
This worked for me. I have tested it.
@GuoLiuFang works for me. thanks.
any updates on this for pytorch 1.0?
@IssamLaradji do you still see an issue here?
thanks for your reply. Yes, I get
ipdb> nms(boxes, scores, nms_threshold)
*** RuntimeError: Not compiled with GPU support (nms at /maskrcnn-benchmark/maskrcnn_benchmark/csrc/nms.h:22)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x45 (0x7f6c6d9a3cc5 in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: nms(at::Tensor const&, at::Tensor const&, float) + 0xd4 (0x7f6c5bc69404 in /maskrcnn-benchmark/maskrcnn_benchmark/_C.cpython-37m-x86_64-linux-gnu.so)
frame #2: <unknown function> + 0x15717 (0x7f6c5bc75717 in /maskrcnn-benchmark/maskrcnn_benchmark/_C.cpython-37m-x86_64-linux-gnu.so)
frame #3: <unknown function> + 0x1580e (0x7f6c5bc7580e in /maskrcnn-benchmark/maskrcnn_benchmark/_C.cpython-37m-x86_64-linux-gnu.so)
frame #4: <unknown function> + 0x126f5 (0x7f6c5bc726f5 in /maskrcnn-benchmark/maskrcnn_benchmark/_C.cpython-37m-x86_64-linux-gnu.so)
<omitting python frames>
After installing it on Docker as:
RUN conda install pytorch torchvision cuda92 -c pytorch
RUN git clone https://github.com/facebookresearch/maskrcnn-benchmark.git \
&& cd maskrcnn-benchmark \
&& python setup.py build develop && cd
Thanks in advance!
@IssamLaradji did you try using https://github.com/facebookresearch/maskrcnn-benchmark/issues/167#issuecomment-448812589 to see if it works for you?
Unfortunately, I can't use that fix in my case :(, because I have to launch multiple experiments based on the docker image on multiple machines. That is, I can't manually access a docker instance, as the experiment has to run automatically. :(
I don't know why during the image creation no GPUs can be seen, this might be an issue with nvidia-docker, but I don't know more about it because I don't often use docker :-/
@IssamLaradji I had the same problem and my solution for that was to modify the docker file a little bit :
# install PyTorch Detection
RUN git clone https://github.com/facebookresearch/maskrcnn-benchmark.git \
&& cd maskrcnn-benchmark \
&& sed -i -e 's/torch.cuda.is_available()/True/g' setup.py \
&& python setup.py build develop \
&& sed -i -e 's/True/torch.cuda.is_available()/g' setup.py
it's a quick dirty hack, but it gets the job done :)
Thanks, but I get this with that command:
/usr/local/cuda/bin/nvcc -DWITH_CUDA -I/maskrcnn-benchmark/maskrcnn_benchmark/csrc -I/opt/conda/lib/python3.7/site-packages/torch/lib/include -I/opt/conda/lib/python3.7/site-packages/torch/lib/include/torch/csrc/api/include -I/opt/conda/lib/python3.7/site-packages/torch/lib/include/TH -I/opt/conda/lib/python3.7/site-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/opt/conda/include/python3.7m -c /maskrcnn-benchmark/maskrcnn_benchmark/csrc/cuda/nms.cu -o build/temp.linux-x86_64-3.7/maskrcnn-benchmark/maskrcnn_benchmark/csrc/cuda/nms.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --compiler-options '-fPIC' -DCUDA_HAS_FP16=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
error: command '/usr/local/cuda/bin/nvcc' failed with exit status 1
Isn't there any more error message there? If not, it would be helpful if you could jump into the docker container and try compiling it from there, so that we can know the error message.
Sorry, here is the full error; it found CUDA but it still didn't work :(
Step 52/53 : RUN git clone https://github.com/facebookresearch/maskrcnn-benchmark.git && cd maskrcnn-benchmark && sed -i -e 's/torch.cuda.is_available()/True/g' setup.py && CUDAHOSTCXX=/usr/bin/gcc-5 python setup.py build develop && sed -i -e 's/True/torch.cuda.is_available()/g' setup.py
---> Running in e2e733b95db4
Cloning into 'maskrcnn-benchmark'...
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
running build
running build_py
creating build
creating build/lib.linux-x86_64-3.7
creating build/lib.linux-x86_64-3.7/maskrcnn_benchmark
copying maskrcnn_benchmark/__init__.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark
creating build/lib.linux-x86_64-3.7/maskrcnn_benchmark/config
copying maskrcnn_benchmark/config/__init__.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/config
copying maskrcnn_benchmark/config/defaults.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/config
copying maskrcnn_benchmark/config/paths_catalog.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/config
creating build/lib.linux-x86_64-3.7/maskrcnn_benchmark/solver
copying maskrcnn_benchmark/solver/build.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/solver
copying maskrcnn_benchmark/solver/__init__.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/solver
copying maskrcnn_benchmark/solver/lr_scheduler.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/solver
creating build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling
copying maskrcnn_benchmark/modeling/registry.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling
copying maskrcnn_benchmark/modeling/utils.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling
copying maskrcnn_benchmark/modeling/__init__.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling
copying maskrcnn_benchmark/modeling/matcher.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling
copying maskrcnn_benchmark/modeling/balanced_positive_negative_sampler.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling
copying maskrcnn_benchmark/modeling/poolers.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling
copying maskrcnn_benchmark/modeling/box_coder.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling
creating build/lib.linux-x86_64-3.7/maskrcnn_benchmark/structures
copying maskrcnn_benchmark/structures/__init__.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/structures
copying maskrcnn_benchmark/structures/bounding_box.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/structures
copying maskrcnn_benchmark/structures/image_list.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/structures
copying maskrcnn_benchmark/structures/segmentation_mask.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/structures
copying maskrcnn_benchmark/structures/boxlist_ops.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/structures
creating build/lib.linux-x86_64-3.7/maskrcnn_benchmark/utils
copying maskrcnn_benchmark/utils/logger.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/utils
copying maskrcnn_benchmark/utils/checkpoint.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/utils
copying maskrcnn_benchmark/utils/registry.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/utils
copying maskrcnn_benchmark/utils/c2_model_loading.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/utils
copying maskrcnn_benchmark/utils/__init__.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/utils
copying maskrcnn_benchmark/utils/env.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/utils
copying maskrcnn_benchmark/utils/model_zoo.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/utils
copying maskrcnn_benchmark/utils/comm.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/utils
copying maskrcnn_benchmark/utils/collect_env.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/utils
copying maskrcnn_benchmark/utils/metric_logger.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/utils
copying maskrcnn_benchmark/utils/model_serialization.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/utils
copying maskrcnn_benchmark/utils/imports.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/utils
copying maskrcnn_benchmark/utils/miscellaneous.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/utils
creating build/lib.linux-x86_64-3.7/maskrcnn_benchmark/data
copying maskrcnn_benchmark/data/build.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/data
copying maskrcnn_benchmark/data/__init__.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/data
copying maskrcnn_benchmark/data/collate_batch.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/data
creating build/lib.linux-x86_64-3.7/maskrcnn_benchmark/layers
copying maskrcnn_benchmark/layers/roi_pool.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/layers
copying maskrcnn_benchmark/layers/smooth_l1_loss.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/layers
copying maskrcnn_benchmark/layers/__init__.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/layers
copying maskrcnn_benchmark/layers/misc.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/layers
copying maskrcnn_benchmark/layers/nms.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/layers
copying maskrcnn_benchmark/layers/roi_align.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/layers
copying maskrcnn_benchmark/layers/batch_norm.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/layers
copying maskrcnn_benchmark/layers/_utils.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/layers
creating build/lib.linux-x86_64-3.7/maskrcnn_benchmark/engine
copying maskrcnn_benchmark/engine/__init__.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/engine
copying maskrcnn_benchmark/engine/inference.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/engine
copying maskrcnn_benchmark/engine/trainer.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/engine
creating build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/backbone
copying maskrcnn_benchmark/modeling/backbone/__init__.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/backbone
copying maskrcnn_benchmark/modeling/backbone/backbone.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/backbone
copying maskrcnn_benchmark/modeling/backbone/resnet.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/backbone
copying maskrcnn_benchmark/modeling/backbone/fpn.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/backbone
creating build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/rpn
copying maskrcnn_benchmark/modeling/rpn/__init__.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/rpn
copying maskrcnn_benchmark/modeling/rpn/anchor_generator.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/rpn
copying maskrcnn_benchmark/modeling/rpn/loss.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/rpn
copying maskrcnn_benchmark/modeling/rpn/inference.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/rpn
copying maskrcnn_benchmark/modeling/rpn/rpn.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/rpn
creating build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/roi_heads
copying maskrcnn_benchmark/modeling/roi_heads/__init__.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/roi_heads
copying maskrcnn_benchmark/modeling/roi_heads/roi_heads.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/roi_heads
creating build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/detector
copying maskrcnn_benchmark/modeling/detector/__init__.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/detector
copying maskrcnn_benchmark/modeling/detector/generalized_rcnn.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/detector
copying maskrcnn_benchmark/modeling/detector/detectors.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/detector
creating build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/roi_heads/box_head
copying maskrcnn_benchmark/modeling/roi_heads/box_head/__init__.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/roi_heads/box_head
copying maskrcnn_benchmark/modeling/roi_heads/box_head/roi_box_predictors.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/roi_heads/box_head
copying maskrcnn_benchmark/modeling/roi_heads/box_head/loss.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/roi_heads/box_head
copying maskrcnn_benchmark/modeling/roi_heads/box_head/inference.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/roi_heads/box_head
copying maskrcnn_benchmark/modeling/roi_heads/box_head/roi_box_feature_extractors.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/roi_heads/box_head
copying maskrcnn_benchmark/modeling/roi_heads/box_head/box_head.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/roi_heads/box_head
creating build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/roi_heads/mask_head
copying maskrcnn_benchmark/modeling/roi_heads/mask_head/roi_mask_predictors.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/roi_heads/mask_head
copying maskrcnn_benchmark/modeling/roi_heads/mask_head/roi_mask_feature_extractors.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/roi_heads/mask_head
copying maskrcnn_benchmark/modeling/roi_heads/mask_head/__init__.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/roi_heads/mask_head
copying maskrcnn_benchmark/modeling/roi_heads/mask_head/loss.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/roi_heads/mask_head
copying maskrcnn_benchmark/modeling/roi_heads/mask_head/inference.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/roi_heads/mask_head
copying maskrcnn_benchmark/modeling/roi_heads/mask_head/mask_head.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/roi_heads/mask_head
creating build/lib.linux-x86_64-3.7/maskrcnn_benchmark/data/samplers
copying maskrcnn_benchmark/data/samplers/distributed.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/data/samplers
copying maskrcnn_benchmark/data/samplers/iteration_based_batch_sampler.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/data/samplers
copying maskrcnn_benchmark/data/samplers/__init__.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/data/samplers
copying maskrcnn_benchmark/data/samplers/grouped_batch_sampler.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/data/samplers
creating build/lib.linux-x86_64-3.7/maskrcnn_benchmark/data/transforms
copying maskrcnn_benchmark/data/transforms/build.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/data/transforms
copying maskrcnn_benchmark/data/transforms/__init__.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/data/transforms
copying maskrcnn_benchmark/data/transforms/transforms.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/data/transforms
creating build/lib.linux-x86_64-3.7/maskrcnn_benchmark/data/datasets
copying maskrcnn_benchmark/data/datasets/concat_dataset.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/data/datasets
copying maskrcnn_benchmark/data/datasets/list_dataset.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/data/datasets
copying maskrcnn_benchmark/data/datasets/__init__.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/data/datasets
copying maskrcnn_benchmark/data/datasets/voc.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/data/datasets
copying maskrcnn_benchmark/data/datasets/coco.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/data/datasets
creating build/lib.linux-x86_64-3.7/maskrcnn_benchmark/data/datasets/evaluation
copying maskrcnn_benchmark/data/datasets/evaluation/__init__.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/data/datasets/evaluation
creating build/lib.linux-x86_64-3.7/maskrcnn_benchmark/data/datasets/evaluation/coco
copying maskrcnn_benchmark/data/datasets/evaluation/coco/coco_eval.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/data/datasets/evaluation/coco
copying maskrcnn_benchmark/data/datasets/evaluation/coco/__init__.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/data/datasets/evaluation/coco
creating build/lib.linux-x86_64-3.7/maskrcnn_benchmark/data/datasets/evaluation/voc
copying maskrcnn_benchmark/data/datasets/evaluation/voc/voc_eval.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/data/datasets/evaluation/voc
copying maskrcnn_benchmark/data/datasets/evaluation/voc/__init__.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/data/datasets/evaluation/voc
running build_ext
building 'maskrcnn_benchmark._C' extension
creating build/temp.linux-x86_64-3.7
creating build/temp.linux-x86_64-3.7/maskrcnn-benchmark
creating build/temp.linux-x86_64-3.7/maskrcnn-benchmark/maskrcnn_benchmark
creating build/temp.linux-x86_64-3.7/maskrcnn-benchmark/maskrcnn_benchmark/csrc
creating build/temp.linux-x86_64-3.7/maskrcnn-benchmark/maskrcnn_benchmark/csrc/cpu
creating build/temp.linux-x86_64-3.7/maskrcnn-benchmark/maskrcnn_benchmark/csrc/cuda
gcc -pthread -B /opt/conda/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DWITH_CUDA -I/maskrcnn-benchmark/maskrcnn_benchmark/csrc -I/opt/conda/lib/python3.7/site-packages/torch/lib/include -I/opt/conda/lib/python3.7/site-packages/torch/lib/include/torch/csrc/api/include -I/opt/conda/lib/python3.7/site-packages/torch/lib/include/TH -I/opt/conda/lib/python3.7/site-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/opt/conda/include/python3.7m -c /maskrcnn-benchmark/maskrcnn_benchmark/csrc/vision.cpp -o build/temp.linux-x86_64-3.7/maskrcnn-benchmark/maskrcnn_benchmark/csrc/vision.o -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
gcc -pthread -B /opt/conda/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DWITH_CUDA -I/maskrcnn-benchmark/maskrcnn_benchmark/csrc -I/opt/conda/lib/python3.7/site-packages/torch/lib/include -I/opt/conda/lib/python3.7/site-packages/torch/lib/include/torch/csrc/api/include -I/opt/conda/lib/python3.7/site-packages/torch/lib/include/TH -I/opt/conda/lib/python3.7/site-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/opt/conda/include/python3.7m -c /maskrcnn-benchmark/maskrcnn_benchmark/csrc/cpu/nms_cpu.cpp -o build/temp.linux-x86_64-3.7/maskrcnn-benchmark/maskrcnn_benchmark/csrc/cpu/nms_cpu.o -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
gcc -pthread -B /opt/conda/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DWITH_CUDA -I/maskrcnn-benchmark/maskrcnn_benchmark/csrc -I/opt/conda/lib/python3.7/site-packages/torch/lib/include -I/opt/conda/lib/python3.7/site-packages/torch/lib/include/torch/csrc/api/include -I/opt/conda/lib/python3.7/site-packages/torch/lib/include/TH -I/opt/conda/lib/python3.7/site-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/opt/conda/include/python3.7m -c /maskrcnn-benchmark/maskrcnn_benchmark/csrc/cpu/ROIAlign_cpu.cpp -o build/temp.linux-x86_64-3.7/maskrcnn-benchmark/maskrcnn_benchmark/csrc/cpu/ROIAlign_cpu.o -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/usr/local/cuda/bin/nvcc -DWITH_CUDA -I/maskrcnn-benchmark/maskrcnn_benchmark/csrc -I/opt/conda/lib/python3.7/site-packages/torch/lib/include -I/opt/conda/lib/python3.7/site-packages/torch/lib/include/torch/csrc/api/include -I/opt/conda/lib/python3.7/site-packages/torch/lib/include/TH -I/opt/conda/lib/python3.7/site-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/opt/conda/include/python3.7m -c /maskrcnn-benchmark/maskrcnn_benchmark/csrc/cuda/nms.cu -o build/temp.linux-x86_64-3.7/maskrcnn-benchmark/maskrcnn_benchmark/csrc/cuda/nms.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --compiler-options '-fPIC' -DCUDA_HAS_FP16=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
error: command '/usr/local/cuda/bin/nvcc' failed with exit status 1
The push refers to repository [images.borgy.elementai.lan/issam.laradji/v1]
@IssamLaradji this line is suspicious
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
apart from that, I don't have any other idea what it might be
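For reference, that message appears to come from torch.utils.cpp_extension when no CUDA runtime is visible; printing what the build would actually use (a minimal sketch) can help narrow it down:
import torch
from torch.utils.cpp_extension import CUDA_HOME
print("CUDA_HOME:", CUDA_HOME)                        # toolkit path the extension build falls back to
print("runtime visible:", torch.cuda.is_available())  # reportedly False during a plain docker build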
I get the Not compiled with GPU support error as well, but I can make it work using the sed workaround from @bendidi.
My own conclusion is that we are seeing the same problem as discussed here: https://github.com/NVIDIA/nvidia-docker/issues/225 (which seems to be working as intended? But then I don't understand how it could work in some cases here at all). Running the following Dockerfile outputs False on the last statement, hence why maskrcnn-benchmark is compiled without GPU support later. I haven't tried installing using conda. I tried to compile pytorch from scratch, but that didn't help either.
FROM nvidia/cuda:9.2-cudnn7-devel-ubuntu18.04
RUN apt-get update && apt-get install -y --no-install-recommends \
vim \
git \
python3.6-dev \
python3-pip \
&& rm -rf /var/lib/apt/lists/*
RUN rm -f /usr/bin/python && ln -s /usr/bin/python3.6 /usr/bin/python
RUN rm -f /usr/bin/pip && ln -s /usr/bin/pip3 /usr/bin/pip
RUN pip install -U pip
RUN pip install numpy
RUN pip install torch_nightly -f https://download.pytorch.org/whl/nightly/cu90/torch_nightly.html
RUN python -c "import torch; print(torch.cuda.is_available());"
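As the linked nvidia-docker issue suggests, GPUs are generally not exposed while the image is being built, so that last RUN step only ever prints False. A stricter variant (a minimal sketch, not part of the Dockerfile above) would fail the build early instead of silently producing a CPU-only extension later:
# fail_if_no_gpu.py -- hypothetical helper to call from a RUN step
import sys
import torch
from torch.utils.cpp_extension import CUDA_HOME
if not torch.cuda.is_available():
    sys.stderr.write("No GPU visible at build time (CUDA_HOME=%s)\n" % CUDA_HOME)
    sys.exit(1)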
Cool, thanks for the info and the reference @lham !
@fmassa if this is just a matter of torch.cuda.is_available() not working, can we simply add another flag to force a CUDA installation and bypass it?
It should be as simple as changing a single line in setup.py (reading the flag from the environment):
if (torch.cuda.is_available() and CUDA_HOME is not None) or os.getenv("FORCE_CUDA", "0") == "1":
    extension = CUDAExtension
    sources += source_cuda
    define_macros += [("WITH_CUDA", None)]
    extra_compile_args["nvcc"] = [
        "-DCUDA_HAS_FP16=1",
        "-D__CUDA_NO_HALF_OPERATORS__",
        "-D__CUDA_NO_HALF_CONVERSIONS__",
        "-D__CUDA_NO_HALF2_OPERATORS__",
    ]
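Built that way, the Dockerfile could run FORCE_CUDA=1 python setup.py build develop, matching the env-var suggestion above.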
❓ Questions and Help
Hello,
I have a strange problem with the Docker image. When I build the Docker image following the instructions in INSTALL.md and then try training on the coco2014 dataset with the command below, I get RuntimeError: Not compiled with GPU support (nms at ./maskrcnn_benchmark/csrc/nms.h:22)
But when I change the Dockerfile and comment out the line python setup.py build develop before WORKDIR /maskrcnn-benchmark, and then execute python setup.py build develop inside my built docker container, I can train without problems.
My Environment when running the Docker Container:
Does somebody know why this problem happens?