facebookresearch / detectron2

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
Apache License 2.0

CUDA error: no kernel image is available for execution on the device #252

Closed ruodingt closed 5 years ago

ruodingt commented 5 years ago

Hi, mate

I ran into an issue when running a customised script on my own training dataset.

The environment is built from the Dockerfile provided in the repo. I only added an SSH server to the Dockerfile; nothing else was changed. I think I may have missed something that makes it crash at ROI alignment.

The details are attached below:

To Reproduce

code I wrote:

(pretty much copied from colab notebook)

import os

from detectron2.engine import DefaultTrainer
from detectron2.config import get_cfg

cfg = get_cfg()
cfg.DATASETS.TRAIN = ("dental/train",)
cfg.DATASETS.TEST = ("dental/eval",)   # trailing comma keeps this a tuple; no metrics implemented for this dataset
cfg.MODEL.WEIGHTS = "detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl"
# initialize from model zoo
cfg.SOLVER.BASE_LR = 0.00025
cfg.SOLVER.MAX_ITER = 300    # 300 iterations seems good enough, but you can certainly train longer
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128   # faster, and good enough for this toy dataset
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1  # only has one class (balloon)

os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = DefaultTrainer(cfg)

Error log:

Failed to load OpenCL runtime

Category ids in annotations are not in [1, #categories]! We'll apply a mapping for you.

Metadata(evaluator_type='coco', image_root='/root/dentalpoc/data/raw', json_file='/root/dentalpoc/data/coco_format/@ 2019-11-06 04.37.49 UTC/dental_train.json', name='dental/train', thing_classes=['decay', 'debris', 'restoration', 'filling', 'other issue', 'staining', 'gingivitis', 'plaque', 'gum', 'wear', 'brokentooth', 'gumrecession'], thing_dataset_id_to_contiguous_id={0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9, 10: 10, 11: 11})
Config '/detectron2_repo/configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml' has no VERSION. Assuming it to be compatible with latest v2.

Category ids in annotations are not in [1, #categories]! We'll apply a mapping for you.

Traceback (most recent call last):
  File "/root/DT/src/pipelines/train/DT.py", line 78, in <module>
  File "/detectron2_repo/detectron2/engine/defaults.py", line 350, in train
    super().train(self.start_iter, self.max_iter)
  File "/detectron2_repo/detectron2/engine/train_loop.py", line 132, in train
  File "/detectron2_repo/detectron2/engine/train_loop.py", line 212, in run_step
    loss_dict = self.model(data)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/detectron2_repo/detectron2/modeling/meta_arch/rcnn.py", line 88, in forward
    _, detector_losses = self.roi_heads(images, features, proposals, gt_instances)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/detectron2_repo/detectron2/modeling/roi_heads/roi_heads.py", line 561, in forward
    losses = self._forward_box(features_list, proposals)
  File "/detectron2_repo/detectron2/modeling/roi_heads/roi_heads.py", line 615, in _forward_box
    box_features = self.box_pooler(features, [x.proposal_boxes for x in proposals])
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/detectron2_repo/detectron2/modeling/poolers.py", line 208, in forward
    output[inds] = pooler(x_level, pooler_fmt_boxes_level)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/detectron2_repo/detectron2/layers/roi_align.py", line 95, in forward
    input, rois, self.output_size, self.spatial_scale, self.sampling_ratio, self.aligned
  File "/detectron2_repo/detectron2/layers/roi_align.py", line 20, in forward
    input, roi, spatial_scale, output_size[0], output_size[1], sampling_ratio, aligned
RuntimeError: CUDA error: no kernel image is available for execution on the device **(ROIAlign_forward_cuda at /detectron2_repo/detectron2/layers/csrc/ROIAlign/ROIAlign_cuda.cu:361)**
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x33 (0x7f3b3fce5813 in /usr/local/lib/python3.6/dist-packages/torch/lib/libc10.so)
frame #1: detectron2::ROIAlign_forward_cuda(at::Tensor const&, at::Tensor const&, float, int, int, int, bool) + 0xa24 (0x7f3b3e44e556 in /detectron2_repo/detectron2/_C.cpython-36m-x86_64-linux-gnu.so)
frame #2: detectron2::ROIAlign_forward(at::Tensor const&, at::Tensor const&, float, int, int, int, bool) + 0xb6 (0x7f3b3e3d21b6 in /detectron2_repo/detectron2/_C.cpython-36m-x86_64-linux-gnu.so)
frame #3: <unknown function> + 0x42cfb (0x7f3b3e3e3cfb in /detectron2_repo/detectron2/_C.cpython-36m-x86_64-linux-gnu.so)
frame #4: <unknown function> + 0x3bfe0 (0x7f3b3e3dcfe0 in /detectron2_repo/detectron2/_C.cpython-36m-x86_64-linux-gnu.so)
frame #5: /usr/bin/python3() [0x50abc5]
frame #6: _PyEval_EvalFrameDefault + 0x449 (0x50c549 in /usr/bin/python3)
frame #7: /usr/bin/python3() [0x5081d5]
frame #8: /usr/bin/python3() [0x58952b]
frame #9: PyObject_Call + 0x3e (0x5a04ce in /usr/bin/python3)
frame #10: THPFunction_apply(_object*, _object*) + 0xa4f (0x7f3b8b37e4af in /usr/local/lib/python3.6/dist-packages/torch/lib/libtorch_python.so)
frame #11: /usr/bin/python3() [0x50a84f]
frame #12: _PyEval_EvalFrameDefault + 0x449 (0x50c549 in /usr/bin/python3)
frame #13: _PyFunction_FastCallDict + 0xf5 (0x5093e5 in /usr/bin/python3)
frame #14: /usr/bin/python3() [0x5951c1]
frame #15: PyObject_Call + 0x3e (0x5a04ce in /usr/bin/python3)
frame #16: _PyEval_EvalFrameDefault + 0x17f5 (0x50d8f5 in /usr/bin/python3)
frame #17: /usr/bin/python3() [0x5081d5]
frame #18: _PyFunction_FastCallDict + 0x2e2 (0x5095d2 in /usr/bin/python3)
frame #19: /usr/bin/python3() [0x5951c1]
frame #20: /usr/bin/python3() [0x54ac01]
frame #21: _PyObject_FastCallKeywords + 0x19c (0x5aa69c in /usr/bin/python3)
frame #22: /usr/bin/python3() [0x50ab53]
frame #23: _PyEval_EvalFrameDefault + 0x449 (0x50c549 in /usr/bin/python3)
frame #24: _PyFunction_FastCallDict + 0xf5 (0x5093e5 in /usr/bin/python3)
frame #25: /usr/bin/python3() [0x5951c1]
frame #26: PyObject_Call + 0x3e (0x5a04ce in /usr/bin/python3)
frame #27: _PyEval_EvalFrameDefault + 0x17f5 (0x50d8f5 in /usr/bin/python3)
frame #28: /usr/bin/python3() [0x5081d5]
frame #29: _PyFunction_FastCallDict + 0x2e2 (0x5095d2 in /usr/bin/python3)
frame #30: /usr/bin/python3() [0x5951c1]
frame #31: /usr/bin/python3() [0x54ac01]
frame #32: _PyObject_FastCallKeywords + 0x19c (0x5aa69c in /usr/bin/python3)
frame #33: /usr/bin/python3() [0x50ab53]
frame #34: _PyEval_EvalFrameDefault + 0x449 (0x50c549 in /usr/bin/python3)
frame #35: /usr/bin/python3() [0x509ce8]
frame #36: /usr/bin/python3() [0x50aa1d]
frame #37: _PyEval_EvalFrameDefault + 0x449 (0x50c549 in /usr/bin/python3)
frame #38: /usr/bin/python3() [0x5081d5]
frame #39: _PyFunction_FastCallDict + 0x2e2 (0x5095d2 in /usr/bin/python3)
frame #40: /usr/bin/python3() [0x5951c1]
frame #41: PyObject_Call + 0x3e (0x5a04ce in /usr/bin/python3)
frame #42: _PyEval_EvalFrameDefault + 0x17f5 (0x50d8f5 in /usr/bin/python3)
frame #43: /usr/bin/python3() [0x5081d5]
frame #44: _PyFunction_FastCallDict + 0x2e2 (0x5095d2 in /usr/bin/python3)
frame #45: /usr/bin/python3() [0x5951c1]
frame #46: /usr/bin/python3() [0x54ac01]
frame #47: _PyObject_FastCallKeywords + 0x19c (0x5aa69c in /usr/bin/python3)
frame #48: /usr/bin/python3() [0x50ab53]
frame #49: _PyEval_EvalFrameDefault + 0x449 (0x50c549 in /usr/bin/python3)
frame #50: /usr/bin/python3() [0x5081d5]
frame #51: _PyFunction_FastCallDict + 0x2e2 (0x5095d2 in /usr/bin/python3)
frame #52: /usr/bin/python3() [0x5951c1]
frame #53: PyObject_Call + 0x3e (0x5a04ce in /usr/bin/python3)
frame #54: _PyEval_EvalFrameDefault + 0x17f5 (0x50d8f5 in /usr/bin/python3)
frame #55: /usr/bin/python3() [0x5081d5]
frame #56: _PyFunction_FastCallDict + 0x2e2 (0x5095d2 in /usr/bin/python3)
frame #57: /usr/bin/python3() [0x5951c1]
frame #58: /usr/bin/python3() [0x54ac01]
frame #59: _PyObject_FastCallKeywords + 0x19c (0x5aa69c in /usr/bin/python3)
frame #60: /usr/bin/python3() [0x50ab53]
frame #61: _PyEval_EvalFrameDefault + 0x449 (0x50c549 in /usr/bin/python3)
frame #62: /usr/bin/python3() [0x509ce8]
frame #63: /usr/bin/python3() [0x50aa1d]

Process finished with exit code 1


Environment: a Docker container running on an AWS Deep Learning AMI.

My Dockerfile is modified from the one provided in the repo:

FROM nvidia/cuda:10.1-cudnn7-devel
# To use this Dockerfile:
# 1. `nvidia-docker build -t detectron2:v0 .`
# 2. `nvidia-docker run -it --name detectron2 detectron2:v0`

################### env and args #################

ENV DEBIAN_FRONTEND noninteractive
ARG user
ARG password

################# following are from detectron official repo ############

RUN apt-get update && apt-get install -y \
    libpng-dev libjpeg-dev python3-opencv ca-certificates \
    python3-dev build-essential pkg-config git curl wget automake libtool && \
  rm -rf /var/lib/apt/lists/*

RUN curl -fSsL -O https://bootstrap.pypa.io/get-pip.py && \
    python3 get-pip.py && \
    rm get-pip.py

# install dependencies
# See https://pytorch.org/ for other options if you use a different version of CUDA

# old version pytorch
# RUN pip install torch==1.2.0 torchvision==0.4.0 -f https://download.pytorch.org/whl/torch_stable.html
RUN pip install torch torchvision cython
RUN pip install 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'

# install detectron2
RUN git clone https://github.com/facebookresearch/detectron2 /detectron2_repo
ENV TORCH_CUDA_ARCH_LIST="Maxwell;Maxwell+Tegra;Pascal;Volta;Turing"
RUN pip install -e /detectron2_repo

# install openssh server
RUN apt-get update && apt-get install -y openssh-server
RUN mkdir /var/run/sshd
RUN echo "$user:$password" | chpasswd
RUN sed -i 's/PermitRootLogin prohibit-password/PermitRootLogin yes/' /etc/ssh/sshd_config

RUN echo "PermitRootLogin yes" >> /etc/ssh/sshd_config
#RUN echo "prohibit-password/PermitRootLogin yes" >> /etc/ssh/sshd_config
#RUN echo "PubkeyAuthentication yes" >> /etc/ssh/sshd_config

# SSH login fix. Otherwise user is kicked off after login
RUN sed 's@session\s*required\s*pam_loginuid.so@session optional pam_loginuid.so@g' -i /etc/pam.d/sshd

ENV NOTVISIBLE "in users profile"
RUN echo "export VISIBLE=now" >> /etc/profile

RUN apt-get update && apt-get install -y tmux

#install extra requirements
COPY requirements.txt requirements.txt
RUN pip3 install -r requirements.txt

# ready to go!
WORKDIR /detectron2_repo

CMD ["/usr/sbin/sshd", "-D"]

ppwwyyxx commented 5 years ago

I think the Dockerfile we provided works well. You can build it unmodified to check whether one of your own modifications causes the error. Please also include environment details following the issue template if you want to follow up.

ruodingt commented 5 years ago

Thanks for the quick reply @ppwwyyxx, here is what I got:

root@ea5f6ef0efb9:/detectron2_repo# python3 -m detectron2.utils.collect_env
Failed to load OpenCL runtime
------------------------  --------------------------------------------------
sys.platform              linux
Python                    3.6.8 (default, Oct  7 2019, 12:59:55) [GCC 8.3.0]
Numpy                     1.13.3
Detectron2 Compiler       GCC 7.4
Detectron2 CUDA Compiler  10.1
PyTorch                   1.3.0
PyTorch Debug Build       False
torchvision               0.4.1
CUDA available            True
GPU 0                     Tesla K80
CUDA_HOME                 /usr/local/cuda
NVCC                      Cuda compilation tools, release 10.1, V10.1.243
Pillow                    6.2.1
cv2                       3.2.0
------------------------  --------------------------------------------------
PyTorch built with:
  - GCC 7.3
  - Intel(R) Math Kernel Library Version 2019.0.4 Product Build 20190411 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v0.20.5 (Git Hash 0125f28c61c1f822fd48570b4c1066f96fcb9b2e)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CUDA Runtime 10.1
  - NVCC architecture flags: -gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_50,code=compute_50
  - CuDNN 7.6.3
  - Magma 2.5.1
  - Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=True, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF, 

ppwwyyxx commented 5 years ago

Could you check whether the docker file we provided works?

vkhalidov commented 5 years ago

@ruodingt your Dockerfile specifies:

ENV TORCH_CUDA_ARCH_LIST="Maxwell;Maxwell+Tegra;Pascal;Volta;Turing"

However, your graphics card is listed as GPU 0 Tesla K80, which is Kepler, so no kernels are compiled for it. Did you try specifying something like the following?

ENV TORCH_CUDA_ARCH_LIST="Kepler;Kepler+Tesla;Maxwell;Maxwell+Tegra;Pascal;Volta;Turing"
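The mismatch can be checked without a GPU. The following is only a rough sketch in plain Python: the name-to-capability table is an assumption written to mirror the one in torch.utils.cpp_extension around PyTorch 1.3 (verify it against your installed version), and the helper names `expand_arch_list` and `device_is_covered` are hypothetical, not part of PyTorch or detectron2. It applies the usual CUDA compatibility rules: a binary built for X.y runs on X.z when z >= y, and embedded PTX can be JIT-compiled for an equal or newer capability.

```python
# Assumed table mirroring torch.utils.cpp_extension (PyTorch ~1.3);
# check your installed version before relying on it.
NAMED_ARCHES = {
    "Kepler+Tesla": "3.7",
    "Kepler": "3.5+PTX",
    "Maxwell+Tegra": "5.3",
    "Maxwell": "5.0;5.2+PTX",
    "Pascal": "6.0;6.1+PTX",
    "Volta": "7.0+PTX",
    "Turing": "7.5+PTX",
}


def expand_arch_list(arch_list):
    """Expand e.g. 'Volta' into [((7, 0), True)]: (capability, has_ptx) pairs."""
    targets = []
    for entry in arch_list.replace(" ", ";").split(";"):
        if not entry:
            continue
        # Named entries expand to one or more numeric specs; numeric ones pass through.
        for spec in NAMED_ARCHES.get(entry, entry).split(";"):
            has_ptx = spec.endswith("+PTX")
            major, minor = spec.replace("+PTX", "").split(".")
            targets.append(((int(major), int(minor)), has_ptx))
    return targets


def device_is_covered(arch_list, capability):
    """Return True if some build target can run on a GPU with this (major, minor)."""
    for target, has_ptx in expand_arch_list(arch_list):
        # A binary for X.y runs on X.z when z >= y (same major version only).
        binary_ok = target[0] == capability[0] and target[1] <= capability[1]
        # Embedded PTX can be JIT-compiled for any equal or newer capability.
        ptx_ok = has_ptx and target <= capability
        if binary_ok or ptx_ok:
            return True
    return False


K80 = (3, 7)  # compute capability of the Tesla K80
original = "Maxwell;Maxwell+Tegra;Pascal;Volta;Turing"
suggested = "Kepler;Kepler+Tesla;" + original
print(device_is_covered(original, K80))   # False
print(device_is_covered(suggested, K80))  # True
```

This is also consistent with the collect_env output above: PyTorch's own NVCC architecture flags include sm_35, so PyTorch's built-in kernels can run on the K80, while detectron2's custom ROIAlign kernel, built with the narrower arch list, cannot.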

ruodingt commented 5 years ago

@vkhalidov @ppwwyyxx Thank you. It works after I changed the ENV.