Another 'no kernel image is available for execution on the device'

REPRODUCTION OF THE ISSUE

Modification

No modification

Environment

------------------------  -------------------------------------------------------------------------------
sys.platform              linux
Python                    3.7.6 (default, Jan  8 2020, 19:59:22) [GCC 7.3.0]
numpy                     1.18.1
detectron2                0.1.1 @/data/yang/detectron2/detectron2
detectron2 compiler       GCC 5.4
detectron2 CUDA compiler  10.0
detectron2 arch flags     sm_75
DETECTRON2_ENV_MODULE     <not set>
PyTorch                   1.4.0 @/home/auv/anaconda3/envs/penguin/lib/python3.7/site-packages/torch
PyTorch debug build       False
CUDA available            True
GPU 0                     GeForce RTX 2080
GPU 1                     GeForce GTX 1070
CUDA_HOME                 /usr/local/cuda
NVCC                      Cuda compilation tools, release 10.0, V10.0.130
Pillow                    7.0.0
torchvision               0.5.0 @/home/auv/anaconda3/envs/penguin/lib/python3.7/site-packages/torchvision
torchvision arch flags    sm_35, sm_50, sm_60, sm_70, sm_75
cv2                       4.2.0
------------------------  -------------------------------------------------------------------------------
PyTorch built with:
- GCC 7.3
- Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v0.21.1 (Git Hash 7d2fd500bc78936d1d648ca713b901012f470dbc)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- NNPACK is enabled
- CUDA Runtime 10.0
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
- CuDNN 7.6.3
- Magma 2.5.1
- Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,
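
One way to read the table above is to compare the arch flags each extension was built with against the compute capability of each installed GPU. The sketch below is illustrative only. The helper names `parse_arch` and `uncovered_gpus` are hypothetical, and the capabilities 7.5 (RTX 2080) and 6.1 (GTX 1070) are taken from NVIDIA's published list rather than from this log. It assumes a cubin built for `sm_XY` runs only on devices with the same major version and an equal or higher minor version, and it ignores any embedded PTX.

```python
def parse_arch(flag):
    """Turn a flag such as 'sm_75' into a (major, minor) tuple, e.g. (7, 5)."""
    digits = flag.strip().split("_")[1]
    return int(digits[:-1]), int(digits[-1])


def uncovered_gpus(compiled_archs, device_capabilities):
    """Return the names of GPUs that none of the compiled archs can serve.

    Assumption: a binary for (M, n) serves a device (M, m) when m >= n.
    PTX fallback is deliberately not modelled here.
    """
    compiled = [parse_arch(a) for a in compiled_archs]
    missing = []
    for name, (major, minor) in device_capabilities.items():
        covered = any(cm == major and cn <= minor for cm, cn in compiled)
        if not covered:
            missing.append(name)
    return missing


# Values copied from the environment table; capabilities are assumed as noted above.
gpus = {"GeForce RTX 2080": (7, 5), "GeForce GTX 1070": (6, 1)}
detectron2_archs = ["sm_75"]
torchvision_archs = ["sm_35", "sm_50", "sm_60", "sm_70", "sm_75"]

print(uncovered_gpus(detectron2_archs, gpus))   # ['GeForce GTX 1070']
print(uncovered_gpus(torchvision_archs, gpus))  # []
```

On a live machine the capability tuples could come from `torch.cuda.get_device_capability(i)` instead of being hard-coded. If this reading is right, the detectron2 build covers only the RTX 2080, and rebuilding with `TORCH_CUDA_ARCH_LIST` set to include both architectures might be worth trying; that is a guess and has not been verified here.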

Installation

I followed the Colab tutorial closely, except that I installed detectron2 from a local clone of the repo with: python -m pip install -e .

Running output

Exception during training:
Traceback (most recent call last):
File "/data/yang/detectron2/detectron2/engine/train_loop.py", line 132, in train
self.run_step()
File "/data/yang/detectron2/detectron2/engine/train_loop.py", line 215, in run_step
loss_dict = self.model(data)
File "/home/auv/anaconda3/envs/penguin/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/auv/anaconda3/envs/penguin/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 447, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/home/auv/anaconda3/envs/penguin/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/data/yang/detectron2/detectron2/modeling/meta_arch/rcnn.py", line 130, in forward
_, detector_losses = self.roi_heads(images, features, proposals, gt_instances)
File "/home/auv/anaconda3/envs/penguin/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/data/yang/detectron2/detectron2/modeling/roi_heads/roi_heads.py", line 582, in forward
losses = self._forward_box(features, proposals)
File "/data/yang/detectron2/detectron2/modeling/roi_heads/roi_heads.py", line 643, in _forward_box
box_features = self.box_pooler(features, [x.proposal_boxes for x in proposals])
File "/home/auv/anaconda3/envs/penguin/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/data/yang/detectron2/detectron2/modeling/poolers.py", line 233, in forward
output[inds] = pooler(x_level, pooler_fmt_boxes_level)
File "/home/auv/anaconda3/envs/penguin/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/data/yang/detectron2/detectron2/layers/roi_align.py", line 95, in forward
input, rois, self.output_size, self.spatial_scale, self.sampling_ratio, self.aligned
File "/data/yang/detectron2/detectron2/layers/roi_align.py", line 20, in forward
input, roi, spatial_scale, output_size[0], output_size[1], sampling_ratio, aligned
RuntimeError: CUDA error: no kernel image is available for execution on the device (ROIAlign_forward_cuda at /data/yang/detectron2/detectron2/layers/csrc/ROIAlign/ROIAlign_cuda.cu:364)

Expected behavior

I have a model that trains correctly on another machine with two RTX 2080 cards, so I expected the same on this machine, which has a GTX 1070 and an RTX 2080 installed. I have read other similar issues but don't see what I did wrong. Can you shed some light on this? Thanks!

facebookresearch / detectron2

Another 'no kernel image is available for execution on the device' #1048