I think the Dockerfile we provided works well, and you can verify that to see whether some of your own modifications caused the error. Please also include environment details following the issue template if you want to follow up.
Thanks for the quick reply @ppwwyyxx, here is what I got:
root@ea5f6ef0efb9:/detectron2_repo# python3 -m detectron2.utils.collect_env
Failed to load OpenCL runtime
------------------------ --------------------------------------------------
sys.platform linux
Python 3.6.8 (default, Oct 7 2019, 12:59:55) [GCC 8.3.0]
Numpy 1.13.3
Detectron2 Compiler GCC 7.4
Detectron2 CUDA Compiler 10.1
DETECTRON2_ENV_MODULE <not set>
PyTorch 1.3.0
PyTorch Debug Build False
torchvision 0.4.1
CUDA available True
GPU 0 Tesla K80
CUDA_HOME /usr/local/cuda
NVCC Cuda compilation tools, release 10.1, V10.1.243
Pillow 6.2.1
cv2 3.2.0
------------------------ --------------------------------------------------
PyTorch built with:
- GCC 7.3
- Intel(R) Math Kernel Library Version 2019.0.4 Product Build 20190411 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v0.20.5 (Git Hash 0125f28c61c1f822fd48570b4c1066f96fcb9b2e)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- NNPACK is enabled
- CUDA Runtime 10.1
- NVCC architecture flags: -gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_50,code=compute_50
- CuDNN 7.6.3
- Magma 2.5.1
- Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=True, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,
Could you check whether the Dockerfile we provided works?
@ruodingt your Dockerfile specifies:
ENV TORCH_CUDA_ARCH_LIST="Maxwell;Maxwell+Tegra;Pascal;Volta;Turing"
However, your graphics card is:
GPU 0 Tesla K80
which is Kepler.
Did you try specifying something like:
ENV TORCH_CUDA_ARCH_LIST="Kepler;Kepler+Tesla;Maxwell;Maxwell+Tegra;Pascal;Volta;Turing"
?
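For reference, one quick way to confirm the card's architecture from inside the container is to query its compute capability with PyTorch. This is a minimal sketch, assuming a single visible CUDA device; a Tesla K80 should report (3, 7), i.e. Kepler / sm_37, which is not covered by the Maxwell-and-newer arch list above:

import torch

# Compute capability (major, minor) of the first visible GPU.
# A Tesla K80 is expected to report (3, 7), i.e. Kepler / sm_37.
major, minor = torch.cuda.get_device_capability(0)
print(torch.cuda.get_device_name(0), f"sm_{major}{minor}")

If the reported architecture is missing from TORCH_CUDA_ARCH_LIST at build time, the detectron2 CUDA extensions are compiled without kernels for that GPU.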
@vkhalidov @ppwwyyxx Thank you. It works after I changed the ENV.
Hi, mate
I had an issue when running a customised script on my own training dataset.
The environment is built from the Dockerfile provided in the repo. I only added an SSH server to the Dockerfile; nothing else was changed. I think I may have missed something, because it crashes at the ROI alignment step (a minimal smoke test for the compiled CUDA ops is sketched after the environment details below).
The details are attached below:
To Reproduce
Code I wrote:
(pretty much copied from the Colab notebook)
Error log:
Environment
A Docker container on an AWS Deep Learning AMI instance.
My Dockerfile is modified from the Dockerfile provided in the repo.
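To narrow down whether the crash comes from the compiled CUDA ops rather than the training script itself, a small smoke test along these lines can be run inside the container. This is a sketch, assuming detectron2's ROIAlign layer and a CUDA-visible GPU; the tensor shapes and box coordinates are purely illustrative:

import torch
from detectron2.layers import ROIAlign

# One 1x3x64x64 feature map and a single region of interest.
feat = torch.rand(1, 3, 64, 64, device="cuda")
# Each ROI row is (batch_index, x1, y1, x2, y2) in feature-map coordinates.
rois = torch.tensor([[0.0, 4.0, 4.0, 32.0, 32.0]], device="cuda")

pooler = ROIAlign(output_size=(7, 7), spatial_scale=1.0, sampling_ratio=2)
out = pooler(feat, rois)
torch.cuda.synchronize()  # force the CUDA kernel to actually execute
print("ROIAlign output shape:", out.shape)  # expected: torch.Size([1, 3, 7, 7])

If the extension was built without the GPU's architecture in TORCH_CUDA_ARCH_LIST, this kind of call typically fails with a CUDA error such as "no kernel image is available for execution on the device", which points back at the arch-list fix discussed above.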