Open stevezkw1998 opened 1 year ago
Hi, it looks to me like you need to use the devel image rather than the runtime one,
since you need to be able to compile against torch and CUDA. So I would try changing the Docker image name from pytorch/pytorch:2.0.0-cuda11.7-cudnn8-runtime
to pytorch/pytorch:2.0.0-cuda11.7-cudnn8-devel
Hi @ClementPinard, thank you for your advice.
After I changed the Docker image name from pytorch/pytorch:2.0.0-cuda11.7-cudnn8-runtime
to pytorch/pytorch:2.0.0-cuda11.7-cudnn8-devel,
the former issue was fixed, but I now have a new issue:
=> ERROR [13/14] RUN cd Pytorch-Correlation-extension && python setup.py install 15.9s
------
> [13/14] RUN cd Pytorch-Correlation-extension && python setup.py install:
#0 1.665 No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
#0 1.689 running install
#0 1.689 /opt/conda/lib/python3.10/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
#0 1.689 warnings.warn(
#0 1.752 /opt/conda/lib/python3.10/site-packages/setuptools/command/easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
#0 1.752 warnings.warn(
#0 1.818 running bdist_egg
#0 1.830 running egg_info
#0 1.830 creating Correlation_Module/spatial_correlation_sampler.egg-info
#0 1.835 writing Correlation_Module/spatial_correlation_sampler.egg-info/PKG-INFO
#0 1.836 writing dependency_links to Correlation_Module/spatial_correlation_sampler.egg-info/dependency_links.txt
#0 1.836 writing requirements to Correlation_Module/spatial_correlation_sampler.egg-info/requires.txt
#0 1.836 writing top-level names to Correlation_Module/spatial_correlation_sampler.egg-info/top_level.txt
#0 1.836 writing manifest file 'Correlation_Module/spatial_correlation_sampler.egg-info/SOURCES.txt'
#0 1.842 /opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py:476: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
#0 1.842 warnings.warn(msg.format('we could not find ninja.'))
#0 1.846 reading manifest file 'Correlation_Module/spatial_correlation_sampler.egg-info/SOURCES.txt'
#0 1.847 adding license file 'LICENSE'
#0 1.847 writing manifest file 'Correlation_Module/spatial_correlation_sampler.egg-info/SOURCES.txt'
#0 1.848 installing library code to build/bdist.linux-x86_64/egg
#0 1.848 running install_lib
#0 1.848 running build_py
#0 1.849 creating build
#0 1.849 creating build/lib.linux-x86_64-cpython-310
#0 1.849 creating build/lib.linux-x86_64-cpython-310/spatial_correlation_sampler
#0 1.849 copying Correlation_Module/spatial_correlation_sampler/spatial_correlation_sampler.py -> build/lib.linux-x86_64-cpython-310/spatial_correlation_sampler
#0 1.850 copying Correlation_Module/spatial_correlation_sampler/__init__.py -> build/lib.linux-x86_64-cpython-310/spatial_correlation_sampler
#0 1.850 running build_ext
#0 1.868 building 'spatial_correlation_sampler_backend' extension
#0 1.868 creating build/temp.linux-x86_64-cpython-310
#0 1.868 creating build/temp.linux-x86_64-cpython-310/Correlation_Module
#0 1.869 gcc -pthread -B /opt/conda/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /opt/conda/include -fPIC -O2 -isystem /opt/conda/include -fPIC -DUSE_CUDA -I/opt/conda/lib/python3.10/site-packages/torch/include -I/opt/conda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.10/site-packages/torch/include/TH -I/opt/conda/lib/python3.10/site-packages/torch/include/THC -I/usr/local/cuda/include -I/opt/conda/include/python3.10 -c Correlation_Module/correlation.cpp -o build/temp.linux-x86_64-cpython-310/Correlation_Module/correlation.o -std=c++14 -fopenmp -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=spatial_correlation_sampler_backend -D_GLIBCXX_USE_CXX11_ABI=0
#0 15.65 Traceback (most recent call last):
#0 15.65 File "/app/Pytorch-Correlation-extension/setup.py", line 69, in <module>
#0 15.65 launch_setup()
#0 15.65 File "/app/Pytorch-Correlation-extension/setup.py", line 37, in launch_setup
#0 15.65 setup(
#0 15.65 File "/opt/conda/lib/python3.10/site-packages/setuptools/__init__.py", line 87, in setup
#0 15.65 return distutils.core.setup(**attrs)
#0 15.65 File "/opt/conda/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 185, in setup
#0 15.65 return run_commands(dist)
#0 15.65 File "/opt/conda/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
#0 15.65 dist.run_commands()
#0 15.65 File "/opt/conda/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
#0 15.65 self.run_command(cmd)
#0 15.65 File "/opt/conda/lib/python3.10/site-packages/setuptools/dist.py", line 1208, in run_command
#0 15.65 super().run_command(command)
#0 15.65 File "/opt/conda/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
#0 15.65 cmd_obj.run()
#0 15.65 File "/opt/conda/lib/python3.10/site-packages/setuptools/command/install.py", line 74, in run
#0 15.66 File "/opt/conda/lib/python3.10/site-packages/setuptools/command/build_ext.py", line 246, in build_extension
#0 15.66 _build_ext.build_extension(self, ext)
#0 15.66 File "/opt/conda/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 549, in build_extension
#0 15.66 objects = self.compiler.compile(
#0 15.66 File "/opt/conda/lib/python3.10/site-packages/setuptools/_distutils/ccompiler.py", line 599, in compile
#0 15.66 self._compile(obj, src, ext, cc_args, extra_postargs, pp_opts)
#0 15.66 File "/opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 581, in unix_wrap_single_compile
#0 15.66 cflags = unix_cuda_flags(cflags)
#0 15.66 File "/opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 548, in unix_cuda_flags
#0 15.66 cflags + _get_cuda_arch_flags(cflags))
#0 15.66 File "/opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1773, in _get_cuda_arch_flags
#0 15.66 arch_list[-1] += '+PTX'
#0 15.66 IndexError: list index out of range
------
Dockerfile:33
--------------------
31 | # Install Pytorch Correlation
32 | RUN git clone https://github.com/ClementPinard/Pytorch-Correlation-extension.git
33 | >>> RUN cd Pytorch-Correlation-extension && python setup.py install
34 | RUN cd -
35 |
--------------------
ERROR: failed to solve: process "/bin/sh -c cd Pytorch-Correlation-extension && python setup.py install" did not complete successfully: exit code: 1
Docker build failed with error: Command 'docker build -t sam-track:1.0.0 ..' returned non-zero exit status 1.
See this related issue: https://github.com/ClementPinard/Pytorch-Correlation-extension/issues/90
No GPU is available during docker build, so you need to figure out your compute capabilities beforehand and set the TORCH_CUDA_ARCH_LIST
environment variable accordingly.
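To make the failure mode concrete, here is a minimal sketch of the kind of logic that leads to the `IndexError` in the traceback above. It is a simplified stand-in, not the actual code in `torch/utils/cpp_extension.py`: the function name `pick_arch_list` and its parameters are hypothetical, and it only assumes that the architecture list is taken from `TORCH_CUDA_ARCH_LIST` when set and otherwise derived from the visible GPUs.

```python
import os


def pick_arch_list(visible_capabilities, env=None):
    """Hypothetical, simplified model of how the arch list gets chosen.

    visible_capabilities: list of (major, minor) tuples, one per visible GPU.
    env: mapping used instead of os.environ (handy for experimenting).
    """
    env = os.environ if env is None else env

    # If the variable is set, it wins and no GPU needs to be visible.
    requested = env.get("TORCH_CUDA_ARCH_LIST")
    if requested:
        return requested.replace(" ", ";").split(";")

    # Otherwise fall back to whatever GPUs are visible right now.
    arch_list = sorted({f"{major}.{minor}" for major, minor in visible_capabilities})

    # With no visible GPU (as during `docker build`) the list is empty,
    # so indexing the last element raises IndexError.
    arch_list[-1] += "+PTX"
    return arch_list
```

Calling `pick_arch_list([], env={})` reproduces the `IndexError: list index out of range` seen in the log, while passing an `env` that contains `TORCH_CUDA_ARCH_LIST` avoids touching the empty list at all.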
Hi @ClementPinard, thank you for your solution. However, I may need to deploy my Docker image to different computers. Is there a general way to handle the TORCH_CUDA_ARCH_LIST environment variable?
If you don't know what the CUDA compute capabilities of your target machines will be, your best bet is to compile for as many architectures as possible, or to wait until the container is launched to compile the library. Compiled code cannot be generic.
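As a rough sketch of the "compile for as many architectures as possible" option, the helper below builds an environment with `TORCH_CUDA_ARCH_LIST` set to several compute capabilities. The helper name `build_env` and the capability list are assumptions for illustration; the list should be adjusted to the GPUs you expect to deploy to and to what your CUDA toolkit supports.

```python
import os

# Assumed set of target compute capabilities; adjust to your own hardware.
ASSUMED_CAPABILITIES = ["6.0", "6.1", "7.0", "7.5", "8.0", "8.6"]


def build_env(capabilities=ASSUMED_CAPABILITIES, base_env=None):
    """Return a copy of the environment with TORCH_CUDA_ARCH_LIST set.

    Hypothetical helper: the capabilities are de-duplicated, sorted
    numerically, joined with ';', and '+PTX' is appended to the highest one.
    """
    if not capabilities:
        raise ValueError("at least one compute capability is required")

    env = dict(os.environ if base_env is None else base_env)
    archs = sorted(
        set(capabilities),
        key=lambda cap: tuple(int(part) for part in cap.split(".")),
    )
    env["TORCH_CUDA_ARCH_LIST"] = ";".join(archs) + "+PTX"
    return env
```

The resulting mapping can be passed as `env=build_env()` to `subprocess.run` when launching `python setup.py install`. In a Dockerfile, the same effect is obtained by setting the variable with an `ENV` instruction before the `RUN` step that builds the extension. Listing more architectures increases build time and binary size.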
My Dockerfile
It then raises an error:
OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.
The full error logs: