antgr closed this issue 4 years ago
You need to download CUDA 9.0 and then point your CUDA_HOME and PATH environment variables at that folder. If you want to keep multiple versions of CUDA side by side, you can download the .tar version of CUDA and extract it manually. Otherwise, you can install PyTorch 1.3, which is built against CUDA 10.1.
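For anyone scripting this, here is a minimal sketch of what "change CUDA_HOME and PATH" amounts to. The helper name `with_cuda_toolkit` and the folder `/opt/cuda-9.0` are made up for illustration; use whatever folder you actually extracted CUDA into.

```python
import os

def with_cuda_toolkit(cuda_home, base_env=None):
    """Return a copy of the environment that points at one CUDA toolkit folder."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_HOME"] = cuda_home
    # Put the toolkit's bin directory first so its nvcc is found before any other.
    bin_dir = os.path.join(cuda_home, "bin")
    old_path = env.get("PATH", "")
    env["PATH"] = bin_dir + (os.pathsep + old_path if old_path else "")
    return env

# Hypothetical extraction folder; substitute the place where you unpacked CUDA 9.0.
env = with_cuda_toolkit("/opt/cuda-9.0", {"PATH": "/usr/bin"})
print(env["CUDA_HOME"])
print(env["PATH"])
```

The returned dict can be passed as `env=` to `subprocess.run` so that only the build sees the alternate toolkit and the rest of the shell session is left alone.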
Thank you
In case anyone has this problem, what worked for me was doing pip uninstall torch a few times and reinstalling with conda. It seems I had older versions of pytorch that apex was looking at. After uninstalling 3 times I got that torch is no longer installed and proceeded to install through conda. It worked after this.
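If you suspect the same thing (several torch installs shadowing each other), a rough way to see them is to list every torch distribution that Python can find. This is only a sketch using the standard library `importlib.metadata`; the helper name `find_installs` is made up, and it only sees what is on the `sys.path` of the interpreter you run it with.

```python
import sys
from importlib import metadata

def find_installs(package, search_path=None):
    """List (version, location) for every copy of `package` found on the path."""
    wanted = package.lower().replace("_", "-")
    paths = sys.path if search_path is None else search_path
    found = []
    for dist in metadata.distributions(path=paths):
        meta = dist.metadata
        name = meta.get("Name") if meta is not None else None
        if name and name.lower().replace("_", "-") == wanted:
            found.append((dist.version, str(dist.locate_file(""))))
    return found

# More than one line printed here means more than one torch is visible.
for version, location in find_installs("torch"):
    print(version, location)
```

If this prints more than one entry, repeating pip uninstall torch until nothing is left, as described above, is a reasonable way to get back to a clean state.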
Has anyone figured out how to install apex with CUDA Version 10.0* and torch==1.5.0?
It would be immensely helpful if there were a way to install apex with the most recent versions of CUDA and torch. I cannot downgrade CUDA to a lower version. Thanks!
I think PyTorch 1.5.0 is compiled with CUDA 10.2
Yeah. That is correct. Since I could not upgrade CUDA, I downgraded pytorch.
conda install gxx_linux-64 and conda install pytorch torchvision cudatoolkit=10.0 -c pytorch did the trick for me.
I fixed this issue by running export CUDA_HOME=/usr/local/cuda-10.2/
So, I had cuda-10.0 installed on my system (only /usr/local/cuda-10.0), but I had installed pytorch with cuda-11.0, and that's why the compilation was throwing this error. I installed the cuda-11.0 toolkit (toolkit only, I didn't touch the drivers), so I then had two CUDA versions on my system. That is completely fine; you just need to point to the one you want to use when compiling. After this, I just did export CUDA_HOME=/usr/local/cuda-11.0/ and tried compiling again. It worked!
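To make the mismatch above concrete, here is a small sketch that compares the release printed by `nvcc -V` with the string from `torch.version.cuda`. The function names are made up, the sample line is just the usual shape of nvcc output, and I'm assuming a major.minor comparison is what matters; the actual check in apex's setup.py may differ between versions.

```python
import re

def nvcc_release(nvcc_output):
    """Extract (major, minor) from the text printed by `nvcc -V`, or None."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return (int(match.group(1)), int(match.group(2))) if match else None

def matches_torch(torch_cuda, nvcc_output):
    """True when the toolkit release equals the CUDA version torch reports."""
    major, minor = torch_cuda.split(".")[:2]
    return nvcc_release(nvcc_output) == (int(major), int(minor))

# Illustrative nvcc line for a cuda-10.0 toolkit.
sample = "Cuda compilation tools, release 10.0, V10.0.130"
print(matches_torch("11.0", sample))  # torch built with 11.0, toolkit is 10.0
print(matches_torch("10.0", sample))  # torch and toolkit agree
```

In practice you would feed it the real output of `nvcc -V` from the toolkit that CUDA_HOME points to, plus the value of `torch.version.cuda`.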
After a lot of Googling, I found that each version of CUDA is compatible with a different range of gcc versions. I was using CUDA 10.2, and downgrading gcc to 6.1 solved this problem.
Thanks guys! I was able to install Apex for my conda PyTorch installation with your help. Here is the full step by step:
Check which CUDA version your PyTorch was built with: python -c "import torch; print(torch.version.cuda)"
Point CUDA_HOME at the matching toolkit folder: CUDA_HOME=/usr/local/cuda-{your-version-here}/
Build apex with that CUDA_HOME, for example: CUDA_HOME=/usr/local/cuda-11.3 pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
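The same steps can be sketched in Python, assuming the toolkit lives under /usr/local/cuda-{version} as in the commands above. The helper name `apex_build_invocation` is made up, and the sketch only assembles the environment and the pip arguments; it does not run the build.

```python
import os

def apex_build_invocation(torch_cuda, cuda_root="/usr/local"):
    """Return (env, argv) for building apex against the toolkit torch expects."""
    # Assumes the toolkit folder is named cuda-<major.minor>, e.g. cuda-11.3.
    cuda_home = os.path.join(cuda_root, "cuda-" + torch_cuda)
    env = dict(os.environ, CUDA_HOME=cuda_home)
    argv = ["pip", "install", "-v", "--disable-pip-version-check", "--no-cache-dir",
            "--global-option=--cpp_ext", "--global-option=--cuda_ext", "./"]
    return env, argv

# "11.3" stands in for whatever torch.version.cuda reports on your machine.
env, argv = apex_build_invocation("11.3")
print("CUDA_HOME=" + env["CUDA_HOME"], " ".join(argv))
```

To actually build, run it from inside the apex checkout with something like `subprocess.run(argv, env=env, check=True)`.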
In addition to https://github.com/NVIDIA/apex/issues/550#issuecomment-1059985098: remember to put the specific version you want in the install command, e.g. sudo apt-get -y install cuda-11-3
You can use the nvidia-smi and nvcc -V commands to check whether the CUDA version reported by the NVIDIA driver is consistent with the CUDA compiler version. If they are not consistent, this error will be reported. For example, my previous setup, shown below, led to the same error.
NVIDIA-SMI 435.21 Driver Version: 435.21 CUDA Version: 10.1
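A small sketch of that check, using the header line above as input. The function names are made up. As far as I know the "CUDA Version" shown by nvidia-smi is the newest version the driver supports, so I'm treating "consistent" as "toolkit not newer than the driver's version"; that reading is my assumption, not something stated in this thread.

```python
import re

def driver_cuda_version(smi_header):
    """Extract (major, minor) after 'CUDA Version:' in nvidia-smi output, or None."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_header)
    return (int(match.group(1)), int(match.group(2))) if match else None

def toolkit_fits_driver(smi_header, toolkit):
    """True when the toolkit (major, minor) is not newer than the driver's CUDA version."""
    driver = driver_cuda_version(smi_header)
    return driver is not None and toolkit <= driver

header = "NVIDIA-SMI 435.21 Driver Version: 435.21 CUDA Version: 10.1"
print(driver_cuda_version(header))
print(toolkit_fits_driver(header, (10, 0)), toolkit_fits_driver(header, (11, 0)))
```

The toolkit tuple would come from `nvcc -V`; if you want strict equality instead, compare the two tuples directly.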
pip3 install --user -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
/usr/lib/python3.7/site-packages/pip/_internal/commands/install.py:217: UserWarning: Disabling all use of wheels due to the use of --build-options / --global-options / --install-options.
cmdoptions.check_install_build_global(options)
Created temporary directory: /tmp/pip-ephem-wheel-cache-lamo85tb
Created temporary directory: /tmp/pip-req-tracker-du8h9cpx
Created requirements tracker '/tmp/pip-req-tracker-du8h9cpx'
Created temporary directory: /tmp/pip-install-wkd21x48
Processing /home/polykratis/mt-dnn/apex
Created temporary directory: /tmp/pip-req-build-j8rxyv0b
Added file:///home/polykratis/mt-dnn/apex to build tracker '/tmp/pip-req-tracker-du8h9cpx'
Running setup.py (path:/tmp/pip-req-build-j8rxyv0b/setup.py) egg_info for package from file:///home/polykratis/mt-dnn/apex
Running command python setup.py egg_info
torch.__version__ = 1.1.0
running egg_info
creating pip-egg-info/apex.egg-info
writing pip-egg-info/apex.egg-info/PKG-INFO
writing dependency_links to pip-egg-info/apex.egg-info/dependency_links.txt
writing top-level names to pip-egg-info/apex.egg-info/top_level.txt
writing manifest file 'pip-egg-info/apex.egg-info/SOURCES.txt'
reading manifest file 'pip-egg-info/apex.egg-info/SOURCES.txt'
writing manifest file 'pip-egg-info/apex.egg-info/SOURCES.txt'
/tmp/pip-req-build-j8rxyv0b/setup.py:43: UserWarning: Option --pyprof not specified. Not installing PyProf dependencies!
warnings.warn("Option --pyprof not specified. Not installing PyProf dependencies!")
Source in /tmp/pip-req-build-j8rxyv0b has version 0.1, which satisfies requirement apex==0.1 from file:///home/polykratis/mt-dnn/apex
Removed apex==0.1 from file:///home/polykratis/mt-dnn/apex from build tracker '/tmp/pip-req-tracker-du8h9cpx'
Installing collected packages: apex
Created temporary directory: /tmp/pip-record-ehkfvhhb
Running setup.py install for apex ...
Running command /usr/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-req-build-j8rxyv0b/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" --cpp_ext --cuda_ext install --record /tmp/pip-record-ehkfvhhb/install-record.txt --single-version-externally-managed --compile --user --prefix=
torch.__version__ = 1.1.0
/tmp/pip-req-build-j8rxyv0b/setup.py:43: UserWarning: Option --pyprof not specified. Not installing PyProf dependencies!
warnings.warn("Option --pyprof not specified. Not installing PyProf dependencies!")
error
Cleaning up...
Removing source in /tmp/pip-req-build-j8rxyv0b
Removed build tracker '/tmp/pip-req-tracker-du8h9cpx'
Command "/usr/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-req-build-j8rxyv0b/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" --cpp_ext --cuda_ext install --record /tmp/pip-record-ehkfvhhb/install-record.txt --single-version-externally-managed --compile --user --prefix=" failed with error code 1 in /tmp/pip-req-build-j8rxyv0b/
Exception information:
Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/pip/_internal/cli/base_command.py", line 179, in main
    status = self.run(options, args)
  File "/usr/lib/python3.7/site-packages/pip/_internal/commands/install.py", line 421, in run
    strip_file_prefix=options.strip_file_prefix,
  File "/usr/lib/python3.7/site-packages/pip/_internal/req/__init__.py", line 57, in install_given_reqs
    **kwargs
  File "/usr/lib/python3.7/site-packages/pip/_internal/req/req_install.py", line 949, in install
    spinner=spinner,
  File "/usr/lib/python3.7/site-packages/pip/_internal/utils/misc.py", line 771, in call_subprocess
    % (command_desc, proc.returncode, cwd))
pip._internal.exceptions.InstallationError: Command "/usr/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-req-build-j8rxyv0b/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" --cpp_ext --cuda_ext install --record /tmp/pip-record-ehkfvhhb/install-record.txt --single-version-externally-managed --compile --user --prefix=" failed with error code 1 in /tmp/pip-req-build-j8rxyv0b/
1 location(s) to search for versions of pip: