Open yinwenpeng opened 5 years ago
Are you installing within a container, or on bare metal? Either way, this could be due to a lingering previous install on your system.
It might be worth trying a clean uninstall
pip uninstall apex;
pip uninstall apex; # (repeat until it says Skipping apex as it is not installed,
# because if you also installed using the old `python setup.py install`,
# you may also have the old files installed at a different location)
then
cd apex_repo;
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" .
Thanks. It was installed without a container, I think. I tried the uninstall multiple times, but it did not help. The problem still exists.
@thorjohnsen Have you seen this error before?
One random possibility that occurs to me is that you are somehow compiling with a version of nvcc that is different from the cuda runtime library that the application is attempting to load when you execute it. Can you give me the results of these three commands:
nvcc --version
which nvcc
echo $LD_LIBRARY_PATH
Also, add `print(torch.utils.cpp_extension.CUDA_HOME)` here: https://github.com/NVIDIA/apex/blob/master/setup.py#L38
then run the install and see what it prints, so we know where the install script itself is looking to find nvcc.
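For anyone who would rather collect all of this in one go, here is a minimal sketch that gathers the same values from Python. The helper name `collect_cuda_build_info` is made up for illustration and is not part of Apex or PyTorch. It only reports what it finds: the torch fields stay `None` if PyTorch cannot be imported, and the nvcc fields stay `None` if nvcc is not on `PATH`.

```python
import os
import shutil
import subprocess


def collect_cuda_build_info():
    """Gather the values requested above into a dict (hypothetical helper)."""
    info = {
        "nvcc_path": shutil.which("nvcc"),          # same idea as `which nvcc`
        "nvcc_version": None,                        # output of `nvcc --version`
        "LD_LIBRARY_PATH": os.environ.get("LD_LIBRARY_PATH", ""),
        "CUDA_HOME": None,                           # where PyTorch looks for the toolkit
        "torch_cuda": None,                          # CUDA version PyTorch was built with
    }
    if info["nvcc_path"]:
        result = subprocess.run(
            [info["nvcc_path"], "--version"],
            capture_output=True, text=True, check=False,
        )
        info["nvcc_version"] = result.stdout.strip()
    try:
        import torch
        from torch.utils import cpp_extension
        info["CUDA_HOME"] = cpp_extension.CUDA_HOME
        info["torch_cuda"] = torch.version.cuda
    except Exception as exc:
        # PyTorch is missing or fails to load; record why instead of crashing.
        info["torch_error"] = repr(exc)
    return info


if __name__ == "__main__":
    for key, value in collect_cuda_build_info().items():
        print(f"{key}: {value}")
```

Run it with the same interpreter you use for the Apex install, otherwise it may report a different PyTorch than the one the build sees.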
Thanks. It shows as follows:
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Tue_Jun_12_23:07:04_CDT_2018
Cuda compilation tools, release 9.2, V9.2.148
which nvcc
/usr/local/cuda/bin/nvcc
echo $LD_LIBRARY_PATH
:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cudnn/lib64
add print(torch.utils.cpp_extension.CUDA_HOME) here https://github.com/NVIDIA/apex/blob/master/setup.py#L38
/usr/local/cuda
So, what's wrong there? thanks
That looks reasonable... My suspicion is that the version of PyTorch installed on your system does not match the version of CUDA installed on your system. Can you also print `torch.version.cuda` right next to `print(torch.utils.cpp_extension.CUDA_HOME)` at line 38 of setup.py?
Other people have had similar issues with extensions:
https://github.com/jwyang/faster-rcnn.pytorch/issues/190
https://github.com/open-mmlab/mmdetection/issues/66#issuecomment-434165962
This one looks like the most helpful:
https://github.com/rusty1s/pytorch_scatter/issues/19
https://github.com/rusty1s/pytorch_scatter/issues/19#issuecomment-449735614
Maybe you can fix the issue by uninstalling PyTorch (run `pip uninstall torch` repeatedly until it says torch is not installed), uninstalling Apex (run `pip uninstall apex` repeatedly until it says it's not installed), then either rebuilding PyTorch from source, or conda installing it again and making sure it matches the version of CUDA you have on bare metal. Afterwards, reinstall Apex and see if it works. Sorry for the annoyance, but like I said, this seems to be an issue other people have had, and it does not seem like an issue with Apex in particular.
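As a rough way to check the "does it match" part, the sketch below compares the release number in `nvcc --version` output with a `torch.version.cuda` string. The function names are hypothetical, and comparing only major.minor is an assumption about what counts as a match. The sample line comes from the nvcc output posted earlier in this thread; the `"9.0"` value is made up purely to show a mismatch, since the actual `torch.version.cuda` was not reported.

```python
import re


def nvcc_release(nvcc_output):
    """Extract (major, minor) from the 'release X.Y' part of `nvcc --version`."""
    match = re.search(r"release\s+(\d+)\.(\d+)", nvcc_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def major_minor(version):
    """Turn a string such as '9.2' or '9.2.148' into (9, 2)."""
    parts = (version.split(".") + ["0"])[:2]
    return int(parts[0]), int(parts[1])


def versions_match(nvcc_output, torch_cuda):
    """True when nvcc and PyTorch's reported CUDA version agree on major.minor."""
    release = nvcc_release(nvcc_output)
    if release is None or torch_cuda is None:
        return False
    return release == major_minor(torch_cuda)


sample = "Cuda compilation tools, release 9.2, V9.2.148"
print(versions_match(sample, "9.2"))  # True
print(versions_match(sample, "9.0"))  # False (hypothetical mismatching value)
```

In practice you would pass in the real `nvcc --version` text and `torch.version.cuda` from your own environment. Note that `torch.version.cuda` is `None` for CPU-only builds, which this sketch treats as a mismatch.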
Encountered this too, even after reinstalling CUDA with matching nvcc and `torch.version.cuda` versions. Given that `pip --version` on my machine was version 23.0.1, I was using the pip command listed in the README for pip < 23.1:
# if pip >= 23.1 [...]
[...]
# otherwise
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --global-option="--cpp_ext" --global-option="--cuda_ext" ./
However, this would strangely result in a python-only build, without compiling the C sources. The install would still succeed and display
Successfully built apex
Installing collected packages: apex
However, attempting to import and use it in running application code would give the error:
ModuleNotFoundError: No module named 'fused_layer_norm_cuda'
Inspecting the pip install more closely, this warning appeared near the top:
WARNING: Implying --no-binary=:all: due to the presence of --build-option / --global-option / --install-option. Consider using --config-settings for more flexibility.
After trial and error, I tried the other install command from the README meant for pip >= 23.1, and that worked. Both python and C sources were compiled and importable/usable from running application code.
# if pip >= 23.1 (ref: https://pip.pypa.io/en/stable/news/#v23-1) which supports multiple `--config-settings` with the same key...
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./
I haven't dug into why there are separate install instructions for pip < 23.1 and pip >= 23.1, but that might need another look. Let me know if I can provide other info to help.
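Since the install reports success even when it ends up as a Python-only build, a quick check right after installing can save some confusion. This is a minimal sketch: `missing_extensions` is a hypothetical helper, and the only module name it checks by default is the one from the error above. Add other extension names yourself if your build is supposed to produce them.

```python
import importlib.util


def missing_extensions(module_names=("fused_layer_norm_cuda",)):
    """Return the names from module_names that cannot be located for import."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


if __name__ == "__main__":
    missing = missing_extensions()
    if missing:
        print("Not built or not on sys.path:", ", ".join(missing))
    else:
        print("All listed extension modules were found.")
```

This only confirms that the compiled module can be located. It does not load it, so a CUDA version mismatch could still surface as an error at import time.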
When I tried the "Quick Start" command:
$ pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" .
my program shows this error:
whereas if I run "pip install -v --no-cache-dir .", the error becomes:
I am using PyTorch 1.0 and CUDA 9.2. I have no idea what's wrong here. Thanks.