Closed: Prof-Cheese closed this issue 10 months ago
@Prof-Cheese That sounds impossible. Can you confirm that torch is available in your env? What's the output of `python3 -m torch.utils.collect_env` and `pip3 list`?
Thank you for your response. Here are the requested details:
Output of `python3 -m torch.utils.collect_env`:

Output of `pip list`:
I confirm that torch is installed in my environment. I hope this information helps in diagnosing the issue with installing stable-fast.
@Prof-Cheese Are you sure that you invoke `pip3`, not `pip`?
For thoroughness, I have taken screenshots of both `pip3 list` and `pip list` outputs. Additionally, when I rebuilt the venv I switched Python from version 3.11 to 3.10, but unfortunately the issue persisted with both versions.
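One way to rule out a pip/python mismatch (a hypothetical check, not from the thread; it uses only the standard library) is to ask the interpreter itself which executables the shell resolves and whether torch is importable by that exact interpreter:

```python
import shutil
import subprocess
import sys

# Print which interpreter and which pip the shell resolves to; if pip3
# belongs to a different environment than python3, packages it installs
# will not be importable here.
print("python executable:", sys.executable)
print("sys.prefix:", sys.prefix)
print("pip3 on PATH:", shutil.which("pip3"))
print("pip on PATH:", shutil.which("pip"))

# torch is importable only if it lives under THIS interpreter's site-packages.
rc = subprocess.run([sys.executable, "-c", "import torch"],
                    capture_output=True).returncode
print("torch importable by this interpreter:", rc == 0)
```

If `pip3 list` shows torch but the last line prints `False`, pip and python point at different environments.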
@Prof-Cheese Take a look at this
https://stackoverflow.com/questions/32004958/python-module-not-found-in-virtualenv
Initially, I encountered an issue where `torch` was not recognized in my virtual environment. Following your advice and this Qiita article, I resolved the `torch` detection issue. However, I'm now facing a new problem.
```
/work/image/venv/lib/python3.10/site-packages/torch/cuda/__init__.py:138: UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda-12.1'
```
`nvidia-smi` confirms that the GPU and driver are recognized correctly. `torch.cuda.is_available()` in Python returns a similar error:

```python
import torch
print(torch.cuda.is_available())
```

```
/work/image/venv/lib/python3.10/site-packages/torch/cuda/__init__.py:138: UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0
False
```
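A small diagnostic sketch (my own, guarded so it also runs where torch is absent) helps separate three failure modes: torch missing entirely, a CPU-only wheel, or a CUDA runtime/driver initialization problem like the one above:

```python
import importlib.util

# Distinguish: torch not installed / CPU-only build / CUDA runtime problem.
if importlib.util.find_spec("torch") is None:
    print("torch is not installed for this interpreter")
else:
    import torch
    print("torch version:", torch.__version__)
    # None here means a CPU-only wheel was installed, so no driver
    # fix will ever make cuda available.
    print("compiled against CUDA:", torch.version.cuda)
    print("cuda available:", torch.cuda.is_available())
```

If `torch.version.cuda` is a version string but `is_available()` is still `False`, the problem is on the driver/runtime side rather than the wheel, which matches the driver update fixing it below.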
I am seeking advice or suggestions to resolve this CUDA error during the installation of stable-fast. Any help would be greatly appreciated.
Do you have the CUDA toolkit installed on your system?
Yes, I have the CUDA toolkit installed. Interestingly, the earlier error I mentioned stopped occurring after I updated the GPU drivers. However, now I'm encountering a new error related to GCC version incompatibility. Given these issues, I'm starting to feel that Arch Linux might not be the best fit for a server environment, so I'm considering trying a different Linux distribution. Thanks for your help and advice!
If you are using Arch, you may need to run `export NVCC_PREPEND_FLAGS='-ccbin /usr/bin/g++-12'` before building stable-fast.
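A sketch of that workaround (the `/usr/bin/g++-12` path assumes Arch's gcc12 package is installed; nvcc in CUDA 12.x caps the supported host GCC version below Arch's default compiler, hence the override):

```shell
# Point nvcc at a supported host compiler before rebuilding stable-fast.
export NVCC_PREPEND_FLAGS='-ccbin /usr/bin/g++-12'
echo "NVCC_PREPEND_FLAGS=$NVCC_PREPEND_FLAGS"

# Rebuild step (shown commented; requires CUDA and the gcc12 package):
# pip3 install -v --no-cache-dir stable-fast
```

`NVCC_PREPEND_FLAGS` is read by nvcc itself, so it works regardless of how the package's build system invokes the compiler.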
Environment Information:
Problem Description: When trying to install the stable-fast package, I encounter a `ModuleNotFoundError: No module named 'torch'`. I have confirmed that `torch` is installed in the virtual environment using `pip list`. Also, no package conflicts were detected with `pip check`.

Attempted Solutions:

- `torch` in the virtual environment.

Additional Information: The same error occurs when installing directly from the GitHub repository and from PyPI.
I would appreciate any suggestions or solutions to this problem.