Closed: Prof-Cheese closed this issue 10 months ago
@Prof-Cheese That sounds impossible. Can you confirm that torch is available in your env? What's the output of `python3 -m torch.utils.collect_env` and `pip3 list`?
Thank you for your response. Here are the requested details:
Output of `python3 -m torch.utils.collect_env`:

Output of `pip list`:
I confirm that torch is installed in my environment. I hope this information helps in diagnosing the issue with installing stable-fast.
@Prof-Cheese Are you sure that you invoke `pip3`, not `pip`?
For thoroughness, I have taken screenshots of both `pip3 list` and `pip list` outputs. Additionally, when I rebuilt the venv I switched Python from version 3.11 to 3.10, but unfortunately the issue persisted with both versions.
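One way to rule out a pip/python mismatch (a hypothetical check, not from the thread; it uses only the standard library) is to ask the interpreter itself which executables the shell resolves and whether torch is importable by that exact interpreter:

```python
import shutil
import subprocess
import sys

# Print which interpreter and which pip the shell resolves to; if pip3
# belongs to a different environment than python3, packages it installs
# will not be importable here.
print("python executable:", sys.executable)
print("sys.prefix:", sys.prefix)
print("pip3 on PATH:", shutil.which("pip3"))
print("pip on PATH:", shutil.which("pip"))

# torch is importable only if it lives under THIS interpreter's site-packages.
rc = subprocess.run([sys.executable, "-c", "import torch"],
                    capture_output=True).returncode
print("torch importable by this interpreter:", rc == 0)
```

If `pip3 list` shows torch but the last line prints `False`, pip and python point at different environments.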
@Prof-Cheese Take a look at this
https://stackoverflow.com/questions/32004958/python-module-not-found-in-virtualenv
Initially, I encountered an issue where `torch` was not recognized in my virtual environment. Following your advice and this Qiita article, I resolved the `torch` detection issue. However, I'm now facing a new problem.
```
/work/image/venv/lib/python3.10/site-packages/torch/cuda/__init__.py:138: UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda-12.1'
```
`nvidia-smi` confirms that the GPU and driver are recognized correctly. `torch.cuda.is_available()` in Python returns a similar error:

```python
import torch
print(torch.cuda.is_available())
```

```
/work/image/venv/lib/python3.10/site-packages/torch/cuda/__init__.py:138: UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0
False
```
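A small diagnostic sketch (my own, guarded so it also runs where torch is absent) helps separate three failure modes: torch missing entirely, a CPU-only wheel, or a CUDA runtime/driver initialization problem like the one above:

```python
import importlib.util

# Distinguish: torch not installed / CPU-only build / CUDA runtime problem.
if importlib.util.find_spec("torch") is None:
    print("torch is not installed for this interpreter")
else:
    import torch
    print("torch version:", torch.__version__)
    # None here means a CPU-only wheel was installed, so no driver
    # fix will ever make cuda available.
    print("compiled against CUDA:", torch.version.cuda)
    print("cuda available:", torch.cuda.is_available())
```

If `torch.version.cuda` is a version string but `is_available()` is still `False`, the problem is on the driver/runtime side rather than the wheel, which matches the driver update fixing it below.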
I am seeking advice or suggestions to resolve this CUDA error during the installation of stable-fast. Any help would be greatly appreciated.
Do you have the CUDA toolkit installed on your system?
Yes, I have the CUDA toolkit installed. Interestingly, the earlier error I mentioned stopped occurring after I updated the GPU drivers. However, now I'm encountering a new error related to GCC version incompatibility. Given these issues, I'm starting to feel that Arch Linux might not be the best fit for a server environment, so I'm considering trying a different Linux distribution. Thanks for your help and advice!
If you are using Arch, you may need to run `export NVCC_PREPEND_FLAGS='-ccbin /usr/bin/g++-12'` before building stable-fast.
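A sketch of that workaround (the `/usr/bin/g++-12` path assumes Arch's gcc12 package is installed; nvcc in CUDA 12.x caps the supported host GCC version below Arch's default compiler, hence the override):

```shell
# Point nvcc at a supported host compiler before rebuilding stable-fast.
export NVCC_PREPEND_FLAGS='-ccbin /usr/bin/g++-12'
echo "NVCC_PREPEND_FLAGS=$NVCC_PREPEND_FLAGS"

# Rebuild step (shown commented; requires CUDA and the gcc12 package):
# pip3 install -v --no-cache-dir stable-fast
```

`NVCC_PREPEND_FLAGS` is read by nvcc itself, so it works regardless of how the package's build system invokes the compiler.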
Environment Information:
Problem Description: When trying to install the stable-fast package, I encounter a `ModuleNotFoundError: No module named 'torch'`. I have confirmed that `torch` is installed in the virtual environment using `pip list`. Also, no package conflicts were detected with `pip check`.

Attempted Solutions:

- `torch` in the virtual environment.

Additional Information: The same error occurs when installing directly from the GitHub repository and from PyPI.
I would appreciate any suggestions or solutions to this problem.