Open enn-nafnlaus opened 1 year ago
Hi @enn-nafnlaus
Would it be possible for you to update to CUDA 11? CUDA 10 is pretty old now, and you have a relatively recent version of GCC, which might not be supported with CUDA 10. Can you share the output of nvcc --version?
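For anyone who wants to script that check, here is a minimal Python sketch. It assumes the `nvcc --version` output contains a line of the form `Cuda compilation tools, release X.Y, ...`; the function names `parse_nvcc_release` and `installed_nvcc_release` are made up for illustration and are not part of xformers or PyTorch.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple


def parse_nvcc_release(output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from the text printed by `nvcc --version`."""
    # Assumption: the output contains "release <major>.<minor>".
    match = re.search(r"release\s+(\d+)\.(\d+)", output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_nvcc_release() -> Optional[Tuple[int, int]]:
    """Return the release of the nvcc found on PATH, or None if there is none."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run(
        [nvcc, "--version"], capture_output=True, text=True, check=False
    )
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    release = installed_nvcc_release()
    if release is None:
        print("nvcc not found on PATH")
    else:
        print("nvcc release %d.%d" % release)
```

Returning `None` instead of raising keeps the "nvcc not found" case easy to report separately from a version that is merely too old.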
I'll try to upgrade CUDA this evening; I have a training run going right now. I will note, however, that this system was installed fresh just half a year ago with the latest Fedora version offered at the time (36), and this is the CUDA package fetched automatically from RPMFusion for that version. So if you're going to require more than that, there really should be a check. :)
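A check of that kind could look roughly like the sketch below. The minimum of 11.0 is an assumption taken from the "update to CUDA 11" advice in this thread, not a documented xformers requirement, and `MIN_CUDA`, `parse_version`, and `check_cuda_version` are hypothetical names.

```python
from typing import Tuple

# Assumption: minimum taken from the "update to CUDA 11" advice in this thread.
MIN_CUDA: Tuple[int, ...] = (11, 0)


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn "10.2" into (10, 2); a bare "11" is padded to (11, 0)."""
    parts = [int(p) for p in text.strip().split(".")]
    while len(parts) < 2:
        parts.append(0)
    return tuple(parts)


def check_cuda_version(found: str, minimum: Tuple[int, ...] = MIN_CUDA) -> None:
    """Raise a readable error if the reported CUDA version is below the minimum."""
    if parse_version(found) < minimum:
        needed = ".".join(str(p) for p in minimum)
        raise RuntimeError(
            f"CUDA {found} is older than the assumed minimum {needed}; "
            "upgrade the CUDA toolkit and use a matching PyTorch build"
        )


if __name__ == "__main__":
    try:
        # "10.2" is the value from the Environment section of this report.
        check_cuda_version("10.2")
    except RuntimeError as err:
        print(err)
```

Failing early with a single message like this is easier to act on than a long list of compiler errors partway through the build.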
nvcc not found. Will see if I can find a package to install it.
By the way, if you use conda, you can have a separate environment with a different CUDA version, a different PyTorch setup, etc. We also provide a conda package with pre-built binaries.
nvcc not found
That's interesting, because nvcc should be used to build xformers. I'm surprised you didn't get an error earlier. Can you provide the complete log of the following command:
FORCE_CUDA=1 pip install -v git+https://github.com/facebookresearch/xformers
Failed due to a lack of nvcc:
FORCE_CUDA=1 pip install -v git+https://github.com/facebookresearch/xformers
Using pip 21.3.1 from /usr/lib/python3.10/site-packages/pip (python 3.10)
Defaulting to user installation because normal site-packages is not writeable
Collecting git+https://github.com/facebookresearch/xformers
Cloning https://github.com/facebookresearch/xformers to /tmp/pip-req-build-2d0ol4l5
Running command git version
git version 2.37.3
Running command git clone --filter=blob:none -q https://github.com/facebookresearch/xformers /tmp/pip-req-build-2d0ol4l5
Running command git rev-parse HEAD
e23b369c094685bd42e11928649cc03b93b768d5
Resolved https://github.com/facebookresearch/xformers to commit e23b369c094685bd42e11928649cc03b93b768d5
Running command git submodule update --init --recursive -q
Running command python setup.py egg_info
Traceback (most recent call last):
File "
Also, re: prebuilt packages, what I really need is branch xformers@51dd119#egg=xformers for the memory-efficient attention, but I figured it'd be better to report the bug against the mainline package (since I get the same error either way)
This has been merged and memory-efficient attention is already available on the main branch - might be easier.
Regarding the error:
Failed due to a lack of nvcc:
This means you need to install the CUDA toolkit.
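To confirm that a toolkit is actually visible before retrying the build, something like the following Python sketch can help. It is a simplified, Linux-only approximation: it looks at the `CUDA_HOME` and `CUDA_PATH` environment variables, then at `nvcc` on `PATH`, then at `/usr/local/cuda`. The helpers `find_cuda_home` and `require_nvcc` are hypothetical and are not the lookup that the xformers build itself performs.

```python
import os
import shutil
from pathlib import Path
from typing import List, Optional


def find_cuda_home() -> Optional[Path]:
    """Best-effort search for a CUDA toolkit root that contains bin/nvcc."""
    candidates: List[Path] = []
    for var in ("CUDA_HOME", "CUDA_PATH"):
        value = os.environ.get(var)
        if value:
            candidates.append(Path(value))
    nvcc = shutil.which("nvcc")
    if nvcc:
        # <root>/bin/nvcc -> <root>
        candidates.append(Path(nvcc).resolve().parent.parent)
    # Assumption: common default install location on Linux.
    candidates.append(Path("/usr/local/cuda"))
    for root in candidates:
        if (root / "bin" / "nvcc").is_file():
            return root
    return None


def require_nvcc() -> Path:
    """Return the toolkit root, or fail with one clear message."""
    root = find_cuda_home()
    if root is None:
        raise RuntimeError(
            "nvcc not found: install the CUDA toolkit or set CUDA_HOME "
            "before building with FORCE_CUDA=1"
        )
    return root


if __name__ == "__main__":
    try:
        print("CUDA toolkit found at", require_nvcc())
    except RuntimeError as err:
        print(err)
```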
Upgraded to CUDA 11 while my training run was still going and successfully installed xformers - thanks :). Though can't try anything out until the training run completes or blows up so I can swap out drivers to the upgraded one ;)
Thanks again!
I might be wrong, but I don't think you need to change your driver if it is recent enough.
Automate seems to disagree, AFAIK :)
[2022-10-28 17:00:35,422] [WARNING] [runner.py:179:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
/home/stablediffusion/.conda/envs/diffusers/lib/python3.9/site-packages/torch/cuda/__init__.py:497: UserWarning: Can't initialize NVML ... Error 803: system has unsupported display driver / cuda driver combination (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:109.)
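Error 803 is the CUDA "system driver mismatch" error, which can show up when packages were upgraded while an older kernel module is still loaded. One way to see which driver version is actually loaded is to read `/proc/driver/nvidia/version`. The Python sketch below assumes Linux and assumes that file has an `NVRM version:` line with the version number on it; `parse_kernel_module_version` and `loaded_driver_version` are hypothetical helper names.

```python
import re
from pathlib import Path
from typing import Optional

# Assumption: Linux with the NVIDIA kernel module loaded exposes this file.
PROC_VERSION = Path("/proc/driver/nvidia/version")


def parse_kernel_module_version(text: str) -> Optional[str]:
    """Pull the dotted version number out of the "NVRM version:" line."""
    for line in text.splitlines():
        if line.startswith("NVRM version:"):
            # First whitespace-delimited token that looks like 515.65.01.
            match = re.search(r"\s(\d+(?:\.\d+)+)\s", line)
            return match.group(1) if match else None
    return None


def loaded_driver_version() -> Optional[str]:
    """Version of the currently loaded kernel module, or None if unavailable."""
    try:
        return parse_kernel_module_version(PROC_VERSION.read_text())
    except OSError:
        return None


if __name__ == "__main__":
    print("loaded NVIDIA kernel module:", loaded_driver_version())
```

If the printed version differs from the driver version your package manager reports as installed, reloading the driver (or rebooting) after the current job finishes is the likely next step.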
🐛 Bug
I get a ton of errors like:
Command
pip install git+https://github.com/facebookresearch/xformers
To Reproduce
Steps to reproduce the behavior:
pip install git+https://github.com/facebookresearch/xformers
Expected behavior
No errors. Package installs.
Environment
PyTorch version: 1.12.1+cu102
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A
OS: Fedora Linux 36 (Workstation Edition) (x86_64)
GCC version: (GCC) 12.2.1 20220819 (Red Hat 12.2.1-2)
Clang version: 14.0.5 (Fedora 14.0.5-1.fc36)
CMake version: version 3.24.1
Libc version: glibc-2.35
Python version: 3.10.7 (main, Sep 7 2022, 00:00:00) [GCC 12.2.1 20220819 (Red Hat 12.2.1-1)] (64-bit runtime)
Python platform: Linux-5.3.11-100.fc29.x86_64-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3060
Nvidia driver version: 515.65.01
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.22.0
[pip3] rotary-embedding-torch==0.1.5
[pip3] torch==1.12.1
[pip3] torchaudio==0.12.1
[pip3] torchvision==0.13.1
Additional context