AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI
GNU Affero General Public License v3.0

[Bug]: "Torch is not able to use GPU: add..." after xformers installation (NVIDIA QUADRO RTX 5000) #15282

Open brogaIski opened 6 months ago

brogaIski commented 6 months ago

What happened?

After using Stable Diffusion web UI for some time, and now needing to inpaint a large amount of data, I wanted to use xformers to speed up image generation. After the installation I got the following error: "RuntimeError: Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check". I couldn't find a fix, so I decided to simply reinstall Stable Diffusion to at least get the model running again without xformers. Unfortunately, even after a complete reinstallation of SD and recreating my Conda environments, I still get the same error.

OS: Ubuntu 20.04.6 LTS (Focal Fossa)
GPU: NVIDIA Quadro RTX 5000

Steps to reproduce the problem

  1. Follow instructions for xformers installation
  2. Start SD model with ./webui.sh
  3. Results in: "RuntimeError: Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check"
  4. Reinstall SD
  5. Start SD model with ./webui.sh
  6. Also results in: "RuntimeError: Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check"

What should have happened?

The WebUI should have started normally.

What browsers do you use to access the UI ?

Other

Sysinfo

Can't generate sysinfo. `./webui.sh --dump-sysinfo` results in:

```
Traceback (most recent call last):
  File "/home/brogalski/bachelorarbeit/models/stable-diffusion-webui/launch.py", line 48, in <module>
    main()
  File "/home/brogalski/bachelorarbeit/models/stable-diffusion-webui/launch.py", line 29, in main
    filename = launch_utils.dump_sysinfo()
  File "/home/brogalski/bachelorarbeit/models/stable-diffusion-webui/modules/launch_utils.py", line 473, in dump_sysinfo
    from modules import sysinfo
  File "/home/brogalski/bachelorarbeit/models/stable-diffusion-webui/modules/sysinfo.py", line 8, in <module>
    import psutil
ModuleNotFoundError: No module named 'psutil'
```
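The `ModuleNotFoundError` above means the interpreter that `--dump-sysinfo` runs with cannot see `psutil`. A minimal check (a sketch; run it with the venv's own interpreter, e.g. `venv/bin/python`, to see which environment is missing the package):

```python
# Check whether a module is importable in the *current* interpreter.
# Run with the webui venv's python to see which environment is missing psutil.
import importlib.util

def has_module(name: str) -> bool:
    """Return True if `name` can be imported by this interpreter."""
    return importlib.util.find_spec(name) is not None

if __name__ == "__main__":
    print("psutil importable:", has_module("psutil"))
```

If it prints `False` inside the venv, `venv/bin/pip install psutil` should restore `--dump-sysinfo`.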

Console logs

```
################################################################
Install script for stable-diffusion + Web UI
Tested on Debian 11 (Bullseye), Fedora 34+ and openSUSE Leap 15.4 or newer.
################################################################

################################################################
Running on brogalski user
################################################################

################################################################
Repo already cloned, using it as install directory
################################################################

################################################################
Create and activate python venv
################################################################

################################################################
Launching launch.py...
################################################################
glibc version is 2.31
Cannot locate TCMalloc. Do you have tcmalloc or google-perftool installed on your system? (improves CPU memory usage)
Python 3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0]
Version: v1.8.0
Commit hash: bef51aed032c0aaa5cfd80445bc4cf0d85b408b5
Traceback (most recent call last):
  File "/home/brogalski/bachelorarbeit/models/stable-diffusion-webui/launch.py", line 48, in <module>
    main()
  File "/home/brogalski/bachelorarbeit/models/stable-diffusion-webui/launch.py", line 39, in main
    prepare_environment()
  File "/home/brogalski/bachelorarbeit/models/stable-diffusion-webui/modules/launch_utils.py", line 386, in prepare_environment
    raise RuntimeError(
RuntimeError: Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check
```

Additional information

No response

lcmiracle-yh commented 6 months ago

It's probably because psutil isn't installed in the venv, or the package is broken. But if you add "--skip-torch-cuda-test" to webui-user.sh, can you use A1111 normally? If not, try cd'ing to your A1111 root directory and running pip install psutil.
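To see whether the venv's torch build actually matches the installed driver, a quick diagnostic like the following can help (a sketch; `torch` is typically only importable inside the webui venv, so the import is guarded):

```python
# Report which CUDA version torch was built against and whether a GPU is visible.
# The import is guarded because torch may not be installed outside the venv.
def torch_cuda_report() -> str:
    try:
        import torch
    except ImportError:
        return "torch is not importable in this interpreter"
    return (
        f"torch {torch.__version__}, "
        f"built for CUDA {torch.version.cuda}, "
        f"cuda available: {torch.cuda.is_available()}"
    )

if __name__ == "__main__":
    print(torch_cuda_report())
```

If `cuda available` is `False` while a GPU is present, the torch build and the driver usually disagree about the CUDA version.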

brogaIski commented 6 months ago

> It's probably because psutil isn't installed in venv or the package is broken. But if you do add "--skip-torch-cuda-test" to webui_user.bat, can you use A1111 normally? If not, try cd to your A1111 root directory and pip install psutil?

So I have now tried to start it with --skip-torch-cuda-test, but without success. I then get the error message:

```
The NVIDIA driver on your system is too old (found version 11040).
Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx
Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver.: str
Traceback (most recent call last):
  File "/home/brogalski/bachelorarbeit/models/stable-diffusion-webui/modules/errors.py", line 98, in run
    code()
  File "/home/brogalski/bachelorarbeit/models/stable-diffusion-webui/modules/devices.py", line 106, in enable_tf32
    if cuda_no_autocast():
  File "/home/brogalski/bachelorarbeit/models/stable-diffusion-webui/modules/devices.py", line 28, in cuda_no_autocast
    device_id = get_cuda_device_id()
  File "/home/brogalski/bachelorarbeit/models/stable-diffusion-webui/modules/devices.py", line 40, in get_cuda_device_id
    ) or torch.cuda.current_device()
  File "/home/brogalski/bachelorarbeit/models/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/cuda/__init__.py", line 769, in current_device
    _lazy_init()
  File "/home/brogalski/bachelorarbeit/models/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/cuda/__init__.py", line 298, in _lazy_init
    torch._C._cuda_init()
RuntimeError: The NVIDIA driver on your system is too old (found version 11040). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver.
```

During handling of the above exception, another exception occurred:

```
Traceback (most recent call last):
  File "/home/brogalski/bachelorarbeit/models/stable-diffusion-webui/launch.py", line 48, in <module>
    main()
  File "/home/brogalski/bachelorarbeit/models/stable-diffusion-webui/launch.py", line 44, in main
    start()
  File "/home/brogalski/bachelorarbeit/models/stable-diffusion-webui/modules/launch_utils.py", line 465, in start
    import webui
  File "/home/brogalski/bachelorarbeit/models/stable-diffusion-webui/webui.py", line 13, in <module>
    initialize.imports()
  File "/home/brogalski/bachelorarbeit/models/stable-diffusion-webui/modules/initialize.py", line 36, in imports
    shared_init.initialize()
  File "/home/brogalski/bachelorarbeit/models/stable-diffusion-webui/modules/shared_init.py", line 17, in initialize
    from modules import options, shared_options
  File "/home/brogalski/bachelorarbeit/models/stable-diffusion-webui/modules/shared_options.py", line 4, in <module>
    from modules import localization, ui_components, shared_items, shared, interrogate, shared_gradio_themes, util, sd_emphasis
  File "/home/brogalski/bachelorarbeit/models/stable-diffusion-webui/modules/interrogate.py", line 13, in <module>
    from modules import devices, paths, shared, lowvram, modelloader, errors, torch_utils
  File "/home/brogalski/bachelorarbeit/models/stable-diffusion-webui/modules/devices.py", line 113, in <module>
    errors.run(enable_tf32, "Enabling TF32")
  File "/home/brogalski/bachelorarbeit/models/stable-diffusion-webui/modules/errors.py", line 100, in run
    display(task, e)
  File "/home/brogalski/bachelorarbeit/models/stable-diffusion-webui/modules/errors.py", line 68, in display
    te = traceback.TracebackException.from_exception(e)
  File "/home/brogalski/miniconda3/envs/bachelorarbeit/lib/python3.10/traceback.py", line 572, in from_exception
    return cls(type(exc), exc, exc.__traceback__, *args, **kwargs)
```

The strange thing is that the model worked perfectly until I attempted the xformers installation, and I didn't change anything about the drivers. My assumption is that the xformers installation pulled in a different torch version. By that logic, though, this should only have affected the venv and should not survive a fresh installation. Unfortunately, as described, that is not the case. I also need to run SD on the GPU, because with these huge amounts of data anything else just takes too long.
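For reference, the number 11040 in that error is the CUDA driver API version encoded as `major * 1000 + minor * 10` (the same scheme as CUDA's `CUDA_VERSION` macro), so the installed driver supports CUDA 11.4 while the torch build expects something newer:

```python
# Decode torch's "found version 11040" into a human-readable CUDA version.
# CUDA encodes versions as major * 1000 + minor * 10.
def decode_cuda_version(v: int) -> str:
    major, rest = divmod(v, 1000)
    return f"{major}.{rest // 10}"

print(decode_cuda_version(11040))  # → 11.4
print(decode_cuda_version(12040))  # → 12.4
```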

Flashwalker commented 6 months ago

Same `RuntimeError: Torch is not able to use GPU...` on a 4060.

Flashwalker commented 6 months ago

I run Linux in hybrid graphics mode. iGPU: AMD Radeon™ 610M, dGPU: NVIDIA RTX 4060.
On the first run with "--skip-torch-cuda-test" it started fine. But now it always fails:

```
...
################################################################
Launching launch.py...
################################################################
glibc version is 2.35
Cannot locate TCMalloc. Do you have tcmalloc or google-perftool installed on your system? (improves CPU memory usage)
fatal: No names found, cannot describe anything.
Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]
Version: 1.8.0-RC
Commit hash: bef51aed032c0aaa5cfd80445bc4cf0d85b408b5
Traceback (most recent call last):
  File "/home/user/stable-diffusion-webui/launch.py", line 48, in <module>
    main()
  File "/home/user/stable-diffusion-webui/launch.py", line 39, in main
    prepare_environment()
  File "/home/user/stable-diffusion-webui/modules/launch_utils.py", line 421, in prepare_environment
    if not requirements_met(requirements_file):
  File "/home/user/stable-diffusion-webui/modules/launch_utils.py", line 311, in requirements_met
    if packaging.version.parse(version_required) != packaging.version.parse(version_installed):
  File "/home/user/stable-diffusion-webui/venv/lib/python3.10/site-packages/packaging/version.py", line 52, in parse
    return Version(version)
  File "/home/user/stable-diffusion-webui/venv/lib/python3.10/site-packages/packaging/version.py", line 196, in __init__
    match = self._regex.search(version)
TypeError: expected string or bytes-like object
```

I was able to start it successfully on the first run after a recent git pull, then terminated it with Ctrl+C.
Now it fails to start.
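That `TypeError` happens because `requirements_met` passes the installed version straight to `packaging.version.parse`, and when a package's metadata in the venv is broken the installed version comes back as `None`. A minimal reproduction of the failure mode, using a stand-in parser so it doesn't depend on `packaging` itself:

```python
# Sketch of the crash in requirements_met: parsing works for a real version
# string, but raises TypeError when the installed version comes back as None.
import re

_VERSION_RE = re.compile(r"\d+(?:\.\d+)*")

def parse_version(version):
    # re .search raises TypeError on None, like packaging.version.parse does
    match = _VERSION_RE.search(version)
    if match is None:
        raise ValueError(f"invalid version: {version!r}")
    return match.group(0)

print(parse_version("2.1.2"))  # → 2.1.2
try:
    parse_version(None)
except TypeError as e:
    print("reproduced:", e)
```

This also explains why a fresh install fixes it: recreating the venv restores consistent package metadata.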

Flashwalker commented 6 months ago

In the end, only a fresh installation brought it back to a working state.

brogaIski commented 6 months ago

After searching around a bit, I realised that several CUDA versions were installed on the machine. Apparently the xformers installation (for whatever reason) caused an older CUDA version, namely 11.4, to be used. I have now uninstalled all CUDA installations and updated my drivers so that I can use CUDA 12.4. Now everything works again.

christming commented 6 months ago

> After searching around a bit, I realised that several CUDA versions are installed on the computer. Apparently the xformers installation (for whatever reason) resulted in an older CUDA version being used. However, I have now uninstalled all CUDA installations and updated my drivers so that I can use CUDA 12.4. Now everything works again.

I still hit the same bug when using --xformers in webui-user.bat. The torch versions seem to conflict: 2.2.1 vs 2.1.2.

SchofieldPriest commented 6 months ago

I have this exact same error on an AMD GPU! I haven't installed xformers, however, and I am trying to fix this on the DirectML fork (the only one I know of that works with AMD cards). Adding --skip-torch-cuda-test got past the error, but left the command line stuck on "Installing requirements". The WebUI was working just fine before, and I need help getting it to run again! Installing psutil did not fix the issue.

zmsoft commented 3 months ago

> So I have now tried to start it with --skip-torch-cuda-test, but without success. I then get the error message: "RuntimeError: The NVIDIA driver on your system is too old (found version 11040)."
>
> The strange thing is that the model worked perfectly until the attempted xformers installation and I didn't change anything in the drivers.

I have the same error as you. Have you upgraded the nvidia driver?

hendkai commented 2 months ago

I have the same error with AMD. A fresh installation with my 7900 XTX following the AMD instructions worked. Then I played around and installed some extensions and models. After a restart it won't start anymore.