AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI
GNU Affero General Public License v3.0

[Bug/Feature Request]: Need a way to override cuda is_available() check, force CPU. Getting traceback, older nvidia cuda card #13662

Open · NucleaPeon opened 11 months ago

NucleaPeon commented 11 months ago

Is there an existing issue for this?

What would your feature do?

I'm in a unique situation: I have an additional GPU, an Nvidia Quadro K4000, that I was planning on using with stable diffusion, but the card's CUDA compute capability is too low for the latest cuda-enabled torch build to run on it. The driver is installed. The function torch uses to detect CUDA support is is_available(), which does NOT check the minimum compute capability. I have torch==2.1.0 installed with Python 3.10 and the GCC 10.x series.

I searched the issue tracker for how to force CPU, and people suggested using --skip-torch-cuda-test, but I get the following traceback when I run with it:

./webui.sh --skip-torch-cuda-test

################################################################
Install script for stable-diffusion + Web UI
Tested on Debian 11 (Bullseye)
################################################################

################################################################
Running on dev user
################################################################

################################################################
Repo already cloned, using it as install directory
################################################################

################################################################
python venv already activate or run without venv: /media/VMs/stable-diffusion/.direnv/python-3.10
################################################################

################################################################
Launching launch.py...
################################################################
Cannot locate TCMalloc (improves CPU memory usage)
Python 3.10.13 (main, Sep 11 2023, 00:10:35) [GCC 12.3.1 20230526]
Version: v1.6.0
Commit hash: 5ef669de080814067961f28357256e8fe27544f4
Launching Web UI with arguments: --skip-torch-cuda-test --skip-torch-cuda-test
No module 'xformers'. Proceeding without it.
The NVIDIA driver on your system is too old (found version 11040). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver.: str
Traceback (most recent call last):
  File "/media/VMs/stable-diffusion/stable-diffusion-webui/modules/errors.py", line 84, in run
    code()
  File "/media/VMs/stable-diffusion/stable-diffusion-webui/modules/devices.py", line 63, in enable_tf32
    if any(torch.cuda.get_device_capability(devid) == (7, 5) for devid in range(0, torch.cuda.device_count())):
  File "/media/VMs/stable-diffusion/stable-diffusion-webui/modules/devices.py", line 63, in <genexpr>
    if any(torch.cuda.get_device_capability(devid) == (7, 5) for devid in range(0, torch.cuda.device_count())):
  File "/media/VMs/stable-diffusion/.direnv/python-3.10/lib/python3.10/site-packages/torch/cuda/__init__.py", line 435, in get_device_capability
    prop = get_device_properties(device)
  File "/media/VMs/stable-diffusion/.direnv/python-3.10/lib/python3.10/site-packages/torch/cuda/__init__.py", line 449, in get_device_properties
    _lazy_init()  # will define _get_device_properties
  File "/media/VMs/stable-diffusion/.direnv/python-3.10/lib/python3.10/site-packages/torch/cuda/__init__.py", line 298, in _lazy_init
    torch._C._cuda_init()
RuntimeError: The NVIDIA driver on your system is too old (found version 11040). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/media/VMs/stable-diffusion/stable-diffusion-webui/launch.py", line 48, in <module>
    main()
  File "/media/VMs/stable-diffusion/stable-diffusion-webui/launch.py", line 44, in main
    start()
  File "/media/VMs/stable-diffusion/stable-diffusion-webui/modules/launch_utils.py", line 432, in start
    import webui
  File "/media/VMs/stable-diffusion/stable-diffusion-webui/webui.py", line 13, in <module>
    initialize.imports()
  File "/media/VMs/stable-diffusion/stable-diffusion-webui/modules/initialize.py", line 34, in imports
    shared_init.initialize()
  File "/media/VMs/stable-diffusion/stable-diffusion-webui/modules/shared_init.py", line 17, in initialize
    from modules import options, shared_options
  File "/media/VMs/stable-diffusion/stable-diffusion-webui/modules/shared_options.py", line 3, in <module>
    from modules import localization, ui_components, shared_items, shared, interrogate, shared_gradio_themes
  File "/media/VMs/stable-diffusion/stable-diffusion-webui/modules/interrogate.py", line 13, in <module>
    from modules import devices, paths, shared, lowvram, modelloader, errors
  File "/media/VMs/stable-diffusion/stable-diffusion-webui/modules/devices.py", line 70, in <module>
    errors.run(enable_tf32, "Enabling TF32")
  File "/media/VMs/stable-diffusion/stable-diffusion-webui/modules/errors.py", line 86, in run
    display(task, e)
  File "/media/VMs/stable-diffusion/stable-diffusion-webui/modules/errors.py", line 54, in display
    te = traceback.TracebackException.from_exception(e)
  File "/usr/lib/python3.10/traceback.py", line 572, in from_exception
    return cls(type(exc), exc, exc.__traceback__, *args, **kwargs)
AttributeError: 'str' object has no attribute '__traceback__'
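The trailing AttributeError is a secondary issue: by the time modules/errors.py calls traceback.TracebackException.from_exception(), it has been handed a plain string rather than an exception object. A stdlib-only sketch of why that fails:

```python
import traceback

# from_exception() works on real exception objects...
try:
    raise RuntimeError("driver too old")
except RuntimeError as exc:
    te = traceback.TracebackException.from_exception(exc)
    print(str(te))  # the original message, "driver too old"

# ...but from_exception() reads exc.__traceback__, and a plain string
# has no such attribute, reproducing the AttributeError above (Python 3.10).
try:
    traceback.TracebackException.from_exception("driver too old")
except (AttributeError, TypeError) as err:
    print(err)
```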

If I remove the flag, it does this:

./webui.sh 

################################################################
Install script for stable-diffusion + Web UI
Tested on Debian 11 (Bullseye)
################################################################

################################################################
Running on dev user
################################################################

################################################################
Repo already cloned, using it as install directory
################################################################

################################################################
python venv already activate or run without venv: /media/VMs/stable-diffusion/.direnv/python-3.10
################################################################

################################################################
Launching launch.py...
################################################################
Cannot locate TCMalloc (improves CPU memory usage)
Python 3.10.13 (main, Sep 11 2023, 00:10:35) [GCC 12.3.1 20230526]
Version: v1.6.0
Commit hash: 5ef669de080814067961f28357256e8fe27544f4
Traceback (most recent call last):
  File "/media/VMs/stable-diffusion/stable-diffusion-webui/launch.py", line 48, in <module>
    main()
  File "/media/VMs/stable-diffusion/stable-diffusion-webui/launch.py", line 39, in main
    prepare_environment()
  File "/media/VMs/stable-diffusion/stable-diffusion-webui/modules/launch_utils.py", line 356, in prepare_environment
    raise RuntimeError(
RuntimeError: Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check

As you can see, the default behavior does not let me use the webui at all: torch treats my card as cuda-enabled, but the card cannot actually be used for CUDA, and the process fails. Here is the pytorch issue, with my comments on it: https://github.com/pytorch/pytorch/issues/92250

Basically, is_available() does not check whether the CUDA card meets the minimum compute capability torch expects, so the webui won't work unless I physically remove the card from the machine. I'd like pytorch to fix this, but in the interim, perhaps you could implement an additional check for the minimum capability, or document plainly how to force CPU usage even when CUDA cards are found. I don't think --skip-torch-cuda-test works to force the CPU. I tried adding --use-cpu all alongside the skip option and it still failed with the same error.

Proposed workflow

I have two thoughts about how to overcome this issue.

  1. You could implement something similar to the is_cuda_capable() function I wrote on the bug report and replace instances of is_available() with it.
  2. Add something like --force-cpu as an option to skip any gpu-related code.

This is the code I posted for reference:

#!/usr/bin/env python
import torch

def is_cuda_capable(device_index=0):
    count = torch.cuda.device_count()
    assert count > 0, "No available cuda devices"
    # Indices are zero-based, so a valid index must be strictly less than count.
    assert device_index < count, f"Device index out of range, max devices: {count}"
    major, minor = torch.cuda.get_device_capability(device_index)
    name = torch.cuda.get_device_name(device_index)
    # Lowest compute capability this torch build was compiled for,
    # e.g. "sm_37" -> 37; fall back to 3.5 if the arch list is empty.
    min_arch = min(
        (int(arch.split("_")[1]) for arch in torch.cuda.get_arch_list()),
        default=35,
    )
    major_min, minor_min = divmod(min_arch, 10)
    # Compare (major, minor) as tuples so e.g. 4.0 correctly satisfies a 3.5 minimum.
    return (major, minor) >= (major_min, minor_min), name

if __name__ == "__main__":
    cuda, devicename = is_cuda_capable()
    print(f"Cuda is capable on this {devicename} device: {cuda}")
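For reference, the second option could be wired up along these lines. This is a minimal sketch, not webui code; the --force-cpu flag name and the get_device() helper are hypothetical:

```python
# Hypothetical sketch of a --force-cpu option that short-circuits all CUDA
# probing, so a broken or unsupported GPU can never raise during startup.
import argparse

def get_device(force_cpu: bool) -> str:
    """Return the torch device string to use for model placement."""
    if force_cpu:
        # Never touch torch.cuda.* at all on this path.
        return "cpu"
    import torch  # imported lazily so the forced-CPU path stays torch-free
    return "cuda" if torch.cuda.is_available() else "cpu"

parser = argparse.ArgumentParser()
parser.add_argument("--force-cpu", action="store_true",
                    help="ignore any detected GPUs and run on CPU")
args, _ = parser.parse_known_args([])  # empty argv for this demo

# With --force-cpu passed, the device is chosen without importing torch's
# CUDA machinery at all:
print(get_device(force_cpu=True))  # prints "cpu"
```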

Additional information

No response

NucleaPeon commented 11 months ago

I'm using latest master git branch code. Did a git pull before filing this feature request.

missionfloyd commented 11 months ago

https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Command-Line-Arguments-and-Settings#running-on-cpu

NucleaPeon commented 11 months ago

No go.

dev@T7615 /media/VMs/stable-diffusion/stable-diffusion-webui $ ./webui.sh --use-cpu all --precision full --no-half --skip-torch-cuda-test

################################################################
Install script for stable-diffusion + Web UI
Tested on Debian 11 (Bullseye)
################################################################

################################################################
Running on dev user
################################################################

################################################################
Repo already cloned, using it as install directory
################################################################

################################################################
python venv already activate or run without venv: /media/VMs/stable-diffusion/.direnv/python-3.10
################################################################

################################################################
Launching launch.py...
################################################################
Cannot locate TCMalloc (improves CPU memory usage)
Python 3.10.13 (main, Sep 11 2023, 00:10:35) [GCC 12.3.1 20230526]
Version: v1.6.0
Commit hash: 5ef669de080814067961f28357256e8fe27544f4
Launching Web UI with arguments: --use-cpu all --precision full --no-half --skip-torch-cuda-test
No module 'xformers'. Proceeding without it.
The NVIDIA driver on your system is too old (found version 11040). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver.: str
Traceback (most recent call last):
  File "/media/VMs/stable-diffusion/stable-diffusion-webui/modules/errors.py", line 84, in run
    code()
  File "/media/VMs/stable-diffusion/stable-diffusion-webui/modules/devices.py", line 63, in enable_tf32
    if any(torch.cuda.get_device_capability(devid) == (7, 5) for devid in range(0, torch.cuda.device_count())):
  File "/media/VMs/stable-diffusion/stable-diffusion-webui/modules/devices.py", line 63, in <genexpr>
    if any(torch.cuda.get_device_capability(devid) == (7, 5) for devid in range(0, torch.cuda.device_count())):
  File "/media/VMs/stable-diffusion/.direnv/python-3.10/lib/python3.10/site-packages/torch/cuda/__init__.py", line 435, in get_device_capability
    prop = get_device_properties(device)
  File "/media/VMs/stable-diffusion/.direnv/python-3.10/lib/python3.10/site-packages/torch/cuda/__init__.py", line 449, in get_device_properties
    _lazy_init()  # will define _get_device_properties
  File "/media/VMs/stable-diffusion/.direnv/python-3.10/lib/python3.10/site-packages/torch/cuda/__init__.py", line 298, in _lazy_init
    torch._C._cuda_init()
RuntimeError: The NVIDIA driver on your system is too old (found version 11040). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/media/VMs/stable-diffusion/stable-diffusion-webui/launch.py", line 48, in <module>
    main()
  File "/media/VMs/stable-diffusion/stable-diffusion-webui/launch.py", line 44, in main
    start()
  File "/media/VMs/stable-diffusion/stable-diffusion-webui/modules/launch_utils.py", line 432, in start
    import webui
  File "/media/VMs/stable-diffusion/stable-diffusion-webui/webui.py", line 13, in <module>
    initialize.imports()
  File "/media/VMs/stable-diffusion/stable-diffusion-webui/modules/initialize.py", line 34, in imports
    shared_init.initialize()
  File "/media/VMs/stable-diffusion/stable-diffusion-webui/modules/shared_init.py", line 17, in initialize
    from modules import options, shared_options
  File "/media/VMs/stable-diffusion/stable-diffusion-webui/modules/shared_options.py", line 3, in <module>
    from modules import localization, ui_components, shared_items, shared, interrogate, shared_gradio_themes
  File "/media/VMs/stable-diffusion/stable-diffusion-webui/modules/interrogate.py", line 13, in <module>
    from modules import devices, paths, shared, lowvram, modelloader, errors
  File "/media/VMs/stable-diffusion/stable-diffusion-webui/modules/devices.py", line 70, in <module>
    errors.run(enable_tf32, "Enabling TF32")
  File "/media/VMs/stable-diffusion/stable-diffusion-webui/modules/errors.py", line 86, in run
    display(task, e)
  File "/media/VMs/stable-diffusion/stable-diffusion-webui/modules/errors.py", line 54, in display
    te = traceback.TracebackException.from_exception(e)
  File "/usr/lib/python3.10/traceback.py", line 572, in from_exception
    return cls(type(exc), exc, exc.__traceback__, *args, **kwargs)
AttributeError: 'str' object has no attribute '__traceback__'
dev@T7615 /media/VMs/stable-diffusion/stable-diffusion-webui $ 

missionfloyd commented 11 months ago

Delete the venv folder so it installs the correct torch version.

NucleaPeon commented 11 months ago

I removed my virtual environment, re-created it, and reinstalled stable-diffusion-webui. Same problem. The issue is that the GPU in my system, which I don't even plan to use, makes torch raise a RuntimeError simply because torch.cuda.is_available() is called. I short-circuited those code paths to force stable diffusion to use the CPU (see modules/devices.py):

 def torch_gc():
+    return
     if torch.cuda.is_available():

...

 def enable_tf32():
+    torch.backends.cuda.matmul.allow_tf32 = True
+    torch.backends.cudnn.allow_tf32 = True
+    return
     if torch.cuda.is_available():

Now when I run stable diffusion, it pops up with the web page and seems to be working, and shows this in the console:

To create a public link, set `share=True` in `launch()`.
Startup time: 10.2s (prepare environment: 0.1s, import torch: 4.5s, import gradio: 1.0s, setup paths: 1.0s, other imports: 1.0s, setup codeformer: 0.2s, load scripts: 0.9s, create ui: 1.1s, gradio launch: 0.4s).
Creating model from config: /media/VMs/stable-diffusion/stable-diffusion-webui/configs/v1-inference.yaml
Applying attention optimization: InvokeAI... done.
Model loaded in 8.3s (load weights from disk: 1.8s, create model: 1.2s, apply weights to model: 4.0s, calculate empty prompt: 1.2s).
[W NNPACK.cpp:64] Could not initialize NNPACK! Reason: Unsupported hardware.

I'm using direnv to manage my venv instead of virtualenv, but it's the same functionality.


Later on, while looking for a fix, I found the CUDA_VISIBLE_DEVICES environment variable; I set it to an empty string in my .envrc file and reverted the code modifications:

layout python python3.10
export PYTORCH_TRACING_MODE=TORCHFX
export COMMANDLINE_ARGS="--skip-torch-cuda-test --precision full --no-half"
export CUDA_VISIBLE_DEVICES=""

I get the same console message as above with the working webpage being displayed.

I haven't explored yet whether "Could not initialize NNPACK! Reason: Unsupported hardware." is an issue, but my initial problem seems to be solved now.

tl;dr: to stop torch from picking up an unusable CUDA card and to force CPU settings, I need to set CUDA_VISIBLE_DEVICES to an empty string.
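The workaround can be sanity-checked in isolation. This sketch assumes a torch install and just confirms that hiding the devices before torch initializes CUDA makes it report no GPUs:

```python
# CUDA_VISIBLE_DEVICES must be set before CUDA is first initialized;
# setting it before importing torch is the safest ordering.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = ""

import torch

# With no visible devices, torch falls back cleanly instead of probing
# the unsupported card.
print(torch.cuda.is_available())   # False
print(torch.cuda.device_count())   # 0
```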

I think this is enough of a gotcha to warrant documenting somewhere. Do you want me to create a PR for this?