CUDA No Longer Works - Githubissues

altoiddealer commented 3 months ago

First, confirm

[X] I have read the instruction carefully
[X] I have searched the existing issues
[X] I have updated the extension to the latest version

What happened?

Hello, I have fresh installations for the following webuis:

Note that ReForge is now widely regarded as the successor to Forge, with an active developer who has brought (almost) all of the A1111 upstream changes into Forge's memory management / performance. They optionally have a dev_upstream branch with additional upstream changes from ComfyUI.

With that all said! On a fresh installation of all of the above, with only ReActor enabled:

A1111 - NO ERROR
Forge / ReForge - ERROR (See below)

Note that I did not have any issues for a long time with Forge, but I also did not use ReActor for some time.

I believe one of your recent updates is the issue.

I've searched other posts, found a number of solutions such as:

activating the venv, reinstalling onnyxruntime
navigating into the Lib directory and deleting the onnyx packages, and allow ReActor to reinstall them
Reinstall CUDA / CUDANN / ETC.

Doesn't matter - STILL ALWAYS:

same error in Forge / Re-Forge.
A1111, no problem.

Steps to reproduce the problem

Use ReActor with sd-webui-forge or sd-webui-reforge
ReActor calls onnxruntime
onnxruntime can't find CUDA

Sysinfo

Windows 11 nVidia RTX 4070ti

A1111 SYSINFO (NO ERROR) sysinfo-2024-07-25-17-01.json

FORGE SYSINFO (ERROR) sysinfo-2024-07-25-16-59.json

REFORGE SYSINFO (ERROR) sysinfo-2024-07-25-17-40.json

Relevant console log

venv "D:\0_AI\stable-diffusion-webui-forge\venv\Scripts\Python.exe"
Python 3.10.11 (tags/v3.10.11:7d4cc5a, Apr  5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
Version: f0.0.17v1.8.0rc-previous-1-ga9e0c387
Commit hash: a9e0c387008734e97b8ad7fa091d170cb7bd4fc5
CUDA 12.1
Installing forge_legacy_preprocessor requirement: changing opencv-python version from 4.10.0.84 to 4.8.0
Installing sd-forge-controlnet requirement: changing opencv-python version from 4.10.0.84 to 4.8.0
Launching Web UI with arguments:
Total VRAM 12282 MB, total RAM 32536 MB
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 4070 Ti : native
Hint: your device supports --pin-shared-memory for potential speed improvements.
Hint: your device supports --cuda-malloc for potential speed improvements.
Hint: your device supports --cuda-stream for potential speed improvements.
VAE dtype: torch.bfloat16
CUDA Stream Activated:  False
Using pytorch cross attention
ControlNet preprocessor location: D:\0_AI\stable-diffusion-webui-forge\models\ControlNetPreprocessor
12:42:43 - ReActor - STATUS - Running v0.7.1-a1 on Device: CUDA
Loading weights [b4348930c8] from D:\0_AI\stable-diffusion-webui-forge\models\Stable-diffusion\sdxl\analogMadnessSDXL_sdxlV11.safetensors
2024-07-25 12:42:44,000 - ControlNet - INFO - ControlNet UI callback registered.
Running on local URL:  http://127.0.0.1:7860
model_type EPS
UNet ADM Dimension 2816

To create a public link, set `share=True` in `launch()`.
Startup time: 11.3s (prepare environment: 5.7s, import torch: 2.1s, import gradio: 0.5s, setup paths: 0.6s, other imports: 0.4s, load scripts: 1.2s, create ui: 0.5s, gradio launch: 0.3s).
Using pytorch attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using pytorch attention in VAE
Begin to load 1 model
[Memory Management] Current Free GPU Memory (MB) =  5948.47021484375
[Memory Management] Model Memory (MB) =  159.55708122253418
[Memory Management] Minimal Inference Memory (MB) =  1024.0
[Memory Management] Estimated Remaining GPU Memory (MB) =  4764.913133621216
Moving model(s) has taken 0.61 seconds
12:44:07 - ReActor - STATUS - Working: source face index [0], target face index [0]
12:44:07 - ReActor - STATUS - Using Loaded Source Face Model: caro.safetensors
12:44:07 - ReActor - STATUS - Analyzing Target Image...
2024-07-25 12:44:07.7227224 [E:onnxruntime:Default, provider_bridge_ort.cc:1745 onnxruntime::TryGetProviderInfo_CUDA] D:\a\_work\1\s\onnxruntime\core\session\provider_bridge_ort.cc:1426 onnxruntime::ProviderLibrary::Get [ONNXRuntimeError] : 1 : FAIL : LoadLibrary failed with error 126 "" when trying to load "D:\0_AI\stable-diffusion-webui-forge\venv\lib\site-packages\onnxruntime\capi\onnxruntime_providers_cuda.dll"

*************** EP Error ***************
EP Error D:\a\_work\1\s\onnxruntime\python\onnxruntime_pybind_state.cc:891 onnxruntime::python::CreateExecutionProviderInstance CUDA_PATH is set but CUDA wasnt able to be loaded. Please install the correct version of CUDA andcuDNN as mentioned in the GPU requirements page  (https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements),  make sure they're in the PATH, and that your GPU is supported.
 when using ['CUDAExecutionProvider']
Falling back to ['CUDAExecutionProvider', 'CPUExecutionProvider'] and retrying.
****************************************
2024-07-25 12:44:07.7913766 [E:onnxruntime:Default, provider_bridge_ort.cc:1745 onnxruntime::TryGetProviderInfo_CUDA] D:\a\_work\1\s\onnxruntime\core\session\provider_bridge_ort.cc:1426 onnxruntime::ProviderLibrary::Get [ONNXRuntimeError] : 1 : FAIL : LoadLibrary failed with error 126 "" when trying to load "D:\0_AI\stable-diffusion-webui-forge\venv\lib\site-packages\onnxruntime\capi\onnxruntime_providers_cuda.dll"

*** Error running postprocess_image: D:\0_AI\stable-diffusion-webui-forge\extensions\sd-webui-reactor\scripts\reactor_faceswap.py
    Traceback (most recent call last):
      File "D:\0_AI\stable-diffusion-webui-forge\venv\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 419, in __init__
        self._create_inference_session(providers, provider_options, disabled_optimizers)
      File "D:\0_AI\stable-diffusion-webui-forge\venv\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 483, in _create_inference_session
        sess.initialize_session(providers, provider_options, disabled_optimizers)
    RuntimeError: D:\a\_work\1\s\onnxruntime\python\onnxruntime_pybind_state.cc:891 onnxruntime::python::CreateExecutionProviderInstance CUDA_PATH is set but CUDA wasnt able to be loaded. Please install the correct version of CUDA andcuDNN as mentioned in the GPU requirements page  (https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements),  make sure they're in the PATH, and that your GPU is supported.

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "D:\0_AI\stable-diffusion-webui-forge\modules\scripts.py", line 883, in postprocess_image
        script.postprocess_image(p, pp, *script_args)
      File "D:\0_AI\stable-diffusion-webui-forge\extensions\sd-webui-reactor\scripts\reactor_faceswap.py", line 465, in postprocess_image
        result, output, swapped = swap_face(
      File "D:\0_AI\stable-diffusion-webui-forge\extensions\sd-webui-reactor\scripts\reactor_swapper.py", line 594, in swap_face
        target_faces = analyze_faces(target_img, det_thresh=detection_options.det_thresh, det_maxnum=detection_options.det_maxnum)
      File "D:\0_AI\stable-diffusion-webui-forge\extensions\sd-webui-reactor\scripts\reactor_swapper.py", line 302, in analyze_faces
        face_analyser = copy.deepcopy(getAnalysisModel())
      File "D:\0_AI\stable-diffusion-webui-forge\extensions\sd-webui-reactor\scripts\reactor_swapper.py", line 145, in getAnalysisModel
        ANALYSIS_MODEL = insightface.app.FaceAnalysis(
      File "D:\0_AI\stable-diffusion-webui-forge\extensions\sd-webui-reactor\scripts\console_log_patch.py", line 48, in patched_faceanalysis_init
        model = model_zoo.get_model(onnx_file, **kwargs)
      File "D:\0_AI\stable-diffusion-webui-forge\venv\lib\site-packages\insightface\model_zoo\model_zoo.py", line 96, in get_model
        model = router.get_model(providers=providers, provider_options=provider_options)
      File "D:\0_AI\stable-diffusion-webui-forge\extensions\sd-webui-reactor\scripts\console_log_patch.py", line 21, in patched_get_model
        session = PickableInferenceSession(self.onnx_file, **kwargs)
      File "D:\0_AI\stable-diffusion-webui-forge\venv\lib\site-packages\insightface\model_zoo\model_zoo.py", line 25, in __init__
        super().__init__(model_path, **kwargs)
      File "D:\0_AI\stable-diffusion-webui-forge\venv\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 432, in __init__
        raise fallback_error from e
      File "D:\0_AI\stable-diffusion-webui-forge\venv\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 427, in __init__
        self._create_inference_session(self._fallback_providers, None)
      File "D:\0_AI\stable-diffusion-webui-forge\venv\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 483, in _create_inference_session
        sess.initialize_session(providers, provider_options, disabled_optimizers)
    RuntimeError: D:\a\_work\1\s\onnxruntime\python\onnxruntime_pybind_state.cc:891 onnxruntime::python::CreateExecutionProviderInstance CUDA_PATH is set but CUDA wasnt able to be loaded. Please install the correct version of CUDA andcuDNN as mentioned in the GPU requirements page  (https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements),  make sure they're in the PATH, and that your GPU is supported.



### Additional information

_No response_

altoiddealer commented 3 months ago

@Pan

altoiddealer commented 3 months ago

Here is some new information:

ReActor no longer working in A1111 for me now that I git pulled to new version.
The InstantID model for ControlNet also yields the same error.

HOWEVER - If I do the following, InstantID model suddenly works again:

disable ReActor
delete the onnx related Libs
pip install onnx

As soon as I re-enable ReActor (which subsequently installs the other onnx packages) InstantID no longer works, and neither does ReActor (CUDA).

Now, I'm wishing to just figure out how to prevent ReActor from trying to install the wrong stuff over and over, breaking all my WebUI installations - I'd rather just use CPU for ReActor then nothing working at all

dongxiat commented 3 months ago

pip install onnx

u right... try install Reactor will break InstantID too but reinstall onnx still dont work

r have solution? im used Webui-forge

serstesVen commented 3 months ago

So the temp solution I found to this is thanks to a user response here.

Delete the onnxruntime_gpu 1.8.x version in your /venv/Lib/site_packages folder, and then (after activating your venv) run pip install onnxruntime-gpu==1.17.0 --index-url=https://pkgs.dev.azure.com/onnxruntime/onnxruntime/_packaging/onnxruntime-cuda-12/pypi/simple command.

The issue is though, even if I remove all onnx references in both the ReForge requirements files, and the requirements for roop (with no other extensions in the folder), upon restarting the webui, it updates onnxruntimegpu 1.8 again, removing the fixed version. I'm assuming onnx is a dependency for something else, and is triggering the update for onnxgpu back to the non CUDA 12 version of 18. Only way to get it to run without updating is to just pass the --skip-install arg in webui-user.bat. But obviously that's not a great long term solution as it'll lock you out of future updates. I dug further, and found discussions here and here talking about CUDA 12 versions of 1.18, but for some reason, those still didn't work for me. I tried manually installing the 1.18 CUDA12 build from pip install onnxruntime-gpu==1.18.1 --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/ --force-reinstall, and sure enough ReForge would start without updating it, but it'd still throw the same CUDA not found error, which makes me think that there's some other issue, or maybe I'm still missing something, someone else can feel free to experiment.

I'm sure someone with more knowledge on the update mechanisms can figure it out. But this at least fixes it for me for now:

TL;DR

Delete onnxruntime_gpu-1.18.1.dist-info
Run 'pip install onnxruntime-gpu==1.17.0 --index-url=https://pkgs.dev.azure.com/onnxruntime/onnxruntime/_packaging/onnxruntime-cuda-12/pypi/simple' in venv.
Add --skip-install in webui-user.bat args.
Roop will run on latest ReForge and extension.

Gourieff / sd-webui-reactor

CUDA No Longer Works #486

First, confirm

What happened?

With that all said! On a fresh installation of all of the above, with only ReActor enabled:

Note that I did not have any issues for a long time with Forge, but I also did not use ReActor for some time.

I believe one of your recent updates is the issue.

Steps to reproduce the problem

Sysinfo

Relevant console log