AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI
GNU Affero General Public License v3.0

[Bug]: CUDA error since Stable Diffusion 2.0 changes #6011

Open TheMundaneDave opened 1 year ago

TheMundaneDave commented 1 year ago

Is there an existing issue for this?

What happened?

This is a repost of issue #5097 which was closed erroneously.

Ever since the first changes made to accommodate the new v2.0 models I cannot generate an image in txt2img. I did a fresh clone on 2022-12-25 and this issue persists. I can start the web-ui and enter a prompt. After clicking generate the following occurs...

100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:22<00:00,  1.11s/it]
Error completing request███████████████████████████████████████████████████████████████| 20/20 [00:18<00:00,  1.05it/s]
Arguments: ('photo of a llama', '', 'None', 'None', 20, 0, False, False, 1, 1, 7, -1.0, -1.0, 0, 0, 0, False, 512, 512, False, 0.7, 0, 0, 0, False, False, False, False, '', 1, '', 0, '', True, False, False) {}
Traceback (most recent call last):
  File "C:\AI\stable-diffusion-webui\modules\call_queue.py", line 45, in f
    res = list(func(*args, **kwargs))
  File "C:\AI\stable-diffusion-webui\modules\call_queue.py", line 28, in f
    res = func(*args, **kwargs)
  File "C:\AI\stable-diffusion-webui\modules\txt2img.py", line 49, in txt2img
    processed = process_images(p)
  File "C:\AI\stable-diffusion-webui\modules\processing.py", line 469, in process_images
    res = process_images_inner(p)
  File "C:\AI\stable-diffusion-webui\modules\processing.py", line 576, in process_images_inner
    x_samples_ddim = [decode_first_stage(p.sd_model, samples_ddim[i:i+1].to(dtype=devices.dtype_vae))[0].cpu() for i in range(samples_ddim.size(0))]
  File "C:\AI\stable-diffusion-webui\modules\processing.py", line 576, in <listcomp>
    x_samples_ddim = [decode_first_stage(p.sd_model, samples_ddim[i:i+1].to(dtype=devices.dtype_vae))[0].cpu() for i in range(samples_ddim.size(0))]
  File "C:\AI\stable-diffusion-webui\modules\processing.py", line 404, in decode_first_stage
    x = model.decode_first_stage(x)
  File "C:\AI\stable-diffusion-webui\venv\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "C:\AI\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 826, in decode_first_stage
    return self.first_stage_model.decode(z)
  File "C:\AI\stable-diffusion-webui\modules\lowvram.py", line 52, in first_stage_model_decode_wrap
    return first_stage_model_decode(z)
  File "C:\AI\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\models\autoencoder.py", line 90, in decode
    dec = self.decoder(z)
  File "C:\AI\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\AI\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\model.py", line 631, in forward
    h = self.mid.attn_1(h)
  File "C:\AI\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\AI\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\model.py", line 258, in forward
    out = xformers.ops.memory_efficient_attention(q, k, v, attn_bias=None, op=self.attention_op)
  File "C:\AI\stable-diffusion-webui\venv\lib\site-packages\xformers\ops.py", line 862, in memory_efficient_attention
    return op.forward_no_grad(
  File "C:\AI\stable-diffusion-webui\venv\lib\site-packages\xformers\ops.py", line 305, in forward_no_grad
    return cls.FORWARD_OPERATOR(
  File "C:\AI\stable-diffusion-webui\venv\lib\site-packages\torch\_ops.py", line 143, in __call__
    return self._op(*args, **kwargs or {})
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

This is my webui-user.bat

@echo off

set PYTHON=
set GIT=
set VENV_DIR=
set COMMANDLINE_ARGS=--xformers --medvram
rem git pull
call webui.bat
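As the traceback hint suggests, CUDA_LAUNCH_BLOCKING=1 makes kernel launches synchronous so the reported stack trace points at the call that actually failed. A sketch of the same webui-user.bat with that variable added (this only improves error reporting; it does not fix the error):

```bat
@echo off

set PYTHON=
set GIT=
set VENV_DIR=
rem Synchronous CUDA launches: slower, but tracebacks point at the failing kernel
set CUDA_LAUNCH_BLOCKING=1
set COMMANDLINE_ARGS=--xformers --medvram
rem git pull
call webui.bat
```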

I do have an ancient video card (GTX 970), but I am able to use the web UI if I run git reset --hard 828438b4a190759807f9054932cae3a8b880ddf1 (more than a month stale now). There are many new features since then, though, and I'm running into compatibility issues with models and extensions. Is there any hope this will be addressed?

Steps to reproduce the problem

  1. Go to txt2img
  2. Type prompt
  3. Click Generate

What should have happened?

No CUDA error

Commit where the problem happens

c6f347b81f584b6c0d44af7a209983284dbb52d2

What platforms do you use to access UI ?

Windows

What browsers do you use to access the UI ?

Google Chrome

Command Line Arguments

No response

Additional information, context and logs

No response

aliencaocao commented 1 year ago

Yes, you cannot run SD 2.0 with a GPU that old (I saw you using M40 and Kepler-series cards; those are not even supported by the latest PyTorch anymore)

TheMundaneDave commented 1 year ago

I'm not trying to use a v2.0 model, just the old 1.5 I use every day on the old commit

aliencaocao commented 1 year ago

Yes, because the SD 2.0 update switched to the SD 2.0 version of the SD repo, which uses some unsupported operators

TheMundaneDave commented 1 year ago

So... That's it? No path forward? Not good.

aliencaocao commented 1 year ago

The problem is that I cannot reproduce your issue, and I believe not many others can. It is more related to your own system than to this repo; given your old GPU, that is the most likely cause. Until you upgrade to a newer GPU (Pascal or later), you have to stay on the old commit.

As you can see yourself in the traceback (repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\model.py), the error is not from this repo's code but from https://github.com/Stability-AI/stablediffusion. If you really want a fix, you should raise an issue over there. This repo is merely calling their code, so it is impossible for anyone in this repo to fix this issue.

TheMundaneDave commented 1 year ago

I've been looking to upgrade for a while, but you can't even get a 30-series card anymore. I'm not great at coding and didn't notice that the error is in Stability-AI's code... I doubt I'll get much love over there, but I'll give it a shot. ETA: I know you CAN get a 30-series card; I'm just not going to pay what they're asking for yesterday's model. Nobody in their right mind should buy a 4080, and 4090s are a myth; nobody's ever seen one.

TheMundaneDave commented 1 year ago

Just did a check... repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\model.py is identical between the current commit and my old working directory. There are a lot of files referenced in that traceback; are you sure that's the problem spot? ETA: in fact, all the files in that folder are identical (except the .pyc cache files)

aliencaocao commented 1 year ago

xformers.ops.memory_efficient_attention(q, k, v, attn_bias=None, op=self.attention_op): xformers does not support the SD 2.0 attention unit (which is different from SD 1.0's) on older GPUs

Not sure what you mean by 'identical', but it is perfectly normal for it to match the online version; after all, it is just cloned from the online version. This simply means their repo currently has the same issue.

aliencaocao commented 1 year ago

You can see in the xformers repo: https://github.com/facebookresearch/xformers

The TORCH_CUDA_ARCH_LIST environment variable is set to the architectures you want to support. A suggested setup (slow to build but comprehensive) is export TORCH_CUDA_ARCH_LIST="6.0;6.1;6.2;7.0;7.2;7.5;8.0;8.6"

This means the oldest supported compute capability is 6.0; your GTX 970 is 5.2, which is unsupported. It is already unexpected that it works for SD 1.0 (it is not officially supported), but with SD 2.0 it is normal for it not to work.

You may try building xformers locally for your older GPU with export TORCH_CUDA_ARCH_LIST="5.2", though. It may be able to compile the older CUDA kernels. If you are using the wheels from this repo (built by @C43H66N12O12S2), it will not work.
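To make the arch-list reasoning concrete, here is a small self-contained sketch (the function name and lists are illustrative, not part of torch or xformers): a wheel whose TORCH_CUDA_ARCH_LIST has no entry matching the GPU's compute capability ships no kernel image for that GPU, which is exactly the RuntimeError in the traceback.

```python
# Illustrative sketch: does a wheel built for `build_arch_list` contain
# kernels for a GPU of capability `gpu_capability`? (Names are mine.)

def wheel_covers_gpu(build_arch_list, gpu_capability):
    """build_arch_list: arch strings such as "6.0"; gpu_capability: (major, minor)."""
    return any(tuple(int(p) for p in arch.split(".")) == gpu_capability
               for arch in build_arch_list)

# The suggested comprehensive list starts at 6.0 ...
default_list = ["6.0", "6.1", "6.2", "7.0", "7.2", "7.5", "8.0", "8.6"]
print(wheel_covers_gpu(default_list, (5, 2)))  # GTX 970 (Maxwell, 5.2): False
# ... while a local build targeting 5.2 would cover it:
print(wheel_covers_gpu(["5.2"], (5, 2)))       # True
```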

TheMundaneDave commented 1 year ago

Not sure what you mean by 'identical', but it is perfectly normal for it to match the online version, after all, it is just cloned from the online version. This simply means their repo currently also have the same issue.

Identical meaning WinMerge (a file-comparison program) found the files to be the same. At any rate, I can't really complain to Stability AI like you suggested when it seems xformers is my problem (if I'm following you correctly).

TheMundaneDave commented 1 year ago

You may try to build xformers locally for your older CUDA by export TORCH_CUDA_ARCH_LIST="5.2" though. It may be able to compile to older CUDA kernels. If you are using the wheels from this repo (built by @C43H66N12O12S2 ), it will not work.

This sounds like big-boy stuff... I have no idea where to begin, but I'll look into it then

aliencaocao commented 1 year ago

You can try disabling xformers. What do you mean by the files being the same? If you are comparing the content of repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\model.py between the older commit of this repo and the current latest, of course they will be the same. Any file under repositories\stable-diffusion-stability-ai will NOT change when you run git checkout for this repo; they are independent. This repo links to the SD repo as a separate checkout, which will not get updated unless you explicitly run git checkout inside that SD repo.

TheMundaneDave commented 1 year ago

What do you mean by the files are same?

I used WinMerge to compare the repositories\stable-diffusion-stability-ai folders. All contents except cache files and images are identical

I have another folder with my working webui to compare against

TheMundaneDave commented 1 year ago

would using the xformers built there work?

aliencaocao commented 1 year ago

would using the xformers built there work?

Built where?

TheMundaneDave commented 1 year ago

on my working (pre2.0) commit

aliencaocao commented 1 year ago

It does not matter. Building xformers is independent of which commit of SD webui and the SD repo you have. All that matters is the build config, which is controlled by the environment variable TORCH_CUDA_ARCH_LIST

TheMundaneDave commented 1 year ago

blink I think I'm wasting your time. I understood none of that.

HOWEVER, I can generate an image if I don't pass --xformers in my webui-user.bat file. I don't even know if I can be sure that xformers ever worked with my old commit; no errors come up during startup or on image generation there, though. Is there any way to tell if xformers is active in the web UI?

aliencaocao commented 1 year ago

If you see no errors, it worked. As I said, the problem is just with the SD 2.0 code. And now that you have verified it works without xformers, it must be xformers not supporting your older GPU for SD 2.0
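On the earlier question of telling whether xformers is active: one low-tech check is simply to try importing it inside the webui venv. A minimal sketch (the function name is mine, not webui's):

```python
# Probe whether xformers is importable in the current environment;
# webui's --xformers flag silently does nothing when this import fails.
def xformers_status():
    try:
        import xformers
        return "active: xformers " + getattr(xformers, "__version__", "unknown")
    except ImportError as exc:
        return "inactive: " + str(exc)

print(xformers_status())
```

When it is installed, python -m xformers.info prints the detailed per-kernel availability.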

TheMundaneDave commented 1 year ago

Thanks for your time. I could've sworn I noticed a performance improvement on the old commit after adding --xformers to the bat file. Maybe it never changed anything and I imagined it. I'll see if the new commit functions as well as the old one, then. Sorry for wasting your time (on Xmas, no less); I'd feel kind of silly if that was my problem for more than a month. ETA: tried building xformers... nope: error: legacy-install-failure

CRCODE22 commented 1 year ago

I have the same problem, @TheMundaneDave. I have a GTX 970; it used to work great with xformers until a few days ago, when stable-diffusion-webui got updated. When you remove --xformers from webui-user.bat it will work again, but it will be slower without xformers.

This V2.0 UI:

https://github.com/cmdr2/stable-diffusion-ui

does work great on my GTX 970, with xformers, and it does all of the installing for you.

gtrrnr commented 1 year ago

I have the same problem, @TheMundaneDave. I have a GTX 970; it used to work great with xformers until a few days ago, when stable-diffusion-webui got updated. When you remove --xformers from webui-user.bat it will work again, but it will be slower without xformers.

This V2.0 UI:

https://github.com/cmdr2/stable-diffusion-ui

does work great on my GTX 970, with xformers, and it does all of the installing for you.

GTX 980 user here; had the same CUDA-error issue. I decided to update the UI today using git pull, saw that it's broken, reverted the pull, and am currently using the following commit:

https://github.com/AUTOMATIC1111/stable-diffusion-webui/commit/9b384dfb5c05129f50cc3f0262f89e8b788e5cf3

In case you did git pull and want to revert to the last working commit, run:

git reflog show

It will output something like:

git reflog show
ce9827a (HEAD -> master, origin/master, origin/HEAD) HEAD@{0}: pull: Fast-forward
9b384df HEAD@{1}: clone: from https://github.com/AUTOMATIC1111/stable-diffusion-webui.git

So 9b384df is the hash of the commit before the pull. To reset to it, type:

git reset --hard 9b384df

Don't forget to add your xformers/medvram/split-attention flags back in webui-user after the reset. Hope this helps until somebody fixes it or until we can afford a new GPU :D

gtrrnr commented 1 year ago

Finally managed to compile and install xformers properly on the latest version of stable-diffusion-webui, and I wrote an installation guide from scratch. Not sure if this will become obsolete in the near future, or whether it can be applied to builds for architectures below Maxwell, but at the moment it works. Steps 5 to 7 are redundant, since you can make the venv and install the right version of PyTorch beforehand, or even better, edit the dependencies to download pytorch 1.13.1+cu117 instead of the constant downloading and uninstalling, but I am too lazy.

Stable diffusion setup with xformers support on older GPU (Maxwell, etc) for Windows

  1. Check your video card's compute capability under "CUDA-Enabled GeForce and TITAN Products":

    https://developer.nvidia.com/cuda-gpus

    I tested it on a GTX 980, so compute capability 5.2 should work; not sure about anything lower (5.0 and below)

  2. Install git:

    https://gitforwindows.org/

  3. Clone repo:

    git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
  4. Install python 3.10:

    https://www.python.org/downloads/

    Make sure it's 3.10.X

  5. Launch webui-user.bat once, let it make venv, download dependencies and install them.

  6. Open webui url, generate 1 image, close it after it finishes.

  7. Open powershell.exe , type:

    cd stable-diffusion-webui\venv\Scripts
    .\Activate.ps1
    pip uninstall -y torch torchvision torchaudio
    pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu117
  8. Install Visual Studio 2022 Community Edition; during installation, select "Desktop development with C++":

  9. (Optional) Install ninja build to speed up compilation speed:

    • Download ninja-win.zip from https://github.com/ninja-build/ninja/releases and unzip it.
    • Place ninja.exe under C:\Windows OR add the full path to the extracted ninja.exe into system PATH
    • Run ninja -h in cmd and verify that a help message is printed
  10. Launch cmd.exe as admin and type:

    git config --system core.longpaths true
  11. Launch regedit.exe and check:

    "HKLM\SYSTEM\CurrentControlSet\Control\FileSystem\LongPathsEnabled"

    Make sure it's set to 1 and its type is REG_DWORD

  12. Open powershell.exe, type:

    cd C:\stable-diffusion-webui\venv\Scripts
    .\Activate.ps1
  13. Inside powershell.exe with the venv enabled (step 12), type:

    $env:TORCH_CUDA_ARCH_LIST = "5.2"
    pip install -v -U "git+https://github.com/facebookresearch/xformers.git@main#egg=xformers"
  14. If the stars are right and everything is installed, check the CUDA version and feature support:

    python -c "import torch; print(torch.__version__)"
    1.13.1+cu117

    And for xformers:

    python -m xformers.info

    It should look something like:

    A matching Triton is not available, some optimizations will not be enabled.
    Error caught was: No module named 'triton'
    xFormers 0.0.16+6f3c20f.d20230116
    memory_efficient_attention.cutlassF:               available
    memory_efficient_attention.cutlassB:               available
    memory_efficient_attention.flshattF:               available
    memory_efficient_attention.flshattB:               available
    memory_efficient_attention.smallkF:                available
    memory_efficient_attention.smallkB:                available
    memory_efficient_attention.tritonflashattF:        unavailable
    memory_efficient_attention.tritonflashattB:        unavailable
    swiglu.fused.p.cpp:                                available
    is_triton_available:                               False
    is_functorch_available:                            False
    pytorch.version:                                   1.13.1+cu117
    pytorch.cuda:                                      available
    gpu.compute_capability:                            5.2
    gpu.name:                                          NVIDIA GeForce GTX 980
    build.info:                                        available
    build.cuda_version:                                1107
    build.python_version:                              3.10.2
    build.torch_version:                               1.13.1+cu117
    build.env.TORCH_CUDA_ARCH_LIST:                    None
    build.env.XFORMERS_BUILD_TYPE:                     None
    build.env.XFORMERS_ENABLE_DEBUG_ASSERTIONS:        None
    build.env.NVCC_FLAGS:                              None
    build.env.XFORMERS_PACKAGE_FROM:                   None
    source.privacy:                                    open source
  15. Add xformers to webui-user.bat

    • Open webui-user.bat and add --xformers in set COMMANDLINE_ARGS=
    • Example webui-user.bat:

      @echo off
      
      set PYTHON=
      set GIT=
      set VENV_DIR=
      set COMMANDLINE_ARGS=--opt-split-attention --xformers
      
      call webui.bat
      
    • Save it.
  16. Launch webui-user.bat and hope that it works
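The registry value from step 11 can also be set from the same elevated prompt as step 10, without opening regedit (a sketch using reg.exe; requires administrator rights):

```bat
rem Enables Win32 long paths system-wide (the value step 11 checks)
reg add "HKLM\SYSTEM\CurrentControlSet\Control\FileSystem" /v LongPathsEnabled /t REG_DWORD /d 1 /f
```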

aliencaocao commented 1 year ago

Such a build guide already exists in the wiki, FYI

gtrrnr commented 1 year ago

Such a build guide already exists in the wiki, FYI

Yes, I saw it. However, it misses the Windows fix that prevents git from pulling xformers (too-long filenames) and setting the right compute capability (specifically 5.2), has even more redundant steps (like making a separate venv, installing old PyTorch into it, and building a wheel only to install it later in the webui venv), and also states that you should use --force-enable-xformers, which is currently broken and will disable xformers due to an import error:

https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/5898#issuecomment-1368054928

aliencaocao commented 1 year ago

Good catches. Looks like the wiki could use an update. @ClashSAN