continue-revolution / sd-webui-animatediff

AnimateDiff for AUTOMATIC1111 Stable Diffusion WebUI

RuntimeError: CUDA error: device-side assert triggered #174

Closed thezveroboy closed 1 year ago

thezveroboy commented 1 year ago

Is there an existing issue for this?

Have you read FAQ on README?

What happened?

When I run generation, it gets to about 40% and then breaks with this error.

Steps to reproduce the problem

run generation

What should have happened?

it should work

Commit where the problem happens

The latest version.

What browsers do you use to access the UI ?

No response

Command Line Arguments

set COMMANDLINE_ARGS=--xformers

Console logs

File "D:\sd-webui\webui\venv\lib\site-packages\gradio\routes.py", line 488, in run_predict
    output = await app.get_blocks().process_api(
  File "D:\sd-webui\webui\venv\lib\site-packages\gradio\blocks.py", line 1431, in process_api
    result = await self.call_function(
  File "D:\sd-webui\webui\venv\lib\site-packages\gradio\blocks.py", line 1103, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "D:\sd-webui\webui\venv\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "D:\sd-webui\webui\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "D:\sd-webui\webui\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "D:\sd-webui\webui\venv\lib\site-packages\gradio\utils.py", line 707, in wrapper
    response = f(*args, **kwargs)
  File "D:\sd-webui\webui\modules\call_queue.py", line 77, in f
    devices.torch_gc()
  File "D:\sd-webui\webui\modules\devices.py", line 51, in torch_gc
    torch.cuda.empty_cache()
  File "D:\sd-webui\webui\venv\lib\site-packages\torch\cuda\memory.py", line 133, in empty_cache
    torch._C._cuda_emptyCache()
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect
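The traceback ends in torch.cuda.empty_cache() only because the assert is reported asynchronously. A minimal debugging sketch, not part of the original report (the webui-user.bat filename is an assumption): setting CUDA_LAUNCH_BLOCKING=1 before launch makes kernel launches synchronous, so the RuntimeError is raised at the call that actually failed instead of at a later CUDA API call.

    rem webui-user.bat -- force synchronous CUDA launches for debugging only;
    rem generation slows down, so remove this once the failing call is identified
    set CUDA_LAUNCH_BLOCKING=1
    set COMMANDLINE_ARGS=--xformers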

Additional information

It was working correctly before I updated this extension a few days ago.

andrewssdd commented 1 year ago

See the same error.

continue-revolution commented 1 year ago

This gives me little context on what I can do. You should at least provide the following information:

  1. What GPU you are using, what operating system/PyTorch version you are using, and what WebUI/ControlNet versions you are using.
  2. What your configuration is; I need a screenshot of your WebUI.
  3. Which commit worked for you. You can go backward by running “git checkout ”/“git checkout v1.x.x” (v1.x.x is your version tag) in your terminal; a rough sketch follows below. This is the most important information.
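A rough sketch of that roll-back (the extension path is an assumption based on the traceback above; v1.x.x is a placeholder for whichever version tag last worked for you):

    rem run in a terminal with the WebUI stopped, then relaunch the WebUI
    cd D:\sd-webui\webui\extensions\sd-webui-animatediff
    git fetch --tags
    git checkout v1.x.x
    rem "git checkout master" (or the default branch) returns to the latest commit
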
gtbloody commented 1 year ago

Disabling xformers avoids this error for me, but generation becomes slower: with 16 GB of VRAM, a 16-frame 512×512 generation takes 2 minutes. Not using xformers can solve the problem, but of course generation will slow down.

continue-revolution commented 1 year ago

Why don’t people try opt-sdp-attention? It is as effective as xformers but far less buggy.
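For anyone wanting to try it, a sketch of the switch (assuming the standard webui-user.bat launcher): the WebUI exposes scaled-dot-product attention through the --opt-sdp-attention command-line flag (or --opt-sdp-no-mem-attention for the deterministic variant), which takes the place of --xformers.

    rem webui-user.bat -- use PyTorch scaled-dot-product attention instead of xformers
    set COMMANDLINE_ARGS=--opt-sdp-attention

The same optimization can also be selected at runtime under Settings > Optimizations > Cross attention optimization, as noted later in this thread.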

andrewssdd commented 1 year ago

I was going to reproduce it with a fresh A1111 and animatediff installation, but I could not. It works fine with xformers or opt-sdp-attention.

It is likely a conflict with another extension.

continue-revolution commented 1 year ago

@ctawong thanks for your effort to reproduce. Since I have to hijack tons of methods to support cli, I can only guarantee that things will work if you have this combination: AnimateDiff, ControlNet, SegmentAnything, TagAutoComplete.

iPyon777 commented 1 year ago

I encountered the same issue.

Environment: webui version 1.6.0, animatediff version 1.8.1.

Summary: Errors encountered depending on config.json settings, even with minimal extensions in "webui".

Steps to reproduce:

  1. Reinstalled "webui" and "animatediff". The issue seemed to be resolved.
  2. Modified "config.json" back to its state prior to reinstallation. The error reappeared.

Observations: Even with almost no extensions added to "webui", errors can occur depending on the settings in config.json. It's unclear which specific settings in "config.json" are causing the issue.
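One way to narrow this down, sketched below (filenames and paths assumed; this is not something iPyon777 reported doing): move the current config aside, let the WebUI regenerate a default config.json on the next launch, and then restore settings a few at a time until the error reappears.

    rem run from the WebUI root with the WebUI stopped
    cd D:\sd-webui\webui
    move config.json config.json.bak
    rem relaunch; a fresh config.json with default settings is created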

continue-revolution commented 1 year ago

@iPyon777 you may send me your config if it causes the issue, but I cannot guarantee that I will observe anything. I have never seen this error on my machine.

iPyon777 commented 1 year ago

@continue-revolution Thank you for your response, but I'm fine. I just wrote this to be helpful to the person who raised the issue.

(By the way, unrelated, but I always enjoy using this. Thank you.)

zixaphir commented 1 year ago

Why don’t people try opt-sdp-attention? It is as effective as xformers but far less buggy.

Hello! While this is probably true for most people, for me opt-sdp-attention is either completely broken or otherwise not behaving as expected. I'm running an older NVIDIA card, a 1080 Ti, and this optimization performs ~4x slower than no optimization at all.

I've attached the results of trying several optimizations. Each run was preceded by a webui restart, and for each test I verified in the CLI that the optimization setting was applied.

optimization_benchmark.txt

continue-revolution commented 1 year ago

I see. I do everything on 3090/4090 and sdp works great on these 2 graphics cards.

zixaphir commented 1 year ago

I see. I do everything on 3090/4090 and sdp works great on these 2 graphics cards.

I bet they do and I am so jealous. I'm just trying to provide context, because I would love to use this optimization if it worked for me.

ace2duce commented 1 year ago

Same error

It only happens after a CUDA out-of-memory error.

    return x * F.gelu(gate)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 520.00 MiB (GPU 0; 12.00 GiB total capacity; 9.44 GiB already allocated; 0 bytes free; 11.03 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

The only solution is to close the console and open a new one with Auto1111.

It can't recover from a CUDA error

Environment:

"webui" version: 1.6.0 "animatediff" version: 1.8.1 Installed locally

mrmeseeks23 commented 1 year ago

I am having the same issue running PyTorch 2.0.1 with CUDA 11.8 on an NVIDIA L40 Runpod.io instance (on Ubuntu 22.04 Linux). After 2-4 generations, everything, including simple text-to-image, comes out as complete noise. A full restart of Auto1111 seems to fix it, but that is time-consuming. Otherwise it works great when it does work.

continue-revolution commented 1 year ago

If you OOM, you will have to restart; otherwise there is no way to recover from all the injections. If you hit the error the OP mentioned, then unfortunately I have no idea about the reason.

That said, finding a way to recover from the assertion error might be worth investigating, but that takes time…

Freeeast123145 commented 1 year ago

I also got this error: CUDA error: invalid configuration argument. CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

Why don’t people try opt-sdp-attention? It is as effective as xformers but far less buggy.

With that suggestion I was able to use this extension for the first time. Thanks!

josephrocca commented 1 year ago

Just adding here that going to "Settings > Optimizations > Cross attention optimization" and switching it from "Automatic" to sdp (and then restarting) fixed the issue for me.

nyukers commented 1 year ago

@josephrocca to sdp-nomem or sdp-scaled ?

josephrocca commented 1 year ago

Just normal sdp - called "sdp - scaled dot product" in the dropdown

Dnozz commented 8 months ago

I'm getting this error as well.

Python 3.10.6, CUDA 12.1(?), torch 2.1.2, torchvision 0.16.2, RTX 3070.

(screenshot attached: 2024-03-15_21-15)

SD was close to the output I wanted, so I didn't change any settings; I simply ran the same setup again and it errored out. Even after restarting the console it still immediately errors. Restarting the PC: same. I removed xformers from my webui-user.bat file and tried a few other default optimizations from the dropdown. It ran once on "Automatic" optimization; the second time broke it. Deleting "venv" and reinstalling the packages from cache didn't work either.

https://pastebin.com/esxbDVcE