[Bug/Crash]: Two Crashes (GPU and Webui) when you keep generating

metapea commented 1 day ago

Checklist

[ ] The issue exists after disabling all extensions
[ ] The issue exists on a clean installation of webui
[ ] The issue is caused by an extension, but I believe it is caused by a bug in the webui
[ ] The issue exists in the current version of the webui
[X] The issue has not been reported before recently
[ ] The issue has been reported before but has not been fixed yet

What happened?

There were two crashes on reForge during generation:

GPU crash: the GPU restarted, then the WebUI stopped working and I couldn't generate until the program was restarted.
WebUI/generation crash: the WebUI just stops during generation, and I still couldn't generate anything until the program was restarted.
In both cases, the PNG Info tab just shows the last prompt that was generated.

Steps to reproduce the problem

Start up reForge
Use more than one LoRA (and maybe sd-forge-couple)
Generate for a while

What should have happened?

The GPU should not restart and generation should not stop.

What browsers do you use to access the UI ?

Mozilla Firefox

Sysinfo

OS: Win7, using the latest version of reForge with CUDA 11.8. Python: 3.10.6

Console logs

GPU crash:
Traceback (most recent call last):
  File "X:\reforge\system\python\lib\site-packages\gradio\routes.py", line 488, in run_predict
    output = await app.get_blocks().process_api(
  File "X:\reforge\system\python\lib\site-packages\gradio\blocks.py", line 1431, in process_api
    result = await self.call_function(
  File "X:\reforge\system\python\lib\site-packages\gradio\blocks.py", line 1103, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "X:\reforge\system\python\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "X:\reforge\system\python\lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "X:\reforge\system\python\lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "X:\reforge\system\python\lib\site-packages\gradio\utils.py", line 707, in wrapper
    response = f(*args, **kwargs)
  File "X:\reforge\webui\modules\call_queue.py", line 91, in f
    devices.torch_gc()
  File "X:\reforge\webui\modules\devices.py", line 39, in torch_gc
    model_management.soft_empty_cache()
  File "X:\reforge\webui\ldm_patched\modules\model_management.py", line 834, in soft_empty_cache
    torch.cuda.empty_cache()
  File "X:\reforge\system\python\lib\site-packages\torch\cuda\memory.py", line 162, in empty_cache
    torch._C._cuda_emptyCache()
RuntimeError: CUDA error: the launch timed out and was terminated
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Webui/Generation crash:
Traceback (most recent call last):
  File "X:\reforge\system\python\lib\site-packages\uvicorn\protocols\websockets\websockets_impl.py", line 255, in run_asgi
    result = await self.app(self.scope, self.asgi_receive, self.asgi_send)
  File "X:\reforge\system\python\lib\site-packages\uvicorn\middleware\proxy_headers.py", line 84, in __call__
    return await self.app(scope, receive, send)
  File "X:\reforge\system\python\lib\site-packages\fastapi\applications.py", line 273, in __call__
    await super().__call__(scope, receive, send)
  File "X:\reforge\system\python\lib\site-packages\starlette\applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "X:\reforge\system\python\lib\site-packages\starlette\middleware\errors.py", line 149, in __call__
    await self.app(scope, receive, send)
  File "X:\reforge\system\python\lib\site-packages\starlette\middleware\cors.py", line 76, in __call__
    await self.app(scope, receive, send)
  File "X:\reforge\system\python\lib\site-packages\starlette\middleware\gzip.py", line 26, in __call__
    await self.app(scope, receive, send)
  File "X:\reforge\system\python\lib\site-packages\starlette\middleware\exceptions.py", line 79, in __call__
    raise exc
  File "X:\reforge\system\python\lib\site-packages\starlette\middleware\exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "X:\reforge\system\python\lib\site-packages\fastapi\middleware\asyncexitstack.py", line 21, in __call__
    raise e
  File "X:\reforge\system\python\lib\site-packages\fastapi\middleware\asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "X:\reforge\system\python\lib\site-packages\starlette\routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "X:\reforge\system\python\lib\site-packages\starlette\routing.py", line 341, in handle
    await self.app(scope, receive, send)
  File "X:\reforge\system\python\lib\site-packages\starlette\routing.py", line 82, in app
    await func(session)
  File "X:\reforge\system\python\lib\site-packages\fastapi\routing.py", line 289, in app
    await dependant.call(**values)
  File "X:\reforge\system\python\lib\site-packages\gradio\routes.py", line 604, in join_queue
    session_info = await asyncio.wait_for(
  File "asyncio\tasks.py", line 445, in wait_for
  File "X:\reforge\system\python\lib\site-packages\starlette\websockets.py", line 133, in receive_json
    self._raise_on_disconnect(message)
  File "X:\reforge\system\python\lib\site-packages\starlette\websockets.py", line 105, in _raise_on_disconnect
    raise WebSocketDisconnect(message["code"])
starlette.websockets.WebSocketDisconnect: 1001

Additional information

No response

Panchovix commented 10 hours ago

Hi there, the "CUDA kernel errors might be asynchronously reported at some other API call" error can sometimes happen with overclocks and such.

reForge has some optimizations that can be a bit unstable with some extreme overclocks.

If it was at stock settings, what GPU are you using?
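
The error text in the log itself suggests CUDA_LAUNCH_BLOCKING=1, which makes CUDA kernel launches synchronous so the traceback is more likely to point at the call that actually failed (at the cost of slower generation). A minimal sketch of setting it for a launched process follows; the helper name `blocking_cuda_env` is hypothetical and not part of reForge, and how the WebUI is actually started on this install is an assumption left to the reader.

```python
import os


def blocking_cuda_env(base_env=None):
    """Return a copy of an environment with synchronous CUDA launches enabled.

    Hypothetical helper for illustration only; it is not part of reForge.
    """
    # Copy so the caller's mapping (or os.environ) is not modified.
    env = dict(os.environ if base_env is None else base_env)
    # Variable name taken from the RuntimeError message in the console log.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


if __name__ == "__main__":
    env = blocking_cuda_env()
    print("CUDA_LAUNCH_BLOCKING =", env["CUDA_LAUNCH_BLOCKING"])
    # Pass `env` to subprocess.run(..., env=env) together with whatever
    # command normally starts the WebUI on this machine.
```

The variable has to be in the environment before the process initializes CUDA, which is why the sketch builds the environment for a child process instead of changing it after startup.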

metapea commented 10 hours ago

GTX 10 series GPU (8GB) at stock settings. Note that I was generating with high RAM usage in the browser tab (#135), and I was using low priority during the WebUI crash.

Panchovix / stable-diffusion-webui-reForge