golemfactory / ya-runtime-ai

1 stars 1 forks source link

Detect state of automatic #79

Closed staszek-krotki closed 2 months ago

staszek-krotki commented 3 months ago

Verify if it is possible to detect states of automatic:

Set activity status in case of errors

Potential problems not covered by this task:

pwalski commented 2 months ago

Scenario with used port 7861:

Automatic starts, initially loads model, but fails to start API server:

[2024-04-10T09:57:20.835+0200 DEBUG ya_runtime_ai::process::automatic::monitor] > INFO:     Started server process [82312]
[2024-04-10T09:57:20.835+0200 DEBUG ya_runtime_ai::process::automatic::monitor] > INFO:     Waiting for application startup.
[2024-04-10T09:57:20.835+0200 DEBUG ya_runtime_ai::process::automatic::monitor] > INFO:     Application startup complete.
[2024-04-10T09:57:20.836+0200 DEBUG ya_runtime_ai::process::automatic::monitor] > ERROR:    [Errno 10048] error while attempting to bind on address ('127.0.0.1', 7861): only one usage of each socket address (protocol/network address/port) is normally permitted
[2024-04-10T09:57:20.837+0200 DEBUG ya_runtime_ai::process::automatic::monitor] > INFO:     Waiting for application shutdown.
[2024-04-10T09:57:20.837+0200 DEBUG ya_runtime_ai::process::automatic::monitor] > INFO:     Application shutdown complete.

finally Automatic gets stopped, and runtime quits

[2024-04-10T09:57:24.908+0200 DEBUG ya_runtime_ai::process::automatic::monitor] > Press any key to continue . . . 
[2024-04-10T09:57:24.911+0200 DEBUG ya_runtime_ai::process::automatic] Automatic process has stopped
[2024-04-10T09:57:24.912+0200 ERROR ya_runtime_ai] process exit: Ok(ExitStatus(ExitStatus(1)))
[2024-04-10T09:57:24.912+0200 ERROR ya_runtime_ai] Activity loop error

Agreement is terminated

[2024-04-10T09:58:54.934+0200 INFO  ya_provider::market::provider_market] Terminating agreement [230b43395857fc8d49540eb6027572fe16eb1be527c88611aa5025ac09938387]. Reason: [NoActivity] No activity created within 90s
pwalski commented 2 months ago

Killing app with HTTP request (Invoke-WebRequest -URI http://127.0.0.1:7861/sdapi/v1/server-kill -Method 'Post') makes Activity to terminate together with Agreement

[2024-04-10T12:08:10.347+0200 INFO  ya_provider::market::provider_market] Terminating agreement [5aaf942f2f71017febdfe349249fd9bdce5b444f7948648272caecdf8641eb20]. Reason: [NoActivity] No activity created within 90s

Reason [NoActivity] No activity created within 90s is confusing.

pwalski commented 2 months ago

Not related directly to Automatic. Scenario of downloading model when there is no enough free storage space.

Runtime logs:

[2024-04-10T12:20:29.220+0200 ERROR ya_transfer::file] Error writing to file [E:/modules/golem-data/provider/exe-unit/cache/tmp/4d44bf1c5ec6b78a909227dc98983bb474322e24fa3869476171fd86]: IO error: There is not enough space on the disk. (os error 112)
[2024-04-10T12:20:29.223+0200 ERROR ya_runtime_ai] ExeScript failure: Service("DeployImage failed: Send error: send failed because receiver is gone")
[2024-04-10T12:20:29.223+0200 DEBUG rustls::common_state] Sending warning alert CloseNotify
[2024-04-10T12:20:29.330+0200 DEBUG ya_runtime_ai] Successfully sent activity usage message
[2024-04-10T12:20:29.357+0200 INFO  ya_runtime_ai] CTRL-BREAK received, Shutting down runtime...

This time Agreement was terminated by Requestor:

[2024-04-10T12:20:29.356+0200 INFO  ya_provider::market::provider_market] Agreement [3a5c1a5c6a23096176a053af5be22b2b95565948af2744c9de8e303248785088] terminated by Requestor. Reason: Work cancelled
[2024-04-10T12:20:29.356+0200 DEBUG ya_client::web] doing GET on http://127.0.0.1:12502/market-api/v1/agreementEvents?timeout=20&afterTimestamp=2024-04-10T10%3A20%3A29.354732%2B00%3A00&maxEvents=15&appSessionId=ya-provider-160060
[2024-04-10T12:20:29.356+0200 WARN  ya_provider::execution::task_runner] Activity [406eb92e260b4d59be1ad603c4506ef9] will be destroyed, because of terminated agreement [3a5c1a5c6a23096176a053af5be22b2b95565948af2744c9de8e303248785088].
[2024-04-10T12:20:29.357+0200 INFO  ya_provider::payments::payments] Payments - agreement [3a5c1a5c6a23096176a053af5be22b2b95565948af2744c9de8e303248785088] closed. Computing cost summary...
[2024-04-10T12:20:29.357+0200 INFO  ya_provider::execution::task_runner] Destroying activity [406eb92e260b4d59be1ad603c4506ef9].
[2024-04-10T12:20:29.357+0200 INFO  ya_provider::execution::exeunit_instance] Terminating ExeUnit [automatic]... pid: 165168
[2024-04-10T12:20:29.357+0200 INFO  ya_provider::market::negotiator::builtin::max_agreements] Negotiator: 1 free slot(s) for agreements.
[2024-04-10T12:20:29.357+0200 INFO  ya_provider::market::provider_market] Re-negotiating all demands
[2024-04-10T12:20:29.357+0200 DEBUG ya_provider::payments::agreement] Waiting for all activities to finish.
[2024-04-10T12:20:29.361+0200 INFO  ya_provider::execution::task_runner] ExeUnit process exited with status Finished - exit code: 0, agreement [3a5c1a5c6a23096176a053af5be22b2b95565948af2744c9de8e303248785088], activity [406eb92e260b4d59be1ad603c4506ef9].
[2024-04-10T12:20:29.361+0200 INFO  ya_provider::execution::task_runner] ExeUnit for activity terminated: [406eb92e260b4d59be1ad603c4506ef9].

Provider logs are full of

[2024-04-10T12:21:31.733+0200 WARN  ya_provider::payments::payments] Error sending debit note: Failed to issue debit note for activity [406eb92e260b4d59be1ad603c4506ef9]. HTTP error requesting POST http://127.0.0.1:12502/payment-api/v1/debitNotes: 400 Bad Request; msg: 'Cannot perform this operation while a transaction is open' Retry in 24.774295497s.

Activity probably should be immediately terminated when there is not enough storage to download model.

pwalski commented 2 months ago

Disabling GPU followed by sdapi/v1/txt2img request makes Automatic runtime to enter an unrecoverable state. (without deliberately disabling a GPU this can be reproduced e.g. by hosting runtime on gaming laptop and changing its power policy) Runtime logs:

[2024-04-17T10:10:32.636+0200 DEBUG ya_gsb_http_proxy::gsb_to_http] Calling http://localhost:7861/sdapi/v1/txt2img
[2024-04-17T10:10:32.636+0200 DEBUG reqwest::connect] starting new connection: http://localhost:7861/
[2024-04-17T10:10:32.636+0200 DEBUG hyper::client::connect::dns] resolving host="localhost"
[2024-04-17T10:10:32.637+0200 DEBUG hyper::client::connect::http] connecting to [::1]:7861
[2024-04-17T10:10:32.776+0200 DEBUG ya_runtime_ai] Successfully sent activity usage message
[2024-04-17T10:10:32.945+0200 DEBUG hyper::client::connect::http] connecting to 127.0.0.1:7861
[2024-04-17T10:10:32.946+0200 DEBUG hyper::client::connect::http] connected to 127.0.0.1:7861
[2024-04-17T10:10:32.946+0200 DEBUG hyper::proto::h1::io] flushed 466 bytes
[2024-04-17T10:10:33.241+0200 DEBUG reqwest::connect] starting new connection: http://localhost:7861/
[2024-04-17T10:10:33.241+0200 DEBUG hyper::client::connect::dns] resolving host="localhost"
[2024-04-17T10:10:33.242+0200 DEBUG hyper::client::connect::http] connecting to [::1]:7861
[2024-04-17T10:10:33.552+0200 DEBUG hyper::client::connect::http] connecting to 127.0.0.1:7861
[2024-04-17T10:10:33.553+0200 DEBUG hyper::client::connect::http] connected to 127.0.0.1:7861
[2024-04-17T10:10:33.553+0200 DEBUG hyper::proto::h1::io] flushed 53 bytes
[2024-04-17T10:10:33.555+0200 DEBUG hyper::proto::h1::io] parsed 5 headers
[2024-04-17T10:10:33.555+0200 DEBUG hyper::proto::h1::conn] incoming body is content-length (22 bytes)
[2024-04-17T10:10:33.787+0200 DEBUG ya_runtime_ai] Successfully sent activity usage message
[2024-04-17T10:10:33.913+0200 DEBUG ya_runtime_ai::process::automatic::monitor] > *** API error: POST: http://localhost:7861/sdapi/v1/txt2img {'error': 'RuntimeError', 'detail': '', 'body': '', 'errors': 'CUDA error: unknown error\nCUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.\nFor debugging consider passing CUDA_LAUNCH_BLOCKING=1.\nCompile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.\n'}
[2024-04-17T10:10:33.915+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >     Traceback (most recent call last):
[2024-04-17T10:10:33.915+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\system\python\lib\site-packages\anyio\streams\memory.py", line 98, in receive
[2024-04-17T10:10:33.915+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         return self.receive_nowait()
[2024-04-17T10:10:33.915+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\system\python\lib\site-packages\anyio\streams\memory.py", line 93, in receive_nowait
[2024-04-17T10:10:33.915+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         raise WouldBlock
[2024-04-17T10:10:33.915+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >     anyio.WouldBlock
[2024-04-17T10:10:33.915+0200 DEBUG ya_runtime_ai::process::automatic::monitor] > 
[2024-04-17T10:10:33.915+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >     During handling of the above exception, another exception occurred:
[2024-04-17T10:10:33.915+0200 DEBUG ya_runtime_ai::process::automatic::monitor] > 
[2024-04-17T10:10:33.915+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >     Traceback (most recent call last):
[2024-04-17T10:10:33.915+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\system\python\lib\site-packages\starlette\middleware\base.py", line 78, in call_next
[2024-04-17T10:10:33.915+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         message = await recv_stream.receive()
[2024-04-17T10:10:33.915+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\system\python\lib\site-packages\anyio\streams\memory.py", line 118, in receive
[2024-04-17T10:10:33.915+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         raise EndOfStream
[2024-04-17T10:10:33.915+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >     anyio.EndOfStream
[2024-04-17T10:10:33.915+0200 DEBUG ya_runtime_ai::process::automatic::monitor] > 
[2024-04-17T10:10:33.915+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >     During handling of the above exception, another exception occurred:
[2024-04-17T10:10:33.915+0200 DEBUG ya_runtime_ai::process::automatic::monitor] > 
[2024-04-17T10:10:33.915+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >     Traceback (most recent call last):
[2024-04-17T10:10:33.915+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\webui\modules\api\api.py", line 187, in exception_handling
[2024-04-17T10:10:33.915+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         return await call_next(request)
[2024-04-17T10:10:33.915+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\system\python\lib\site-packages\starlette\middleware\base.py", line 84, in call_next
[2024-04-17T10:10:33.915+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         raise app_exc
[2024-04-17T10:10:33.915+0200 DEBUG ya_runtime_ai::process::automatic::monitor] > INFO:     127.0.0.1:57355 - "POST /sdapi/v1/txt2img HTTP/1.1" 500 Internal Server Error
[2024-04-17T10:10:33.915+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\system\python\lib\site-packages\starlette\middleware\base.py", line 70, in coro
[2024-04-17T10:10:33.915+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         await self.app(scope, receive_or_disconnect, send_no_error)
[2024-04-17T10:10:33.915+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\system\python\lib\site-packages\starlette\middleware\base.py", line 108, in __call__
[2024-04-17T10:10:33.915+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         response = await self.dispatch_func(request, call_next)
[2024-04-17T10:10:33.915+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\webui\modules\api\api.py", line 151, in log_and_time
[2024-04-17T10:10:33.915+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         res: Response = await call_next(req)
[2024-04-17T10:10:33.915+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\system\python\lib\site-packages\starlette\middleware\base.py", line 84, in call_next
[2024-04-17T10:10:33.915+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         raise app_exc
[2024-04-17T10:10:33.915+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\system\python\lib\site-packages\starlette\middleware\base.py", line 70, in coro
[2024-04-17T10:10:33.915+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         await self.app(scope, receive_or_disconnect, send_no_error)
[2024-04-17T10:10:33.915+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\system\python\lib\site-packages\starlette\middleware\cors.py", line 84, in __call__
[2024-04-17T10:10:33.915+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         await self.app(scope, receive, send)
[2024-04-17T10:10:33.915+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\system\python\lib\site-packages\starlette\middleware\gzip.py", line 26, in __call__
[2024-04-17T10:10:33.915+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         await self.app(scope, receive, send)
[2024-04-17T10:10:33.915+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\system\python\lib\site-packages\starlette\middleware\exceptions.py", line 79, in __call__
[2024-04-17T10:10:33.915+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         raise exc
[2024-04-17T10:10:33.915+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\system\python\lib\site-packages\starlette\middleware\exceptions.py", line 68, in __call__
[2024-04-17T10:10:33.915+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         await self.app(scope, receive, sender)
[2024-04-17T10:10:33.915+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\system\python\lib\site-packages\fastapi\middleware\asyncexitstack.py", line 21, in __call__
[2024-04-17T10:10:33.916+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         raise e
[2024-04-17T10:10:33.916+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\system\python\lib\site-packages\fastapi\middleware\asyncexitstack.py", line 18, in __call__
[2024-04-17T10:10:33.916+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         await self.app(scope, receive, send)
[2024-04-17T10:10:33.916+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\system\python\lib\site-packages\starlette\routing.py", line 718, in __call__
[2024-04-17T10:10:33.916+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         await route.handle(scope, receive, send)
[2024-04-17T10:10:33.916+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\system\python\lib\site-packages\starlette\routing.py", line 276, in handle
[2024-04-17T10:10:33.916+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         await self.app(scope, receive, send)
[2024-04-17T10:10:33.916+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\system\python\lib\site-packages\starlette\routing.py", line 66, in app
[2024-04-17T10:10:33.916+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         response = await func(request)
[2024-04-17T10:10:33.916+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\system\python\lib\site-packages\fastapi\routing.py", line 237, in app
[2024-04-17T10:10:33.916+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         raw_response = await run_endpoint_function(
[2024-04-17T10:10:33.916+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\system\python\lib\site-packages\fastapi\routing.py", line 165, in run_endpoint_function
[2024-04-17T10:10:33.916+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         return await run_in_threadpool(dependant.call, **values)
[2024-04-17T10:10:33.916+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\system\python\lib\site-packages\starlette\concurrency.py", line 41, in run_in_threadpool
[2024-04-17T10:10:33.916+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         return await anyio.to_thread.run_sync(func, *args)
[2024-04-17T10:10:33.916+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\system\python\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
[2024-04-17T10:10:33.916+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         return await get_asynclib().run_sync_in_worker_thread(
[2024-04-17T10:10:33.916+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\system\python\lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
[2024-04-17T10:10:33.916+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         return await future
[2024-04-17T10:10:33.916+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\system\python\lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
[2024-04-17T10:10:33.916+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         result = context.run(func, *args)
[2024-04-17T10:10:33.916+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\webui\modules\api\api.py", line 380, in text2imgapi
[2024-04-17T10:10:33.916+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         processed = process_images(p)
[2024-04-17T10:10:33.916+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\webui\modules\processing.py", line 732, in process_images
[2024-04-17T10:10:33.916+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         res = process_images_inner(p)
[2024-04-17T10:10:33.916+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\webui\modules\processing.py", line 856, in process_images_inner
[2024-04-17T10:10:33.916+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         p.setup_conds()
[2024-04-17T10:10:33.916+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\webui\modules\processing.py", line 1309, in setup_conds
[2024-04-17T10:10:33.916+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         super().setup_conds()
[2024-04-17T10:10:33.916+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\webui\modules\processing.py", line 470, in setup_conds
[2024-04-17T10:10:33.916+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         self.c = self.get_conds_with_caching(prompt_parser.get_multicond_learned_conditioning, prompts, total_steps, [self.cached_c], self.extra_network_data)
[2024-04-17T10:10:33.916+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\webui\modules\processing.py", line 455, in get_conds_with_caching
[2024-04-17T10:10:33.916+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         cache[1] = function(shared.sd_model, required_prompts, steps, hires_steps, shared.opts.use_old_scheduling)
[2024-04-17T10:10:33.916+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\webui\modules\prompt_parser.py", line 262, in get_multicond_learned_conditioning
[2024-04-17T10:10:33.916+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         learned_conditioning = get_learned_conditioning(model, prompt_flat_list, steps, hires_steps, use_old_scheduling)
[2024-04-17T10:10:33.916+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\webui\modules\prompt_parser.py", line 189, in get_learned_conditioning
[2024-04-17T10:10:33.916+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         conds = model.get_learned_conditioning(texts)
[2024-04-17T10:10:33.916+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\webui\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 669, in get_learned_conditioning
[2024-04-17T10:10:33.916+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         c = self.cond_stage_model(c)
[2024-04-17T10:10:33.916+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\system\python\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
[2024-04-17T10:10:33.916+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         return forward_call(*args, **kwargs)
[2024-04-17T10:10:33.916+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\webui\modules\sd_hijack_clip.py", line 234, in forward
[2024-04-17T10:10:33.916+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         z = self.process_tokens(tokens, multipliers)
[2024-04-17T10:10:33.916+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\webui\modules\sd_hijack_clip.py", line 265, in process_tokens
[2024-04-17T10:10:33.916+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         tokens = torch.asarray(remade_batch_tokens).to(devices.device)
[2024-04-17T10:10:33.916+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >     RuntimeError: CUDA error: unknown error
[2024-04-17T10:10:33.916+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >     CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[2024-04-17T10:10:33.916+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >     For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
[2024-04-17T10:10:33.916+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >     Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
[2024-04-17T10:10:33.916+0200 DEBUG ya_runtime_ai::process::automatic::monitor] > 
[2024-04-17T10:10:33.916+0200 DEBUG ya_runtime_ai::process::automatic::monitor] > 
[2024-04-17T10:10:33.916+0200 DEBUG ya_runtime_ai::process::automatic::monitor] > ---
[2024-04-17T10:10:33.916+0200 DEBUG hyper::proto::h1::io] parsed 4 headers
[2024-04-17T10:10:33.916+0200 DEBUG hyper::proto::h1::conn] incoming body is content-length (331 bytes)
[2024-04-17T10:10:33.916+0200 DEBUG hyper::proto::h1::conn] incoming body completed
[2024-04-17T10:10:33.916+0200 DEBUG hyper::client::pool] pooling idle connection for ("http", localhost:7861)
[2024-04-17T10:10:33.916+0200 INFO  ya_gsb_http_proxy::gsb_to_http] sending GsbEvent nr 0

Activity does not get stopped. Re-enabling GPU does not resolve the issue - API response does not change after enabling it.

We could monitor for log message RuntimeError: CUDA error: unknown error and end activity when it happens.

The problem is Automatic breaks up silently whenever GPU turns off, but any error log we could track appears only after an API request.

pwalski commented 2 months ago

Scenario of GPU usage by external app while Automatic runtime is alive:

  1. Start Provider with Automatic runtime (MockUI)
  2. Make initial txt2image request (it works)
  3. Start GPU game/benchmark
  4. Make another txt2image request (it hangs for few minutes and timeouts, benchmark slows down)

http request timeouts after few minutes

PS D:\Code\ya-runtime-ai> $images = Invoke-WebRequest -Method POST -Headers @{ "Authorization" = "Bearer SfHdEUAwIBiQyytaOa3mvDVYHO8="; "Content-Type" = "application/json; charset=utf-8"; "Accept" = "text/event-stream" } -Body (@{ "prompt"="software developer on a vacations, on a beach, watching sunset, sitting on the sand, wearing hawaian shirt and a straw hat, drinks mojito"; "steps"=250 }|ConvertTo-Json) -Uri http://127.0.0.1:7465/activity-api/v1/activity/205b74f2455a4d0aac8a044811d81be6/proxy_http_request/sdapi/v1/txt2img | ConvertFrom-Json | Select images | Select-Object -Index 0
Invoke-WebRequest : The operation has timed out.
At line:1 char:11
+ $images = Invoke-WebRequest -Method POST -Headers @{ "Authorization"  ...
+           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (System.Net.HttpWebRequest:HttpWebRequest) [Invoke-WebRequest], WebException
    + FullyQualifiedErrorId : WebCmdletWebResponseException,Microsoft.PowerShell.Commands.InvokeWebRequestCommand

ya-rruntime-ai log (this bunched together Successfully sent activity usage message are unusual, no Automatic error logs, it might have been caused by heavy usage of virtual memory, not by GPU usage)

[2024-04-17T10:56:28.443+0200 DEBUG ya_runtime_ai] Successfully sent activity usage message
[2024-04-17T10:56:28.472+0200 DEBUG hyper::client::connect::http] connecting to 127.0.0.1:7861
[2024-04-17T10:56:28.473+0200 DEBUG hyper::client::connect::http] connected to 127.0.0.1:7861
[2024-04-17T10:56:28.473+0200 DEBUG hyper::proto::h1::io] flushed 53 bytes
[2024-04-17T10:56:29.451+0200 DEBUG ya_runtime_ai] Successfully sent activity usage message
[2024-04-17T10:56:30.454+0200 DEBUG ya_runtime_ai] Successfully sent activity usage message
[2024-04-17T10:56:31.458+0200 DEBUG ya_runtime_ai] Successfully sent activity usage message
[2024-04-17T10:56:32.467+0200 DEBUG ya_runtime_ai] Successfully sent activity usage message
[2024-04-17T10:56:33.480+0200 DEBUG ya_runtime_ai] Successfully sent activity usage message
[2024-04-17T10:56:34.487+0200 DEBUG ya_runtime_ai] Successfully sent activity usage message
[2024-04-17T10:56:35.495+0200 DEBUG ya_runtime_ai] Successfully sent activity usage message
[2024-04-17T10:56:36.499+0200 DEBUG ya_runtime_ai] Successfully sent activity usage message
[2024-04-17T10:56:37.508+0200 DEBUG ya_runtime_ai] Successfully sent activity usage message
[2024-04-17T10:56:38.055+0200 DEBUG hyper::proto::h1::io] parsed 5 headers
[2024-04-17T10:56:38.055+0200 DEBUG hyper::proto::h1::conn] incoming body is content-length (22 bytes)
[2024-04-17T10:56:38.056+0200 DEBUG ya_runtime_ai::process::automatic::monitor] > 

Requestor yagna log

[2024-04-17T10:47:53.822+0200 INFO  ya_activity::requestor::control] Created Activity [205b74f2455a4d0aac8a044811d81be6] for Agreement [0e249aa3f50f3a3f73cbd9d761328fd16464841e3150822c0bbb01374b41ac6e]
[2024-04-17T10:49:38.812+0200 ERROR ya_activity::http_proxy] Authorize error Forbidden: caller: 0x82a630d2447ffd282657978f9f76c02da8be9819 is not authorized: 0x484880510522d41cc9bf16d77e75ce13aa53dc90
[2024-04-17T10:49:53.055+0200 DEBUG ya_payment::service::public] Got SendDebitNote [a27e85c9-4f11-46ce-915b-7b08ae38c22a] from Node [0x484880510522d41cc9bf16d77e75ce13aa53dc90].
[2024-04-17T10:49:53.059+0200 INFO  ya_payment::service::public] DebitNote [a27e85c9-4f11-46ce-915b-7b08ae38c22a] for Activity [205b74f2455a4d0aac8a044811d81be6] received from node [0x484880510522d41cc9bf16d77e75ce13aa53dc90].
[2024-04-17T10:49:53.416+0200 INFO  ya_payment::api::debit_notes] DebitNote [a27e85c9-4f11-46ce-915b-7b08ae38c22a] for Activity [205b74f2455a4d0aac8a044811d81be6] accepted.
[2024-04-17T10:50:32.590+0200 ERROR ya_activity::http_proxy] Authorize error Forbidden: caller: 0x82a630d2447ffd282657978f9f76c02da8be9819 is not authorized: 0x484880510522d41cc9bf16d77e75ce13aa53dc90
[2024-04-17T10:51:53.035+0200 DEBUG ya_payment::service::public] Got SendDebitNote [a9545d43-44c6-4dc5-8ce7-b79080d18764] from Node [0x484880510522d41cc9bf16d77e75ce13aa53dc90].
[2024-04-17T10:51:53.036+0200 INFO  ya_payment::service::public] DebitNote [a9545d43-44c6-4dc5-8ce7-b79080d18764] for Activity [205b74f2455a4d0aac8a044811d81be6] received from node [0x484880510522d41cc9bf16d77e75ce13aa53dc90].
[2024-04-17T10:51:53.849+0200 INFO  ya_payment::api::debit_notes] DebitNote [a9545d43-44c6-4dc5-8ce7-b79080d18764] for Activity [205b74f2455a4d0aac8a044811d81be6] accepted.
[2024-04-17T10:53:53.040+0200 DEBUG ya_payment::service::public] Got SendDebitNote [21e99f92-2e64-4389-9589-7e9d9eee9ffc] from Node [0x484880510522d41cc9bf16d77e75ce13aa53dc90].
[2024-04-17T10:53:53.042+0200 INFO  ya_payment::service::public] DebitNote [21e99f92-2e64-4389-9589-7e9d9eee9ffc] for Activity [205b74f2455a4d0aac8a044811d81be6] received from node [0x484880510522d41cc9bf16d77e75ce13aa53dc90].
[2024-04-17T10:53:54.228+0200 INFO  ya_payment::api::debit_notes] DebitNote [21e99f92-2e64-4389-9589-7e9d9eee9ffc] for Activity [205b74f2455a4d0aac8a044811d81be6] accepted.
[2024-04-17T10:55:08.657+0200 ERROR ya_activity::http_proxy] Authorize error Forbidden: caller: 0x82a630d2447ffd282657978f9f76c02da8be9819 is not authorized: 0x484880510522d41cc9bf16d77e75ce13aa53dc90
[2024-04-17T10:55:53.043+0200 DEBUG ya_payment::service::public] Got SendDebitNote [c2a25ae4-7737-471a-9aca-21013c26bef3] from Node [0x484880510522d41cc9bf16d77e75ce13aa53dc90].
[2024-04-17T10:55:53.044+0200 INFO  ya_payment::service::public] DebitNote [c2a25ae4-7737-471a-9aca-21013c26bef3] for Activity [205b74f2455a4d0aac8a044811d81be6] received from node [0x484880510522d41cc9bf16d77e75ce13aa53dc90].
[2024-04-17T10:55:54.590+0200 INFO  ya_payment::api::debit_notes] DebitNote [c2a25ae4-7737-471a-9aca-21013c26bef3] for Activity [205b74f2455a4d0aac8a044811d81be6] accepted.
[2024-04-17T10:57:53.039+0200 DEBUG ya_payment::service::public] Got SendDebitNote [c9dd98f5-2cbc-43cc-9573-c2c3c6147859] from Node [0x484880510522d41cc9bf16d77e75ce13aa53dc90].
[2024-04-17T10:57:53.040+0200 INFO  ya_payment::service::public] DebitNote [c9dd98f5-2cbc-43cc-9573-c2c3c6147859] for Activity [205b74f2455a4d0aac8a044811d81be6] received from node [0x484880510522d41cc9bf16d77e75ce13aa53dc90].
[2024-04-17T10:57:54.906+0200 INFO  ya_payment::api::debit_notes] DebitNote [c9dd98f5-2cbc-43cc-9573-c2c3c6147859] for Activity [205b74f2455a4d0aac8a044811d81be6] accepted.
[2024-04-17T10:59:53.042+0200 DEBUG ya_payment::service::public] Got SendDebitNote [3af0fe86-13ee-4aee-ada7-ee63fd093152] from Node [0x484880510522d41cc9bf16d77e75ce13aa53dc90].
[2024-04-17T10:59:53.043+0200 INFO  ya_payment::service::public] DebitNote [3af0fe86-13ee-4aee-ada7-ee63fd093152] for Activity [205b74f2455a4d0aac8a044811d81be6] received from node [0x484880510522d41cc9bf16d77e75ce13aa53dc90].
[2024-04-17T10:59:53.375+0200 INFO  ya_payment::api::debit_notes] DebitNote [3af0fe86-13ee-4aee-ada7-ee63fd093152] for Activity [205b74f2455a4d0aac8a044811d81be6] accepted.
[2024-04-17T11:01:53.044+0200 DEBUG ya_payment::service::public] Got SendDebitNote [4d754ead-9799-4865-ba3e-3c7b9474f53f] from Node [0x484880510522d41cc9bf16d77e75ce13aa53dc90].
[2024-04-17T11:01:53.045+0200 INFO  ya_payment::service::public] DebitNote [4d754ead-9799-4865-ba3e-3c7b9474f53f] for Activity [205b74f2455a4d0aac8a044811d81be6] received from node [0x484880510522d41cc9bf16d77e75ce13aa53dc90].
[2024-04-17T11:01:53.857+0200 INFO  ya_payment::api::debit_notes] DebitNote [4d754ead-9799-4865-ba3e-3c7b9474f53f] for Activity [205b74f2455a4d0aac8a044811d81be6] accepted.
  1. Consecutive txt2image requests start hanging again
  2. After shutdown of benchmark APP pending requests finished

Conclusion:

  1. Automatic supports multiple pending requests
  2. It works when GPU is used by other apps (it affects performance)
  3. gsb-http proxy timeout is shorter than Automatic timeout (it it has any)
pwalski commented 2 months ago

I tried to use example Golem.Tools/App with model https://huggingface.co/stabilityai/stable-diffusion-2-1/resolve/main/v2-1_768-ema-pruned.safetensors?download=true?download=true Runtime started correctly, model got loaded (loading incompatible model sometimes fails), but requests are failing, and logs print recommendations about adding some config options to Automatic commandline which user is unable to do:

[2024-04-17T12:04:02.511+0200 DEBUG ya_gsb_http_proxy::gsb_to_http] Calling http://localhost:7861/sdapi/v1/txt2img
[2024-04-17T12:04:02.511+0200 DEBUG reqwest::connect] starting new connection: http://localhost:7861/
[2024-04-17T12:04:02.511+0200 DEBUG hyper::client::connect::dns] resolving host="localhost"
[2024-04-17T12:04:02.511+0200 DEBUG hyper::client::connect::http] connecting to [::1]:7861
[2024-04-17T12:04:02.751+0200 DEBUG ya_runtime_ai] Successfully sent activity usage message
[2024-04-17T12:04:02.781+0200 DEBUG hyper::client::connect::http] connecting to 127.0.0.1:7861
[2024-04-17T12:04:02.781+0200 DEBUG hyper::client::connect::http] connected to 127.0.0.1:7861
[2024-04-17T12:04:02.781+0200 DEBUG hyper::proto::h1::io] flushed 53 bytes
[2024-04-17T12:04:02.783+0200 DEBUG hyper::proto::h1::io] parsed 5 headers
[2024-04-17T12:04:02.783+0200 DEBUG hyper::proto::h1::conn] incoming body is content-length (22 bytes)
[2024-04-17T12:04:02.827+0200 DEBUG hyper::client::connect::http] connecting to 127.0.0.1:7861
[2024-04-17T12:04:02.843+0200 DEBUG hyper::client::connect::http] connected to 127.0.0.1:7861
[2024-04-17T12:04:02.850+0200 DEBUG hyper::proto::h1::io] flushed 466 bytes
[2024-04-17T12:04:03.757+0200 DEBUG ya_runtime_ai] Successfully sent activity usage message
[2024-04-17T12:04:03.787+0200 DEBUG reqwest::connect] starting new connection: http://localhost:7861/
[2024-04-17T12:04:03.787+0200 DEBUG hyper::client::connect::dns] resolving host="localhost"
[2024-04-17T12:04:03.787+0200 DEBUG hyper::client::connect::http] connecting to [::1]:7861
[2024-04-17T12:04:04.095+0200 DEBUG hyper::client::connect::http] connecting to 127.0.0.1:7861
[2024-04-17T12:04:04.096+0200 DEBUG hyper::client::connect::http] connected to 127.0.0.1:7861
[2024-04-17T12:04:04.096+0200 DEBUG hyper::proto::h1::io] flushed 53 bytes
[2024-04-17T12:04:04.098+0200 DEBUG hyper::proto::h1::io] parsed 5 headers
[2024-04-17T12:04:04.098+0200 DEBUG hyper::proto::h1::conn] incoming body is content-length (22 bytes)
[2024-04-17T12:04:04.388+0200 DEBUG ya_runtime_ai::process::automatic::monitor] > 
  0%|          | 0/250 [00:00<?, ?it/s]
  0%|          | 0/250 [00:00<?, ?it/s]
[2024-04-17T12:04:04.550+0200 DEBUG ya_runtime_ai::process::automatic::monitor] > INFO:     127.0.0.1:64285 - "POST /sdapi/v1/txt2img HTTP/1.1" 500 Internal Server Error
[2024-04-17T12:04:04.550+0200 DEBUG ya_runtime_ai::process::automatic::monitor] > *** API error: POST: http://localhost:7861/sdapi/v1/txt2img {'error': 'NansException', 'detail': '', 'body': '', 'errors': 'A tensor with all NaNs was produced in Unet. This could be either because there\'s not enough precision to represent the picture, or because your video card does not support half type. Try setting the "Upcast cross attention layer to float32" option in Settings > Stable Diffusion or using the --no-half commandline argument to fix this. Use --disable-nan-check commandline argument to disable this check.'}
[2024-04-17T12:04:04.550+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >     Traceback (most recent call last):
[2024-04-17T12:04:04.550+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\system\python\lib\site-packages\anyio\streams\memory.py", line 98, in receive
[2024-04-17T12:04:04.550+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         return self.receive_nowait()
[2024-04-17T12:04:04.550+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\system\python\lib\site-packages\anyio\streams\memory.py", line 93, in receive_nowait
[2024-04-17T12:04:04.550+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         raise WouldBlock
[2024-04-17T12:04:04.550+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >     anyio.WouldBlock
[2024-04-17T12:04:04.550+0200 DEBUG ya_runtime_ai::process::automatic::monitor] > 
[2024-04-17T12:04:04.550+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >     During handling of the above exception, another exception occurred:
[2024-04-17T12:04:04.550+0200 DEBUG ya_runtime_ai::process::automatic::monitor] > 
[2024-04-17T12:04:04.550+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >     Traceback (most recent call last):
[2024-04-17T12:04:04.550+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\system\python\lib\site-packages\starlette\middleware\base.py", line 78, in call_next
[2024-04-17T12:04:04.550+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         message = await recv_stream.receive()
[2024-04-17T12:04:04.550+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\system\python\lib\site-packages\anyio\streams\memory.py", line 118, in receive
[2024-04-17T12:04:04.550+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         raise EndOfStream
[2024-04-17T12:04:04.550+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >     anyio.EndOfStream
[2024-04-17T12:04:04.550+0200 DEBUG ya_runtime_ai::process::automatic::monitor] > 
[2024-04-17T12:04:04.550+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >     During handling of the above exception, another exception occurred:
[2024-04-17T12:04:04.550+0200 DEBUG ya_runtime_ai::process::automatic::monitor] > 
[2024-04-17T12:04:04.550+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >     Traceback (most recent call last):
[2024-04-17T12:04:04.550+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\webui\modules\api\api.py", line 187, in exception_handling
[2024-04-17T12:04:04.550+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         return await call_next(request)
[2024-04-17T12:04:04.550+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\system\python\lib\site-packages\starlette\middleware\base.py", line 84, in call_next
[2024-04-17T12:04:04.550+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         raise app_exc
[2024-04-17T12:04:04.550+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\system\python\lib\site-packages\starlette\middleware\base.py", line 70, in coro
[2024-04-17T12:04:04.550+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         await self.app(scope, receive_or_disconnect, send_no_error)
[2024-04-17T12:04:04.550+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\system\python\lib\site-packages\starlette\middleware\base.py", line 108, in __call__
[2024-04-17T12:04:04.550+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         response = await self.dispatch_func(request, call_next)
[2024-04-17T12:04:04.550+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\webui\modules\api\api.py", line 151, in log_and_time
[2024-04-17T12:04:04.550+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         res: Response = await call_next(req)
[2024-04-17T12:04:04.550+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\system\python\lib\site-packages\starlette\middleware\base.py", line 84, in call_next
[2024-04-17T12:04:04.550+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         raise app_exc
[2024-04-17T12:04:04.550+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\system\python\lib\site-packages\starlette\middleware\base.py", line 70, in coro
[2024-04-17T12:04:04.550+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         await self.app(scope, receive_or_disconnect, send_no_error)
[2024-04-17T12:04:04.550+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\system\python\lib\site-packages\starlette\middleware\cors.py", line 84, in __call__
[2024-04-17T12:04:04.550+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         await self.app(scope, receive, send)
[2024-04-17T12:04:04.550+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\system\python\lib\site-packages\starlette\middleware\gzip.py", line 26, in __call__
[2024-04-17T12:04:04.550+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         await self.app(scope, receive, send)
[2024-04-17T12:04:04.550+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\system\python\lib\site-packages\starlette\middleware\exceptions.py", line 79, in __call__
[2024-04-17T12:04:04.550+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         raise exc
[2024-04-17T12:04:04.550+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\system\python\lib\site-packages\starlette\middleware\exceptions.py", line 68, in __call__
[2024-04-17T12:04:04.550+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         await self.app(scope, receive, sender)
[2024-04-17T12:04:04.550+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\system\python\lib\site-packages\fastapi\middleware\asyncexitstack.py", line 21, in __call__
[2024-04-17T12:04:04.550+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         raise e
[2024-04-17T12:04:04.550+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\system\python\lib\site-packages\fastapi\middleware\asyncexitstack.py", line 18, in __call__
[2024-04-17T12:04:04.550+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         await self.app(scope, receive, send)
[2024-04-17T12:04:04.550+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\system\python\lib\site-packages\starlette\routing.py", line 718, in __call__
[2024-04-17T12:04:04.550+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         await route.handle(scope, receive, send)
[2024-04-17T12:04:04.550+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\system\python\lib\site-packages\starlette\routing.py", line 276, in handle
[2024-04-17T12:04:04.551+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         await self.app(scope, receive, send)
[2024-04-17T12:04:04.551+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\system\python\lib\site-packages\starlette\routing.py", line 66, in app
[2024-04-17T12:04:04.551+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         response = await func(request)
[2024-04-17T12:04:04.551+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\system\python\lib\site-packages\fastapi\routing.py", line 237, in app
[2024-04-17T12:04:04.551+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         raw_response = await run_endpoint_function(
[2024-04-17T12:04:04.551+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\system\python\lib\site-packages\fastapi\routing.py", line 165, in run_endpoint_function
[2024-04-17T12:04:04.551+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         return await run_in_threadpool(dependant.call, **values)
[2024-04-17T12:04:04.551+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\system\python\lib\site-packages\starlette\concurrency.py", line 41, in run_in_threadpool
[2024-04-17T12:04:04.551+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         return await anyio.to_thread.run_sync(func, *args)
[2024-04-17T12:04:04.551+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\system\python\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
[2024-04-17T12:04:04.551+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         return await get_asynclib().run_sync_in_worker_thread(
[2024-04-17T12:04:04.551+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\system\python\lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
[2024-04-17T12:04:04.551+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         return await future
[2024-04-17T12:04:04.551+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\system\python\lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
[2024-04-17T12:04:04.551+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         result = context.run(func, *args)
[2024-04-17T12:04:04.551+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\webui\modules\api\api.py", line 380, in text2imgapi
[2024-04-17T12:04:04.551+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         processed = process_images(p)
[2024-04-17T12:04:04.551+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\webui\modules\processing.py", line 732, in process_images
[2024-04-17T12:04:04.551+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         res = process_images_inner(p)
[2024-04-17T12:04:04.551+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\webui\modules\processing.py", line 867, in process_images_inner
[2024-04-17T12:04:04.551+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         samples_ddim = p.sample(conditioning=p.c, unconditional_conditioning=p.uc, seeds=p.seeds, subseeds=p.subseeds, subseed_strength=p.subseed_strength, prompts=p.prompts)
[2024-04-17T12:04:04.551+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\webui\modules\processing.py", line 1140, in sample
[2024-04-17T12:04:04.551+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning, image_conditioning=self.txt2img_image_conditioning(x))
[2024-04-17T12:04:04.551+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\webui\modules\sd_samplers_kdiffusion.py", line 235, in sample
[2024-04-17T12:04:04.551+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
[2024-04-17T12:04:04.551+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\webui\modules\sd_samplers_common.py", line 261, in launch_sampling
[2024-04-17T12:04:04.551+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         return func()
[2024-04-17T12:04:04.551+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\webui\modules\sd_samplers_kdiffusion.py", line 235, in <lambda>
[2024-04-17T12:04:04.551+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
[2024-04-17T12:04:04.551+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\system\python\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
[2024-04-17T12:04:04.551+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         return func(*args, **kwargs)
[2024-04-17T12:04:04.551+0200 DEBUG hyper::proto::h1::io] parsed 4 headers
[2024-04-17T12:04:04.551+0200 DEBUG hyper::proto::h1::conn] incoming body is content-length (465 bytes)
[2024-04-17T12:04:04.551+0200 DEBUG hyper::proto::h1::conn] incoming body completed
[2024-04-17T12:04:04.551+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\webui\repositories\k-diffusion\k_diffusion\sampling.py", line 128, in sample_euler
[2024-04-17T12:04:04.551+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         denoised = model(x, sigma_hat * s_in, **extra_args)
[2024-04-17T12:04:04.551+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\system\python\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
[2024-04-17T12:04:04.551+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         return forward_call(*args, **kwargs)
[2024-04-17T12:04:04.551+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\webui\modules\sd_samplers_cfg_denoiser.py", line 201, in forward
[2024-04-17T12:04:04.551+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         devices.test_for_nans(x_out, "unet")
[2024-04-17T12:04:04.551+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >       File "D:\Code\gamerhash-facade\modules\plugins\ya-automatic-ai\automatic\webui\modules\devices.py", line 136, in test_for_nans
[2024-04-17T12:04:04.551+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >         raise NansException(message)
[2024-04-17T12:04:04.551+0200 DEBUG ya_runtime_ai::process::automatic::monitor] >     modules.devices.NansException: A tensor with all NaNs was produced in Unet. This could be either because there's not enough precision to represent the picture, or because your video card does not support half type. Try setting the "Upcast cross attention layer to float32" option in Settings > Stable Diffusion or using the --no-half commandline argument to fix this. Use --disable-nan-check commandline argument to disable this check.
[2024-04-17T12:04:04.551+0200 DEBUG ya_runtime_ai::process::automatic::monitor] > 
[2024-04-17T12:04:04.551+0200 DEBUG ya_runtime_ai::process::automatic::monitor] > ---
[2024-04-17T12:04:04.551+0200 DEBUG hyper::client::pool] pooling idle connection for ("http", localhost:7861)
[2024-04-17T12:04:04.552+0200 INFO  ya_gsb_http_proxy::gsb_to_http] sending GsbEvent nr 0
[2024-04-17T12:04:04.766+0200 DEBUG ya_runtime_ai] Successfully sent activity usage message

User received following response 200 OK: {"error":"NansException","detail":"","body":"","errors":"A tensor with all NaNs was produced in Unet. This could be either because there's not enough precision to represent the picture, or because your video card does not support half type. Try setting the \"Upcast cross attention layer to float32\" option in Settings > Stable Diffusion or using the --no-half commandline argument to fix this. Use --disable-nan-check commandline argument to disable this check."}

It seems that Automatic API returned 500, but gsb http proxy returns 200.

Versions:

PS D:\Code\gamerhash-facade> .\modules\golem\yagna.exe --version
yagna 0.16.0-ai-rc13 (9ab5b417 2024-04-05 build #422)
[2024-04-17 12:25:20.241469 +02:00] ERROR [src\main.rs:166] ya-runtime-ai 0.1.1

(SHA256 B6F47B478BE36118D3DB732173C65C194BFF64342E28BFEC776F182A37DA69A4) (ya-runtime-ai still prints 0.1.1)