fiatrete / SDCN-Stable-Diffusion-Computing-Network

SDCN is an infrastructure that allows people to share and use Stable Diffusion computing power easily.
https://sdcn.info
MIT License

there should be a way to check if node is running #33

Open fiatrete opened 1 year ago

fiatrete commented 1 year ago

The Stable Diffusion webui seems to crash frequently for unknown reasons, and it has no automatic restart mechanism. dan-server is not aware of the operational status of registered nodes, so it can schedule tasks to offline nodes, and those tasks then fail.

I checked the Stable Diffusion webui docs and found an easy way to check whether it is running: call the app_id API, e.g. curl -X 'GET' 'http://127.0.0.1:7860/app_id/' -H 'accept: application/json'
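The same probe could be done from code rather than from curl. Below is a minimal sketch, assuming the node answers GET /app_id/ with HTTP 200 and a JSON body as the curl command above implies; the function name is_node_alive and the 3-second timeout are illustrative choices, not part of SDCN or the webui.

```python
import http.client
import json
import urllib.error
import urllib.request


def is_node_alive(base_url: str, timeout: float = 3.0) -> bool:
    """Return True if GET <base_url>/app_id/ answers 200 with a JSON body.

    Hypothetical helper: the endpoint path comes from the curl command in
    this thread; everything else is an assumption.
    """
    url = base_url.rstrip("/") + "/app_id/"
    request = urllib.request.Request(url, headers={"accept": "application/json"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            if response.status != 200:
                return False
            # Any parseable JSON counts as "alive"; the content is not inspected.
            json.loads(response.read().decode("utf-8"))
            return True
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or a non-JSON body.
        return False


if __name__ == "__main__":
    print(is_node_alive("http://127.0.0.1:7860"))
```

Treating every failure mode as "not alive" keeps the caller simple; a real scheduler might want to distinguish a timeout from a refused connection.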

DiligentCatCat commented 1 year ago

This works, of course, but the problem would be solved outright if there were a daemon program in your custom SD-WebUI (or whatever similar backend you use).

The daemon program would be responsible for notifying the scheduler whether the worker node is alive. I think this is a better solution.
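To make the daemon idea concrete, here is a rough sketch of the scheduler-side bookkeeping such heartbeats would need. Everything in it is hypothetical: HeartbeatRegistry, record, alive_nodes, and the 30-second TTL are illustrative names and values, not existing dan-server code. The daemon on each node would call something like record() periodically, and the scheduler would only dispatch to nodes whose last heartbeat is recent.

```python
import time
from typing import Dict, List, Optional


class HeartbeatRegistry:
    """Tracks the last heartbeat per node (hypothetical, not dan-server code)."""

    def __init__(self, ttl_seconds: float = 30.0) -> None:
        self._ttl = ttl_seconds
        self._last_seen: Dict[str, float] = {}

    def record(self, node_id: str, now: Optional[float] = None) -> None:
        # Called whenever a node's daemon reports in; `now` is injectable for tests.
        self._last_seen[node_id] = time.monotonic() if now is None else now

    def alive_nodes(self, now: Optional[float] = None) -> List[str]:
        # A node counts as alive if its last heartbeat is at most ttl seconds old.
        current = time.monotonic() if now is None else now
        return sorted(
            node_id
            for node_id, seen in self._last_seen.items()
            if current - seen <= self._ttl
        )


if __name__ == "__main__":
    registry = HeartbeatRegistry(ttl_seconds=30.0)
    registry.record("node-a", now=0.0)
    registry.record("node-b", now=20.0)
    print(registry.alive_nodes(now=40.0))  # node-a's heartbeat has expired
```

With push-style heartbeats the scheduler never has to reach into the node, which helps when nodes sit behind NAT; the cost is that a crashed node is only noticed after the TTL expires.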

fiatrete commented 1 year ago

Currently, we run Stable Diffusion webui in API mode as a temporary stand-in for dan-node. This is not ideal, because Stable Diffusion webui was not designed as server-side software. The long-term solution is to rewrite the dan-node program, with stable operation and fault recovery considered from the design phase.

Until then, using API monitoring to check whether the node is running is a quick and effective temporary solution.
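As a rough illustration of what that interim monitoring could look like on the scheduler side, the sketch below polls each registered node and stops scheduling to any node that fails several probes in a row. NodeMonitor, poll, schedulable, and max_failures are hypothetical names, not dan-server APIs; probe is any callable that returns True when a node responds, for example an HTTP GET against the app_id endpoint mentioned earlier in this thread.

```python
from typing import Callable, Dict, Iterable, List


class NodeMonitor:
    """Skips nodes after consecutive failed probes (hypothetical sketch)."""

    def __init__(self, probe: Callable[[str], bool], max_failures: int = 3) -> None:
        self._probe = probe
        self._max_failures = max_failures
        self._failures: Dict[str, int] = {}

    def poll(self, node_urls: Iterable[str]) -> None:
        # One monitoring round: a success resets the counter, a failure increments it.
        for url in node_urls:
            if self._probe(url):
                self._failures[url] = 0
            else:
                self._failures[url] = self._failures.get(url, 0) + 1

    def schedulable(self, node_urls: Iterable[str]) -> List[str]:
        # Nodes below the failure threshold remain eligible for new tasks.
        return [
            url
            for url in node_urls
            if self._failures.get(url, 0) < self._max_failures
        ]
```

Requiring several consecutive failures avoids dropping a node over a single slow response, and a node that restarts becomes eligible again after its next successful probe.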