bigscience-workshop / petals

🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
https://petals.dev
MIT License

Server absent from health.petals.ml #254

Open Vahe1994 opened 1 year ago

Vahe1994 commented 1 year ago

A server behind NAT stops being detected by other peers after working fine for several days. The server no longer appears on health.petals.ml, and requests to it stop arriving. There are no errors or log messages indicating that something went wrong. The last logs from the server are attached below.

[Screenshot: last server logs]

justheuristic commented 1 year ago

Looks like an issue with long-lived circuit relays, or it could be caused by a temporarily severed internet connection on your side.

If the latter is the case, a typical failure scenario looks like this:

justheuristic commented 1 year ago

[based on a quick chat with @Vahe1994 ]

Suggested solution: during the next sprint (T + 3-4 weeks), when we work on ways to run a Petals server in the background, we can add a check that the server is listed on the health.petals.ml dashboard. If it isn't listed (but the dashboard itself is up), restart the server.
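A minimal sketch of the proposed watchdog logic, in Python. It assumes you have some way to fetch the set of peer IDs currently listed on the health.petals.ml dashboard (returning `None` when the dashboard is unreachable) and some way to restart the server. Both `fetch_listed_peers` and `restart_server` are hypothetical callables supplied by the caller, not part of the Petals API.

```python
import time
from enum import Enum
from typing import Callable, Optional, Set


class Action(Enum):
    OK = "ok"            # our server is listed on the dashboard
    RESTART = "restart"  # dashboard is up, but our server is missing from it
    WAIT = "wait"        # dashboard unreachable: no information, do nothing


def decide(listed_peers: Optional[Set[str]], peer_id: str) -> Action:
    """Decide what to do; listed_peers is None if the dashboard could not be reached."""
    if listed_peers is None:
        return Action.WAIT
    if peer_id in listed_peers:
        return Action.OK
    return Action.RESTART


def watchdog(
    peer_id: str,
    fetch_listed_peers: Callable[[], Optional[Set[str]]],  # hypothetical dashboard query
    restart_server: Callable[[], None],                     # hypothetical, e.g. restart a service
    interval_s: float = 300.0,
    max_checks: Optional[int] = None,                       # None = run forever
) -> None:
    """Periodically check the dashboard and restart the server when it is missing."""
    checks = 0
    while max_checks is None or checks < max_checks:
        if decide(fetch_listed_peers(), peer_id) is Action.RESTART:
            restart_server()
        checks += 1
        # Sleep between checks, but not after the final one.
        if max_checks is None or checks < max_checks:
            time.sleep(interval_s)
```

Returning `WAIT` when the dashboard is down mirrors the "but the dashboard itself is up" condition: an unreachable dashboard says nothing about whether the server is healthy, so restarting in that case would only cause needless downtime.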