dstackai / dstack

dstack is a lightweight, open-source alternative to Kubernetes & Slurm, simplifying AI container orchestration with multi-cloud & on-prem support. It natively supports NVIDIA, AMD, & TPU.
https://dstack.ai/docs
Mozilla Public License 2.0

Wait for cloud-init on dstack-gateway before attempting any operations #1220

Open jvstme opened 6 months ago

jvstme commented 6 months ago

Current

After connecting to dstack-gateway via SSH, dstack-server attempts to update the gateway with update.sh or to configure it by calling the /api/config endpoint. However, dstack-gateway's installation and setup with cloud-init may not have finished by then. This leads to unclear dstack-server errors like

Failed to configure gateway 35.202.8.178: ReadError('')

or

Failed to update gateway 35.202.8.178: /bin/sh: 0: cannot open dstack/update.sh: No such file

Proposed

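dstack-server should wait for cloud-init to finish on dstack-gateway before attempting any operations. This should improve the user experience, facilitate troubleshooting, and prevent bugs.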
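As a rough illustration of waiting for cloud-init before touching the gateway, here is a minimal sketch. It assumes the gateway image ships the standard `cloud-init` CLI, whose `cloud-init status --wait` command blocks until cloud-init finishes and exits non-zero on failure. The function name, the `GatewayNotReadyError` type, the default `ubuntu` user, and the use of the `ssh` binary are all hypothetical and not taken from the dstack codebase, which may handle SSH differently.

```python
import subprocess
from typing import Callable, List


class GatewayNotReadyError(Exception):
    """Hypothetical error type for this sketch; not part of dstack."""


# A runner takes a command and a timeout and returns a CompletedProcess.
Runner = Callable[[List[str], float], subprocess.CompletedProcess]


def _run(cmd: List[str], timeout: float) -> subprocess.CompletedProcess:
    # Default runner: execute the command locally and capture its output as text.
    return subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)


def wait_for_cloud_init(
    host: str,
    user: str = "ubuntu",  # assumption: the gateway's SSH user
    timeout: float = 600.0,
    run: Runner = _run,
) -> None:
    """Block until cloud-init finishes on the host, or raise a clear error."""
    # `cloud-init status --wait` returns only once cloud-init is done.
    cmd = ["ssh", f"{user}@{host}", "cloud-init", "status", "--wait"]
    try:
        result = run(cmd, timeout)
    except subprocess.TimeoutExpired as e:
        raise GatewayNotReadyError(
            f"cloud-init on {host} did not finish within {timeout:.0f}s"
        ) from e
    if result.returncode != 0:
        details = (result.stdout or "").strip() or (result.stderr or "").strip()
        raise GatewayNotReadyError(
            f"cloud-init on {host} exited with code {result.returncode}: {details}"
        )
```

With a check like this called before running update.sh or calling /api/config, a gateway that is still being set up would produce a message that names cloud-init explicitly instead of a bare ReadError or a missing-file error. The `run` parameter exists only so the SSH call can be replaced in tests.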

r4victor commented 6 months ago

After #1236 we give the gateway more than enough time to install and set up. If it takes longer for some reason, we should fix the underlying problem. This issue only addresses the error messages, so I'd classify it as minor.

peterschmidt85 commented 4 months ago

This issue is stale because it has been open for 30 days with no activity.

peterschmidt85 commented 4 months ago

This issue was closed because it has been inactive for 14 days since being marked as stale. Please reopen the issue if it is still relevant.

peterschmidt85 commented 1 month ago

@jvstme is this issue still valid?

jvstme commented 1 month ago

@peterschmidt85, yes

github-actions[bot] commented 2 weeks ago

This issue is stale because it has been open for 30 days with no activity.

github-actions[bot] commented 1 day ago

This issue was closed because it has been inactive for 14 days since being marked as stale. Please reopen the issue if it is still relevant.