gitpod-io / gitpod

The developer platform for on-demand cloud development environments to create software faster and more securely.
https://www.gitpod.io
GNU Affero General Public License v3.0

[ws-manager-bridge] Garbage-collect workspace instances whose workspace clusters are not available anymore #6770

Open geropl opened 2 years ago

geropl commented 2 years ago

context:

Front conversations

JanKoehnlein commented 2 years ago

I am trying to reverse-engineer the action for this issue from the given context. Please clarify: is this about

1) adding an additional (timeout-based?) mechanism to ws-manager-bridge that also works when the cluster goes down without deregistering,
2) taking further action in ws-manager-bridge when receiving a forced deregistration request, or
3) none of the above (please specify)?

geropl commented 2 years ago

Sorry for the lack of details: this is about adding a mechanism that ensures we don't leak workspaces dangling in any state other than stopped once a workspace cluster is de-registered.

We could discuss whether this should be:

a) timeout-based, so that workspaces may re-appear if we re-register a workspace cluster quickly enough. This would make it a bit more fault-tolerant and safe. It is somewhat tricky, though, as we need to poll d_b_workspace by region, and it's impossible atm for ws-manager-bridge to distinguish between "I don't know this region but someone else does" and "no one is governing this region".
b) using the "forced de-registration" request as a signal, which might serve as a first version, although this might lead to unwanted fall-out in case we misuse the gpctl clusters command.

a) will become possible once we have the changes required for simplified meta, hence I have not prioritized this yet.
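
For readers following along, option a) could be sketched roughly as below. This is only an illustration under assumptions: `DeregistrationGraceTracker`, its methods, and the grace period value are hypothetical names, not part of ws-manager-bridge. The idea is that a cluster only becomes eligible for garbage collection once it has stayed de-registered for a full grace period, and a quick re-registration cancels the pending collection.

```typescript
// Hypothetical sketch: none of these names come from the Gitpod codebase.
class DeregistrationGraceTracker {
    // cluster name -> timestamp (ms) at which the cluster was first seen de-registered
    private readonly deregisteredAt = new Map<string, number>();
    private readonly gracePeriodMs: number;

    constructor(gracePeriodMs: number) {
        this.gracePeriodMs = gracePeriodMs;
    }

    // Record a de-registration. Repeated signals keep the earliest timestamp,
    // so they do not extend the grace period.
    markDeregistered(cluster: string, now: number): void {
        if (!this.deregisteredAt.has(cluster)) {
            this.deregisteredAt.set(cluster, now);
        }
    }

    // A re-registration within the grace period cancels the pending collection.
    markRegistered(cluster: string): void {
        this.deregisteredAt.delete(cluster);
    }

    // Clusters that have been de-registered for at least the grace period.
    clustersToCollect(now: number): string[] {
        const expired: string[] = [];
        this.deregisteredAt.forEach((since, cluster) => {
            if (now - since >= this.gracePeriodMs) {
                expired.push(cluster);
            }
        });
        return expired;
    }
}
```

Note that this sketch does not address the hard part named above: a single bridge still cannot tell whether another installation governs a region it does not know.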

geropl commented 2 years ago

Assigned this to epic "Simplified Multi-Meta"

geropl commented 2 years ago

Unassigned, because there's no immediate connection

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

geropl commented 2 years ago

@jankeromnes Please ping if I can help with additional context/details 👍

jankeromnes commented 2 years ago

Many thanks @geropl! Planning to start looking into either this or https://github.com/gitpod-io/gitpod/issues/12283 once https://github.com/gitpod-io/gitpod/issues/12580 is done.

jankeromnes commented 2 years ago

Okay, I'm now blocked on this question and have already started one distraction, so I guess now is the time to pick this up! 😆 🚀

jankeromnes commented 2 years ago

From this comment https://github.com/gitpod-io/gitpod/issues/6770#issuecomment-1004697480 I deduce that we want:

b) using the "forced de-registration" request as signal might serve as a first version. Although this might lead to unwanted fall-out in case we misuse the gpctl clusters command.

Please let me know if I got that wrong.
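
To make option b) concrete, here is a minimal sketch of what reacting to a forced de-registration might look like. Everything in it is assumed for illustration: the `SketchInstance` shape, the simplified phase list, the `force` flag on the request, and the function name are hypothetical and do not reflect the actual ws-manager-bridge types or the PR linked in this thread.

```typescript
// Hypothetical sketch: these types and names are illustrative, not Gitpod's actual API.
type SketchPhase = "pending" | "running" | "stopping" | "stopped";

interface SketchInstance {
    id: string;
    region: string; // name of the workspace cluster that governs this instance
    phase: SketchPhase;
    stoppedTime?: string;
}

interface DeregisterRequest {
    name: string;   // cluster being de-registered
    force: boolean; // assumed flag marking a forced de-registration
}

// Returns a new list in which every non-stopped instance of the force-de-registered
// cluster is marked as stopped. Non-forced requests leave all instances untouched.
function applyForcedDeregistration(
    req: DeregisterRequest,
    instances: SketchInstance[],
    now: Date,
): SketchInstance[] {
    if (!req.force) {
        return instances;
    }
    return instances.map((inst): SketchInstance => {
        if (inst.region !== req.name || inst.phase === "stopped") {
            return inst;
        }
        return { ...inst, phase: "stopped", stoppedTime: now.toISOString() };
    });
}
```

Because the forced request is the only trigger here, an accidental forced de-registration would mark live workspaces as stopped, which is the fall-out risk mentioned in the quoted comment.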

jankeromnes commented 2 years ago

@geropl Would love to chat about this more, if you have time. I've added TODOs in two potential fix locations in https://github.com/gitpod-io/gitpod/pull/12912/files but I'm not really sure which is best or what the implications are. 💭

jankeromnes commented 2 years ago

Had a brief chat this morning. Summary:

geropl commented 2 years ago

Moved to next week so we can implement a GC using info about "all registered clusters".
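
A GC based on "all registered clusters" could select its candidates roughly as follows. This is a sketch under assumptions, not the planned implementation: `GcCandidate` and `findOrphanedInstances` are hypothetical names, and it presumes the caller can obtain the complete list of registered cluster names.

```typescript
// Hypothetical sketch: types and names are illustrative, not taken from ws-manager-bridge.
interface GcCandidate {
    id: string;
    region: string; // name of the workspace cluster the instance was started on
    phase: string;  // e.g. "running" or "stopped"
}

// Selects instances that are not stopped and whose cluster is not in the
// list of registered clusters. These are the instances a GC run would clean up.
function findOrphanedInstances(
    instances: GcCandidate[],
    registeredClusters: string[],
): GcCandidate[] {
    const registered = new Set(registeredClusters);
    return instances.filter(
        (inst) => inst.phase !== "stopped" && !registered.has(inst.region),
    );
}
```

The selection is only safe if the cluster list really is complete; with a partial list, instances governed elsewhere would be collected by mistake.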

geropl commented 1 year ago

Dropping assignment because not actively working on it atm.