Open geropl opened 2 years ago
I am trying to reverse-engineer the action for this issue from the given context. Please clarify: Is this about
1) adding an additional (timeout-based?) mechanism on the ws-manager-bridge that also works when the cluster goes down without deregistering, or
2) taking further action in the ws-manager-bridge when receiving a forced deregistration request, or
3) none of the above (please specify)?
Sorry for the lack of details: This is about adding a mechanism that ensures we don't leak workspaces dangling in any state other than stopped once a workspace cluster is de-registered.
We could discuss whether this should be:
a) timeout-based, so that workspaces may re-appear if we re-register a workspace cluster quickly enough. This would make it a bit more fault-tolerant and safe. This is somewhat tricky, as we need to poll d_b_workspace by region. But it's impossible atm for ws-manager-bridge to distinguish between "I don't know this region but someone else does" and "no-one is governing this region".
b) using the "forced de-registration" request as a signal, which might serve as a first version. Although this might lead to unwanted fall-out in case we mis-used the gptcl clusters command.
a) will become possible once we have the changes required for simplified meta, hence I did not prioritize this, yet.
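To make option a) more concrete, here is a minimal sketch of a grace-period tracker. It is not the actual ws-manager-bridge code: the InstanceRecord shape, the DeregistrationTracker class, and the phase strings are all hypothetical, and it assumes the caller already knows which regions are de-registered (which, as noted above, is not possible atm).

```typescript
// Hypothetical record shape, NOT the real d_b_workspace schema.
interface InstanceRecord {
  id: string;
  region: string; // name of the workspace cluster governing this instance
  phase: string;  // e.g. "running", "stopping", "stopped"
}

// Remembers when a region was de-registered and reports instances that are
// still not stopped after the grace period has elapsed.
class DeregistrationTracker {
  private readonly gracePeriodMs: number;
  private readonly deregisteredAt = new Map<string, number>();

  constructor(gracePeriodMs: number) {
    this.gracePeriodMs = gracePeriodMs;
  }

  // Record the first time we saw the region as de-registered.
  markDeregistered(region: string, now: number): void {
    if (!this.deregisteredAt.has(region)) {
      this.deregisteredAt.set(region, now);
    }
  }

  // A quick re-registration cancels the pending cleanup for that region.
  markRegistered(region: string): void {
    this.deregisteredAt.delete(region);
  }

  // Instances that should be marked stopped at time `now`.
  instancesToStop(instances: InstanceRecord[], now: number): InstanceRecord[] {
    return instances.filter((i) => {
      if (i.phase === "stopped") {
        return false;
      }
      const since = this.deregisteredAt.get(i.region);
      return since !== undefined && now - since >= this.gracePeriodMs;
    });
  }
}
```

Passing `now` explicitly keeps the sketch deterministic; a real implementation would presumably persist the de-registration time rather than hold it in memory.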
Assigned this to epic "Simplified Multi-Meta"
Unassigned, because there's no immediate connection
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
@jankeromnes Please ping if I can help with additional context/details 👍
Many thanks @geropl! Planning to start looking into either this or https://github.com/gitpod-io/gitpod/issues/12283 once https://github.com/gitpod-io/gitpod/issues/12580 is done.
Okay, I'm now blocked on this question and have already started one distraction, so I guess now is the time to pick this up! 😆 🚀
From this comment https://github.com/gitpod-io/gitpod/issues/6770#issuecomment-1004697480 I deduce that we want:
b) using the "forced de-registration" request as signal might serve as a first version. Although this might lead to unwanted fall-out in case we mis-used the gptcl clusters command.
Please let me know if I got that wrong.
@geropl Would love to chat about this more, if you have time. I've added TODOs in two potential fix locations in https://github.com/gitpod-io/gitpod/pull/12912/files but I'm not really sure which is best or what the implications are. 💭
Had a brief chat this morning. Summary:
Moved to next week so we can implement a GC using info about "all registered clusters".
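A rough sketch of what such a GC pass could look like, assuming the set of all registered cluster names is available. The WorkspaceInstanceInfo shape and the findOrphanedInstances helper are hypothetical names for illustration, not existing Gitpod APIs.

```typescript
// Hypothetical record shape used only for this illustration.
interface WorkspaceInstanceInfo {
  id: string;
  region: string; // name of the cluster the instance was scheduled on
  phase: string;  // e.g. "running", "stopped"
}

// Returns instances that are not stopped and whose cluster is no longer
// registered anywhere. These are the candidates a GC would mark as stopped.
function findOrphanedInstances(
  instances: WorkspaceInstanceInfo[],
  registeredClusters: ReadonlySet<string>,
): WorkspaceInstanceInfo[] {
  return instances.filter(
    (i) => i.phase !== "stopped" && !registeredClusters.has(i.region),
  );
}
```

Because the check runs against all registered clusters rather than only those known to one bridge, it avoids the "someone else governs this region" ambiguity described earlier in the thread.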
Dropping assignment because not actively working on it atm.