Open yuvipanda opened 6 months ago
I've written a few things async below to help show what I'm thinking about here, but I'm not ready to commit to unplanned async follow-up discussions, since it's unplanned work that I do very inefficiently async - happy to chat unplanned sync about this though!
This is the key feedback I have - what does it mean to have a staging hub for a community hub in a dedicated/shared cluster respectively, as compared to having just "another hub" in the cluster? I'd like to see this more clearly defined.
The assumption for the cluster-wide staging hub is that it should block deployment of prod hubs if it fails. What are the expectations for r-staging though? Should an r-staging failure block the r-prod deploy? It doesn't currently. Supporting functionality like this in our current CD setup is very complicated, or impossible without notably compromising on total execution time.
My guess is that the assumption behind shared billing for ucmerced-staging and ucmerced (GCP) in 2i2c's shared cluster was that their costs get combined into a single hub's cloud costs. This wasn't causing unfairness for other communities, but if ucmerced + staging were in AWS, each hub would add pods which in the end force additional core nodes, adding costs - making it an actual cost to have an almost entirely idle hub.
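To make the "idle hub still costs money" point concrete, here is a minimal sketch of the arithmetic. The function name and all numbers are hypothetical placeholders, not measurements from any 2i2c cluster; it only assumes that each hub runs a fixed number of always-on pods and that a core node fits a fixed number of pods.

```python
def core_nodes_needed(hub_count: int, core_pods_per_hub: int, pods_per_core_node: int) -> int:
    """Return how many core nodes are needed to fit every hub's always-on pods.

    All three inputs are illustrative assumptions, not real cluster values.
    """
    total_pods = hub_count * core_pods_per_hub
    # Integer ceiling division: a partially filled node still has to exist.
    return -(-total_pods // pods_per_core_node)


# Hypothetical numbers: 2 always-on pods per hub, 10 pods fit on one core node.
without_staging = core_nodes_needed(hub_count=5, core_pods_per_hub=2, pods_per_core_node=10)
with_staging = core_nodes_needed(hub_count=6, core_pods_per_hub=2, pods_per_core_node=10)
print(without_staging, with_staging)  # prints "1 2"
```

With these made-up values, adding one mostly idle staging hub is what tips the cluster over to a second core node, which is the kind of cost being described.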
Having staging hubs for individual hubs also adds some administrative cost and complexity.
My main concern in implementing this is around the CI/CD setup as well. Right now if any staging hub (where there are multiple) fails, then no production hubs will be deployed. So if we go ahead with this, then we need to figure out a sensible way to link a staging and prod hub together, and also what that conceptually means.
Originally posted by @sgibson91 in https://github.com/2i2c-org/infrastructure/issues/3984#issuecomment-2074471659
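As a rough illustration of what "linking a staging and prod hub together" could mean for the deploy decision, here is a minimal sketch. It is not the actual deployer code: the `Hub` class, the `staging_hub` field, and both function names are hypothetical. It also assumes that a prod hub with no linked staging hub is not gated at all, which is a choice that would still need to be made.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Hub:
    name: str
    # Hypothetical field: name of the staging hub that gates this hub, if any.
    staging_hub: Optional[str] = None


def deployable_cluster_wide(prod_hubs: List[Hub], staging_ok: Dict[str, bool]) -> List[str]:
    """Behaviour as described in the comment above: any staging failure blocks all prod hubs."""
    if all(staging_ok.values()):
        return [hub.name for hub in prod_hubs]
    return []


def deployable_per_hub(prod_hubs: List[Hub], staging_ok: Dict[str, bool]) -> List[str]:
    """Sketch of the proposed model: a prod hub is blocked only by its own staging hub."""
    deployable = []
    for hub in prod_hubs:
        if hub.staging_hub is None:
            # Assumption: hubs without a linked staging hub are never blocked.
            deployable.append(hub.name)
        elif staging_ok.get(hub.staging_hub, False):
            # A missing result is treated as a failure, to stay on the safe side.
            deployable.append(hub.name)
    return deployable


prod_hubs = [
    Hub("r-prod", staging_hub="r-staging"),
    Hub("ucmerced", staging_hub="ucmerced-staging"),
    Hub("other-hub"),  # hypothetical hub with no staging hub of its own
]
staging_ok = {"r-staging": False, "ucmerced-staging": True}

print(deployable_cluster_wide(prod_hubs, staging_ok))  # prints []
print(deployable_per_hub(prod_hubs, staging_ok))       # prints ['ucmerced', 'other-hub']
```

The sketch only covers the decision logic. How to express the same dependency in the CD pipeline without adding to total execution time is the open question raised above.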
We already have a few places where there are multiple staging hubs per cluster:
I believe the current assumption is 'one staging hub per cluster'. Instead, we should move to a model where a hub can possibly have a staging hub associated with it, rather than a cluster having a staging hub associated with it.