Closed: jgwest closed this issue 5 months ago
Link to Red Hat Issue Tracker: https://issues.redhat.com/browse/GITOPS-3847
Following up from an internal discussion around how to handle the mix of namespace-scoped and cluster-scoped Rollouts controllers, here's a brainstorm of how it all fits together.
When installing the Argo Rollouts controller on a cluster, three possible scenarios exist:
A) Cluster-scoped: 1 rollouts controller is watching for Rollouts CRs at cluster scope
B) Namespace-scoped: 1 or more Rollouts controllers are watching for Rollouts CRs at namespace scope (possible there are multiple controllers on the cluster, each watching a single namespace)
C) Hybrid: multiple Rollouts controllers on the cluster, with at most one being cluster-scoped and the rest being namespace-scoped. The cluster-scoped install would need to ignore Rollouts CRs in any Namespace that already has a namespace-scoped install.
Thus, only the following two scenarios are supported by Argo Rollouts controller: A) 1 cluster-scoped Rollouts controller XOR B) 1 or more namespace-scoped Rollouts controllers
When reconciling RolloutsManager CRs, the Rollouts Operator can examine the current list of all RolloutManagers on the cluster, and use that to ensure that the cluster is in a valid state:
On `Reconcile()`, list all existing RolloutManager CRs on the cluster. (You can do this with a `k8sClient.List(...)` call, just making sure NOT to specify a namespace as part of the `ListOptions`. The RolloutManager controller should always already have permission to do this.)

`Reconcile()` should ONLY set the `.status` field of the particular RolloutManager CR that it is reconciling. Don't worry about setting the status field of other RolloutManager CRs that might exist on the cluster. (These will eventually be reconciled on K8s controller resync, which occurs every X hours, so they will eventually have an error set.)
Can we prevent a denial of service (DoS), where a malicious user creates a RolloutManager in their own Namespace, which moves the Rollouts controller into an unsupported use case?
An interesting question, and after some thought, while this is theoretically possible, I think this is already reasonably mitigated:
- If a malicious user creates a RolloutManager CR in their Namespace, the cluster-scoped RolloutManager CR, which was already installed, will continue to remain installed and running.
- The Deployment, ConfigMap, etc., of the cluster-scoped Rollouts install will continue to run. Even when we put a RolloutsManager CR in an error state, we are not deleting any of the existing K8s resources (Deployments, etc.), nor are we scaling the replicas down to 0.
- Users don't have permission to create RolloutManager CRs by default: it is reasonable to expect cluster admins to carefully hand out this permission via RBAC.

In the future, we could possibly do something even fancier, perhaps with admission webhooks, or perhaps by adding support to upstream Rollouts, but I think the above case is strong enough that it's fair to wait for a customer/user request before we work on this.
At present, the operator only supports namespace-scoped installs via the `--namespaced` parameter, which is currently hardcoded to be enabled by default.

Work required:
- The controller should default to a cluster-scoped install.
- Add an optional `namespaceScoped` bool field to the `RolloutsManager` CR.
- The cluster-scoped install should ensure a ClusterRole/ClusterRoleBinding exist that allow the Rollouts controller to watch the whole cluster.
- Likewise, the cluster-scoped install should NOT run with the `--namespaced` parameter.
- In the case where there exist multiple `RolloutsManager` CRs on the cluster, we should detect and report an invalid configuration. The `.status` field should indicate why these CRs are not being reconciled.
- Unit/E2E tests (using Ginkgo) to verify each of the above.