consideRatio opened this issue 1 year ago
I agree that supporting multiple namespaces is not worth the additional complexity.
Do we actually need to support dynamic updates to `c.KubeClusterConfig.namespace`? Without it, things are considerably simpler; the downside, of course, is that you lose track of resources in the old namespace. That could be mitigated by emitting a noisy warning/error on `helm upgrade`.
If we do want to support dynamic updates - the proposal above LGTM.
Hmmm, your response didn't make it clear to me what you meant, so I'll try to define some terminology. Also, the configuration isn't defined by my response beyond an idea that it should be nested under `rbac` in the Helm chart configuration, for example using the key `scope`, but it's not clear what structure etc. makes sense or is common in other Helm charts.
Static credentials: k8s api-server RBAC credentials as resources like (Cluster)Role / (Cluster)RoleBinding / ServiceAccount created as part of installing/upgrading the Helm chart.
Dynamic credentials: like static credentials above except they are dynamically created by the software deployed by the Helm chart.
Dynamic configuration: where the dask-gateway client user or dask-gateway-server admin has enabled a way to set `c.KubeClusterConfig.namespace` for specific DaskClusters, so that they may end up in different namespaces without us knowing ahead of time.
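To make "dynamic configuration" concrete, here is a hedged sketch of a dask-gateway-server config file that exposes the namespace as a user-settable cluster option; the option name and handler are illustrative, not an established convention:

```python
# dask_gateway_config.py -- illustrative sketch, not an official recipe.
# Exposes "namespace" as a cluster option so each DaskCluster request
# can land in a different namespace.
from dask_gateway_server.options import Options, String

def options_handler(options):
    # Map the user-chosen option onto KubeClusterConfig.namespace
    return {"namespace": options.namespace}

c.Backend.cluster_options = Options(
    String("namespace", default="default", label="Namespace"),
    handler=options_handler,
)
```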
I think we should support static credentials and dynamic configuration, but not dynamic credentials. Support for dynamic configuration would be implemented with static credentials providing ClusterRole access, while the chart local namespace and other namespace situations would be supported by static credentials with Role resources where possible - created in one specific namespace.
The Helm chart configuration we add should be located under `rbac` in values.yaml, as it only relates to the Helm chart and not the software deployed. More specifically, I now propose `rbac.clusterWideOperation`: a boolean deciding if we provide full permissions as we do now, or if we only provide permissions to work against a specific namespace as configured via `gateway.backend.namespace`.
With `clusterWideOperation` set, permissions should be retained as they are now. With it false, permissions should be sufficient to manage dask clusters in `gateway.backend.namespace` (the chart local namespace if unset).
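To make the proposal concrete, a values.yaml fragment under this scheme could look like the following (`rbac.clusterWideOperation` is the proposed key, not an existing chart option):

```yaml
# Proposed configuration sketch -- rbac.clusterWideOperation does not exist yet.
rbac:
  clusterWideOperation: false   # only grant namespace-scoped permissions

gateway:
  backend:
    namespace: my-dask-clusters  # Role/RoleBinding would be created here
```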
I'm not fully confident on where things are created etc., but I think this is doable. Here is relevant documentation. I think it's fair to ask that users choosing a dedicated namespace also manually create it ahead of time.
I suspect https://github.com/dask/dask-gateway/pull/626 may be almost doing exactly this, but I'm not sure.
I think we should support static credentials and dynamic configuration, but not dynamic credentials.
Thanks - I didn't understand this from the previous discussion. The only way to support dynamic configuration is with non-namespaced RBAC, since the DaskClusters may spawn in arbitrary namespaces.
I think that `rbac.clusterWideOperation` is a clear description for a boolean.
Practically, this means we will have three variations on RBAC configurations bound to the ServiceAccount in the chart-local namespace:
| `rbac.clusterWideOperation` | `gateway.backend.namespace` | RBAC |
| --- | --- | --- |
| false | unset or chart-local | Role/RoleBinding in chart-local |
| false | other-namespace | Role/RoleBinding in other-namespace |
| true | N/A | ClusterRole/ClusterRoleBinding |
An addition to the "Practical Changes" section above: in addition to modifying the chart, there are also changes in code. For example, kubernetes/backend.py#L374 would need to call `list_namespaced_custom_object` for the first and second cases above.
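As a rough illustration of that code change (a sketch only; the real backend uses an async Kubernetes client, and the group/version/plural values here are assumptions about the DaskCluster CRD, not verified):

```python
# Sketch: choose between cluster-wide and namespaced listing of DaskCluster
# custom objects. The method names match the Kubernetes CustomObjectsApi;
# the CRD coordinates below are assumptions for illustration.
DASK_CRD = {
    "group": "gateway.dask.org",
    "version": "v1alpha1",
    "plural": "daskclusters",
}

def list_call_for(namespace=None):
    """Return (method_name, kwargs) for listing DaskCluster objects.

    With no fixed namespace (cluster-wide RBAC) we can use
    list_cluster_custom_object; with a fixed namespace (Role/RoleBinding
    only) we must use list_namespaced_custom_object instead.
    """
    if namespace is None:
        return "list_cluster_custom_object", dict(DASK_CRD)
    return "list_namespaced_custom_object", {**DASK_CRD, "namespace": namespace}
```

The caller would then invoke the chosen method on the custom objects API client, e.g. `getattr(custom_objects_api, method)(**kwargs)`.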
@consideRatio: I think we all got busy with higher priority stuff for a while, but it's come back on my radar as there's another site that's interested in namespaced dask-gateway (@clundst).
Are you OK with my comment above, and should we proceed on generating a PR?
I'm also a little fuzzy on the 2nd use case above, where `gateway.backend.namespace != .Release.Namespace`: do we deploy everything into the release namespace except DaskCluster objects?
@consideRatio: been a few months -- please let me know if you're OK with the approach before I sync up a PR against the current main.
I think Erik is out for the rest of December.
Looking through this again, okay this sounds reasonable @holzman.
Here are some thoughts summarized:
- With `rbac.clusterWideOperation=true` (the default), we retain existing functionality
- With `rbac.clusterWideOperation=false`, we get a new kind of behavior that is influenced by `gateway.backend.namespace`
We've deliberated on using `gateway.backend.namespace` to create the RBAC resources here, but what we create influences traefik, the controller, and the gateway-api pod; this is unexpected given that this configuration is nested under `gateway`.
I propose the following config:
- `rbac.manageAnyNamespace`: if true, we create ClusterRole/ClusterRoleBinding resources and `rbac.managedNamespaces` is to be ignored
- `rbac.managedNamespaces`: specifies namespaces where we provide permissions to create dask clusters, where leaving it unset implies the release namespace (where the chart was installed)

An example config could then be `rbac.manageAnyNamespace=false`, `rbac.managedNamespaces=[my-dask-clusters]`, and `gateway.backend.namespace=my-dask-clusters`.
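Written out as a values.yaml fragment, that example would look like this (the `rbac.*` keys are proposed here, not existing chart options):

```yaml
# Proposed configuration sketch -- these rbac.* keys do not exist yet.
rbac:
  manageAnyNamespace: false
  managedNamespaces:
    - my-dask-clusters

gateway:
  backend:
    namespace: my-dask-clusters
```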
I'd like the PR implementing this to test specifying some other namespace in `gateway.backend.namespace` while not using cluster wide operation. There is already a test that runs dask clusters in another namespace; that test can include setting `rbac.manageAnyNamespace` to false. So I think this part could be quick and easy.
Overall, I think the PR could be quite small and simple: add the new config, transition from ClusterRole(Binding) to Role(Binding) if `rbac.manageAnyNamespace=false`, and render resources per namespace specified.
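As a sketch of what that chart logic could look like (assuming the proposed `rbac.manageAnyNamespace` and `rbac.managedNamespaces` keys; the rules shown are illustrative, not the chart's actual permission set):

```yaml
# Hypothetical templates/rbac.yaml excerpt -- not the actual chart code.
{{- if .Values.rbac.manageAnyNamespace }}
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: {{ .Release.Name }}-gateway
rules:
  - apiGroups: ["gateway.dask.org"]
    resources: ["daskclusters"]
    verbs: ["get", "list", "watch", "create", "delete"]
{{- else }}
{{- range .Values.rbac.managedNamespaces | default (list .Release.Namespace) }}
---
# One namespaced Role per managed namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: {{ $.Release.Name }}-gateway
  namespace: {{ . }}
rules:
  - apiGroups: ["gateway.dask.org"]
    resources: ["daskclusters"]
    verbs: ["get", "list", "watch", "create", "delete"]
{{- end }}
{{- end }}
```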
If this is done, I think the test automation we have already will clarify if it works or not.
We were discussing this in our team meeting today. Just wanted to let you know that we haven't abandoned it, and I'll find some time in the next few weeks to work on this.
Currently, the Helm chart provides broader cluster wide permissions against the k8s api-server than may be needed, depending on the configuration of dask-gateway.
The following PRs have been opened related to this, but I felt that reviewing them focused on technical details before I had considered what configuration API would make sense to support. Due to that, I've opened this issue to brainstorm and try to establish what I'm looking for.
- #626
- #594
Situations
The helm chart is installed in one location which provides a dask-gateway-controller, dask-gateway-server (api pod), dask-gateway-proxy (traefik pod). They should be provided with the relevant permissions to create and interact with the k8s resources created.
Via configuration, Dask clusters created on demand can end up in different namespaces. Due to that, we currently provide permissions to work with resources in all namespaces.
1. Local namespace: clusters created only in helm chart local namespace
Reduced permissions required.
By local I mean the namespace where the dask-gateway Helm chart was installed.
2. Other namespace: clusters created only in another pre-configured namespace
Reduced permissions required, but not quite the same as when everything is put in the local namespace.
3. Any namespace: clusters created in any namespace
Full cluster wide permissions required. My assumption is that it is not possible to provide RBAC resources defining permissions for namespaces matching a certain naming pattern or having a certain label; if that is incorrect, we can do something smarter here as well.
Current configuration options
- `gateway.backend.namespace`, which is mapped to the Python configuration `KubeClusterConfig.namespace` within the dask-gateway server (api pod).

Future configuration options?
What we currently have seems sufficient to me, assuming `gateway.backend.namespace` is used (as compared to setting `c.KubeClusterConfig.namespace = "something"` in a custom Python config file provided to dask-gateway-server).

Implementation ideas
- I think when someone configures `gateway.backend.namespace`, we should reduce the permissions to that namespace.
- I think when someone doesn't configure `gateway.backend.namespace`, we should reduce the permissions to the local namespace.

A secure default is far better than an insecure default. At the same time, a user should be able to request cluster wide permissions so that `c.KubeClusterConfig.namespace` can be dynamically updated and still work.
So, we would need another Helm chart configuration option to provide cluster wide permissions. I note that `rbac.scope` is used by the prometheus helm chart.

With all this considered, I think we should let `gateway.backend.namespace` just be something we pass through to the Python configuration `c.KubeClusterConfig.namespace`, and let `rbac.scope` decide if we provide cluster wide or namespace scoped RBAC permissions. That way, I think we have sufficient flexibility.

Towards resolving this
- Decide on an `rbac.scope`-like configuration syntax. It should support cluster wide mode and single namespace mode; support for multiple namespaces seems like more complexity than merited in my mind.