flyteorg / flyte

Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
https://flyte.org
Apache License 2.0

[Core feature] Support more k8s auth providers #3671

Open honnix opened 1 year ago

honnix commented 1 year ago

Motivation: Why do you think this is important?

Currently only native k8s token-based auth is supported, both in flyteadmin and in various plugins (for example in flyteadmin itself, in the array plugin, and in the ray plugin).
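For reference, the static-token style of configuration in question looks roughly like the following in flyteadmin's multi-cluster setup. This is a sketch: the field names follow the Flyte multi-cluster docs and may differ across versions.

```yaml
# Sketch of flyteadmin cluster configuration using static token auth
# (field names follow the Flyte multi-cluster docs; may vary by version).
clusters:
  clusterConfigs:
    - name: dataplane1
      endpoint: https://<dataplane-k8s-apiserver>:443
      enabled: true
      auth:
        type: file_path            # static credentials read from mounted files
        tokenPath: /var/run/credentials/token
        certPath: /var/run/credentials/cacert
```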

It would be great to support, for example, GKE and EKS (I assume EKS provides auth similarly to GKE).

Goal: What should the final outcome look like, ideally?

Cloud providers' native k8s auth mechanisms can be used, avoiding static tokens, which are less secure.
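For illustration, the kind of provider-native auth meant here is what kubectl already does through client-go credential (exec) plugins. A kubeconfig `users` entry for GKE or EKS looks like this (a sketch of standard kubeconfig usage, not Flyte configuration; cluster names are placeholders):

```yaml
# Example kubeconfig user entries using client-go exec credential plugins,
# which mint short-lived tokens instead of relying on a static token file.
users:
  - name: gke-cluster
    user:
      exec:
        apiVersion: client.authentication.k8s.io/v1beta1
        command: gke-gcloud-auth-plugin   # GKE: short-lived GCP tokens
        provideClusterInfo: true
  - name: eks-cluster
    user:
      exec:
        apiVersion: client.authentication.k8s.io/v1beta1
        command: aws                      # EKS: short-lived STS-signed token
        args: ["eks", "get-token", "--cluster-name", "<cluster-name>"]
```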

Describe alternatives you've considered

Keep using static tokens and certs.


eapolinario commented 1 year ago

Auth is a great candidate for a Special-interest group, so how about we use this issue to jumpstart one around that area?

cc: @davidmirror-ops

davidmirror-ops commented 1 year ago

Absolutely.

BTW the RFC is two approvals away from being Accepted, your review/approval would be very useful :)

Auth is a project-wide concern and definitely hits the mark for a SIG. We also have a suggestion for a "Config Overrides WG" so with those two groups, we'd have a great kickoff.

honnix commented 1 year ago

> Auth is a great candidate for a Special-interest group, so how about we use this issue to jumpstart one around that area?
>
> cc: @davidmirror-ops

@eapolinario Yeah I think auth is a good topic for a SIG. This specific issue could also be part of an "Integration SIG", if that matters.

@davidmirror-ops I took a look at the RFC and it looks good to me. I only had a couple of comments.

davidmirror-ops commented 1 year ago

@honnix this is a great idea, and fits in the area of project-wide concerns.

Would you be up to leading a SIG-auth so we can gather forces with other community members to improve the whole auth experience in Flyte? See more details here https://github.com/flyteorg/community/blob/main/GOVERNANCE.md#organizational-structure

Extending to more K8s auth providers could be a subproject on that SIG

honnix commented 1 year ago

@davidmirror-ops As much as I would love this to happen, unfortunately I will not be able to lead, mostly due to my workload and commitments. I can ask around in my organization, as we do have a strong need for this feature.

honnix commented 1 year ago

@davidmirror-ops I asked around in my organization. While we are willing to join the activity of implementing this feature, we don't feel ready to lead a SIG at this point. It could be that we don't have much experience yet and so don't feel confident leading a group; that might change after we work on a couple of features/changes related to the auth topic.

davidmirror-ops commented 10 months ago

We could set up a Working Group around this effort, but we'll need some feedback from the community to gauge interest. If this feature request is something you/your organization need, please add a thumbs up reaction to the Issue.

fg91 commented 10 months ago

@honnix, @davidmirror-ops ,

I need to ask a few clarifying questions to understand the proposal:

> Currently only native k8s token based auth is supported. [...] avoid using static token that is less secure.

From what I know, in earlier versions of K8s (<1.21 or 1.22?) the JWT token of a service account was stored in a k8s secret that was created along with the service account and never expired. This secret was mounted into the pods. In newer versions of K8s, at least by default, this service account secret with a never-expiring token is no longer created. Instead, the kubelet injects short-lived tokens acquired from kube-apiserver into the pod.
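The short-lived injected tokens mentioned above come from the ServiceAccount token projection feature; a pod can also request such a token explicitly. An illustrative pod-spec fragment:

```yaml
# Pod spec fragment requesting a short-lived, audience-bound service account
# token via token projection (the mechanism kubelet uses by default now).
volumes:
  - name: kube-api-access
    projected:
      sources:
        - serviceAccountToken:
            path: token
            expirationSeconds: 3600           # kubelet rotates before expiry
            audience: https://kubernetes.default.svc
```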

In either case, the tokens are used when a process running in the pod needs to authenticate to the Kubernetes API server (which is set as the audience of the token), e.g. when flytepropeller needs to retrieve the status of a pod. Is this correct so far?

For instance, the multi-cluster setup guide contains instructions to copy this never-expiring token of the flyteadmin service account in a data plane cluster into a config in the control plane cluster, so that the flyteadmin in the control plane can talk to the Kubernetes API server in the data plane cluster to create the flyteworkflow CR.
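As an aside: since k8s 1.24 that never-expiring token is no longer created automatically, so the approach in the multi-cluster guide corresponds to creating a legacy service-account token Secret by hand, roughly like this (names are placeholders):

```yaml
# Legacy never-expiring service account token, the kind the multi-cluster
# guide copies between clusters; on k8s >= 1.24 it must be created manually.
apiVersion: v1
kind: Secret
metadata:
  name: flyteadmin-token
  annotations:
    kubernetes.io/service-account.name: flyteadmin
type: kubernetes.io/service-account-token
```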

Is my understanding until here correct and is it these static tokens that you are referring to? And is the goal of this proposal to stop using these never expiring tokens from the service account secrets and instead use the tokens the Kubelet mounts into the pods? In a single cluster setup, what would the complications be of using a token that was injected into the pod via a different mechanism? Re-load it periodically? I see how it would make the multi cluster setup more complicated.

You said:

> Cloud providers' native k8s auth mechanism

If I understand correctly, this newer token injection mechanism through the kubelet that I described above is K8s-specific, not cloud-provider-specific, or am I misunderstanding something? Which cloud-provider-specific mechanism are you referring to?

On GKE, for instance, we use workload identities to "let k8s service accounts impersonate GCP IAM service accounts" to have fine grained access control over "which pod can access which resources in GCP".
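For context, the Workload Identity binding referred to here is a KSA-to-GSA mapping, set up roughly like this (a sketch; all names are placeholders):

```yaml
# Kubernetes service account annotated to impersonate a GCP service account
# via GKE Workload Identity.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: flytepropeller
  namespace: flyte
  annotations:
    iam.gke.io/gcp-service-account: flyte-sa@my-project.iam.gserviceaccount.com
```

The GCP service account additionally needs an IAM binding granting `roles/iam.workloadIdentityUser` to the member `serviceAccount:my-project.svc.id.goog[flyte/flytepropeller]`.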

Do you know how this is implemented in detail, especially in the case of never expiring service account tokens mounted from the service account secret? Are these never-expiring tokens which "are originally intended for the Kubernetes API server" then also sent to GCP APIs? I never looked at the details here ...

If I'm not mistaken, if one didn't use workload identities, as far as GCP is concerned, the application default credentials that would be picked up by a pod would be the service account used by the node pool? Is there another mechanism on GCP?


I can't tell yet whether I would have capacity to work on this but I personally find the topic interesting and would definitely like to fully understand the current situation and the proposal.

honnix commented 10 months ago

> Is my understanding until here correct and is it these static tokens that you are referring to? And is the goal of this proposal to stop using these never expiring tokens from the service account secrets and instead use the tokens the Kubelet mounts into the pods?

Yes, that is all correct. This issue is about the multi-cluster setup, and is not necessarily about using the new token injection mechanism, because an injected token would not work across clusters.

> On GKE, for instance, we use workload identities to "let k8s service accounts impersonate GCP IAM service accounts" to have fine grained access control over "which pod can access which resources in GCP".

Yes, workload identity is one of the examples I had in mind. In a multi-cluster setup, flyteadmin could utilize workload identity to generate a short-lived auth token (a GCP service account access token) when talking to another GKE cluster. This mechanism is not the same as the native kubelet-injected token.

GKE supports different types of auth, both native k8s and GCP IAM. The native k8s token (either the never-expiring one or a short-lived one) only works for the current k8s cluster itself, not across k8s clusters, and not for other parts of GCP.

> If I'm not mistaken, if one didn't use workload identities, as far as GCP is concerned, the application default credentials that would be picked up by a pod would be the service account used by the node pool? Is there another mechanism on GCP?

Yes that is correct.