FalcoSuessgott / vault-kubernetes-kms

Encrypt Kubernetes Secrets using Hashicorp Vault as the KMS Provider
https://falcosuessgott.github.io/vault-kubernetes-kms/
MIT License

Static Pods cannot use Kubernetes Service Account Tokens #81

Open stephan2012 opened 1 month ago

stephan2012 commented 1 month ago

As explained in #80, static Pods cannot reference other Kubernetes API objects such as Service Accounts. Consequently, it is impossible to authenticate against Vault's Kubernetes authentication method because no Kubernetes Service Account token (JWT) is available. As a result, authenticating with a Vault token is currently the only option; without a running plugin, the API server cannot decrypt the data it reads from etcd. For the same reason, the plugin cannot run as a regular Kubernetes Deployment.

A better approach could be to use Vault's AppRole authentication method.
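For illustration, a minimal sketch of what the Vault-side AppRole setup could look like; the mount path, role name, policy, and token period are assumptions, not taken from this project's docs:

```sh
# Enable the AppRole auth method (mounted at the default path "approle").
vault auth enable approle

# Create a role for the KMS plugin; the role name, policy name, and
# token period here are illustrative assumptions.
vault write auth/approle/role/vault-kubernetes-kms \
    token_policies="kms" \
    token_period=24h

# Fetch the credentials the plugin would authenticate with.
vault read auth/approle/role/vault-kubernetes-kms/role-id
vault write -f auth/approle/role/vault-kubernetes-kms/secret-id
```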

Am I missing something here?

FalcoSuessgott commented 1 month ago

Very interesting! I will investigate, but you are most certainly right about static Pods not being able to reference other API objects. During development I've always run Kubernetes auth from a normal Pod, never a static Pod, so I've never faced this issue.

Adding AppRole as an auth method in the meantime can be done quickly. I will brainstorm and see whether there are any workarounds, and look at how similar plugins handle this circumstance.

Thanks for reporting and providing this information. I will give you a heads-up about my thoughts on this issue.

stephan2012 commented 1 month ago

I'm curious: did you ever successfully perform a cold start with the plugin running as a regular Pod while all Kubernetes Secrets were encrypted? I'd assume there is a chicken-and-egg situation if both the API server and the plugin are down.

By the way, non-root tokens usually have a TTL, rendering them inappropriate for long-term setups. AppRole IDs are clearly the better approach here.

FalcoSuessgott commented 1 month ago

You're absolutely right. I will add AppRole auth this week and keep token auth for development purposes. Regarding the static Pod issue, I was thinking of migrating the plugin to a DaemonSet. Any thoughts on that?

stephan2012 commented 1 month ago

Unless I got something wrong, the same rules apply to DaemonSet-spawned Pods: The Kubernetes control plane must be up and running before the kubelet can run anything that is not a static Pod (because it needs to talk to the API server). So, we end up in the same chicken-and-egg situation: The API server needs the KMS plugin to start, but it won't start until the API server is up …

Please let me know if I'm missing something here.

stephan2012 commented 1 month ago

One more thing about DaemonSets … They are intended to run a Pod on every node (or any node of a particular type), so they are probably not the right tool anyway.

FalcoSuessgott commented 1 month ago

> By the way, non-root tokens usually have a TTL, rendering them inappropriate for long-term setups. AppRole IDs are clearly the better approach here.

That's right, but it does not apply to periodic tokens. That's why the docs say to make sure the token is orphaned and periodic (https://falcosuessgott.github.io/vault-kubernetes-kms/configuration/#example-vault-token-auth-not-recommended). The same logic is used for Vault's Transit auto-unseal mechanism. While it is true that AppRole is the better auth approach, token auth can still be used.
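For illustration, creating such a token could look like this (the policy name and period are assumptions):

```sh
# Create an orphan, periodic token: it has no parent whose revocation
# would revoke it, and it can be renewed indefinitely within its period.
vault token create -orphan -period=24h -policy=kms
```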

> So, we end up in the same chicken-and-egg situation: The API server needs the KMS plugin to start, but it won't start until the API server is up …

Maybe check out the E2E tests, where a kind cluster is created with the appropriate kube-apiserver and vault-kubernetes-kms settings. In my understanding, if you deploy the plugin as a static Pod (which, as we already discussed, does not allow Kubernetes auth, so this only applies to token and AppRole auth), a certain duration is granted for all control plane components to come up (kind create cluster --wait=2m). If all control plane components come up, everything is fine. But if that duration is exceeded, cluster creation indeed fails, and you would need to make sure all control plane components can start properly.
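A rough sketch of what such a kind setup could look like; the file paths and patch details below are assumptions about how kind exposes kube-apiserver flags, not copied from the E2E tests:

```sh
# Write a kind config that mounts the EncryptionConfiguration into the
# control-plane node and points kube-apiserver at it. Paths are
# illustrative assumptions.
cat > kind-config.yaml <<'EOF'
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  extraMounts:
  - hostPath: ./encryption_config.yaml
    containerPath: /etc/kubernetes/encryption_config.yaml
  kubeadmConfigPatches:
  - |
    kind: ClusterConfiguration
    apiServer:
      extraArgs:
        encryption-provider-config: /etc/kubernetes/encryption_config.yaml
EOF

# Fail cluster creation if the control plane is not ready within 2 minutes.
kind create cluster --config kind-config.yaml --wait=2m
```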

For example, see https://github.com/kubernetes-sigs/aws-encryption-provider?tab=readme-ov-file#bootstrap-during-cluster-creation-kops:

> To use encryption provider during cluster creation, you need to ensure that it's running before starting kube-apiserver. For that you need to perform the following high level steps.

> One more thing about DaemonSets … They are intended to run a Pod on every node (or any node of a particular type), so they are probably not the right tool anyway.

I brought up DaemonSets as another way to deploy the plugin: instead of deploying it as a static Pod, use a DaemonSet to make sure each node runs an instance of the plugin. This way, Kubernetes auth would also be supported, since we could then reference Service Accounts.

Now I get what you mean: since we patch the kube-apiserver manifest by specifying an EncryptionConfiguration that requires a socket, the kube-apiserver does not come up without the plugin up and running... So we have to deploy the plugin as a static Pod and make sure it is running before the kube-apiserver. To do so, we set priorityClassName: system-node-critical in the Pod manifest (https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/#marking-pod-as-critical); see the sketch below.
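A minimal sketch of such a static Pod manifest; apart from priorityClassName, the image name, socket path, and other details are illustrative assumptions, not the project's published manifest:

```sh
# The kubelet starts any manifest found in /etc/kubernetes/manifests
# without talking to the API server first.
cat > /etc/kubernetes/manifests/vault-kubernetes-kms.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: vault-kubernetes-kms
  namespace: kube-system
spec:
  # Start the plugin before (and never evict it under) the
  # kube-apiserver, which depends on the plugin's socket.
  priorityClassName: system-node-critical
  containers:
  - name: vault-kubernetes-kms
    image: falcosuessgott/vault-kubernetes-kms:latest  # assumed image name
    volumeMounts:
    - name: kms
      mountPath: /opt/kms  # assumed socket directory
  volumes:
  - name: kms
    hostPath:
      path: /opt/kms
EOF
```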

So I think we can't use Kubernetes auth; knowing these constraints, I see no other way it could work. I will update the docs accordingly.

Last but not least: #87 adds AppRole auth support. Maybe check out the docs for usage. Note that there is now a new required CLI arg, -auth-method=token|approle. Feedback would be appreciated.
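For anyone skimming, the invocation could then look roughly like this; only -auth-method is confirmed above, while the binary path and the credential flag names are hypothetical placeholders (check the project docs for the real ones):

```sh
# Only -auth-method is confirmed in this thread; the binary path and the
# AppRole credential flag names below are hypothetical placeholders.
/vault-kubernetes-kms \
  -auth-method=approle \
  -approle-role-id="$ROLE_ID" \
  -approle-secret-id="$SECRET_ID"
```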

Best regards from Australia :)

FalcoSuessgott commented 1 month ago

Updated the docs (https://falcosuessgott.github.io/vault-kubernetes-kms/configuration/#deploying-vault-kubernetes-kms)

stephan2012 commented 1 month ago

Again, thank you so much for your work on this plugin!

I've tested version v0.1.0 with AppRole ID, and it works! However, I found:

I'll leave the setup running for around a week and check for any issues.

stephan2012 commented 1 month ago

> Maybe check out the E2E tests, where a kind cluster is created with the appropriate kube-apiserver and vault-kubernetes-kms settings. In my understanding, if you deploy the plugin as a static Pod (which, as we already discussed, does not allow Kubernetes auth, so this only applies to token and AppRole auth), a certain duration is granted for all control plane components to come up (kind create cluster --wait=2m). If all control plane components come up, everything is fine. But if that duration is exceeded, cluster creation indeed fails, and you would need to make sure all control plane components can start properly.

I'm using kubeadm, which offers no way to add extra static Pod manifests during cluster initialization. So, when the encryption provider is configured, the API server fails to start, because I can only make the KMS plugin available after kubeadm has finished.
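One possible workaround, sketched under the assumption that the plugin manifest and encryption config live on each control-plane node (the re-encryption step at the end is the standard procedure from the Kubernetes docs):

```sh
# 1. Initialize the cluster without an encryption provider configured.
kubeadm init

# 2. Drop the plugin's static Pod manifest so the kubelet starts it.
cp vault-kubernetes-kms.yaml /etc/kubernetes/manifests/

# 3. Once the plugin's socket exists, edit
#    /etc/kubernetes/manifests/kube-apiserver.yaml and add
#      --encryption-provider-config=/etc/kubernetes/encryption_config.yaml
#    The kubelet restarts the kube-apiserver automatically.

# 4. Rewrite all existing Secrets so they are stored encrypted in etcd.
kubectl get secrets -A -o json | kubectl replace -f -
```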

stephan2012 commented 1 month ago

> I'll leave the setup running for around a week and check for any issues.

It looks stable so far (v0.1.1)! The plugin has been up and running uninterrupted for more than 13 days, and I have not noticed any issues. I'm tempted to try it on a production cluster …

FalcoSuessgott commented 3 weeks ago

> I'll leave the setup running for around a week and check for any issues.
>
> It looks stable so far (v0.1.1)! The plugin has been up and running uninterrupted for more than 13 days, and I have not noticed any issues. I'm tempted to try it on a production cluster …

Sorry, I have been busy lately. Sounds great! I still want to improve the token renewal, which I have already started on a local branch. Once that is done and tested, I think we can label the project 'beta' and move it out of its 'alpha' stage :D

xslicex commented 2 weeks ago

I think running the KMS provider plugin as a DaemonSet is a workable option (reacting to stephan2012's comments from 15 July 2024). While the kube-apiserver waits for its encryption provider plugin to become available, its TokenReview API is already working, so Vault can validate the plugin's ServiceAccount token and the chicken-and-egg problem is avoided. And a DaemonSet can be pinned to a specific type of node, e.g. the control-plane nodes where the kube-apiserver runs, which is exactly what is needed in our case; see the sketch below.
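A sketch of what that placement could look like; the image and labels are illustrative assumptions:

```sh
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: vault-kubernetes-kms
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: vault-kubernetes-kms
  template:
    metadata:
      labels:
        app: vault-kubernetes-kms
    spec:
      # Run one instance on each control-plane node, next to kube-apiserver.
      nodeSelector:
        node-role.kubernetes.io/control-plane: ""
      tolerations:
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule
      containers:
      - name: vault-kubernetes-kms
        image: falcosuessgott/vault-kubernetes-kms:latest  # assumed image
EOF
```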

stephan2012 commented 2 weeks ago

The kubelet can request data on API objects (e.g., DaemonSets) only after the Kubernetes control plane is up and running … 😉

xslicex commented 2 weeks ago

Well, other plugin implementations are also built as DaemonSets, and they work (e.g., Trousseau or Kleidi).