billimek / k8s-gitops

GitOps principles to define kubernetes cluster state via code
Apache License 2.0
642 stars, 84 forks

Question: Confused on Vault #93

Closed: onedr0p closed this issue 4 years ago

onedr0p commented 4 years ago

Hi @billimek, I am learning Kubernetes and some of the services that go with it, so bear with me. This isn't really an issue, or maybe it is if we can improve the documentation :)

Following many of your examples, I have a GitOps repository here. I have 4 RPis and have provisioned kured, metallb and metrics-server, and now I am working on Vault so I can actually start deploying apps with secrets.

Following your example, I have an RPi outside my cluster where I keep my Vault data. I am a little lost on this block of code: what is the lb in the address http://lb:8200? Is there an issue with using 127.0.0.1 instead of lb, or is lb the address of the Pi where I have Vault running? I don't see an lb address mentioned anywhere else.

seal "transit" {
  address = "http://lb:8200"
  disable_renewal = "false"
  key_name = "autounseal"
  mount_path = "transit/"
  tls_skip_verify = "true"
}

Anyway, thanks for your help (and some of your code); I am learning a lot!

billimek commented 4 years ago

Hi @onedr0p, the lb reference is indeed the hostname of the RPi where the Vault transit server runs. See this issue for additional context on the same topic.

If you use 127.0.0.1 as the address of the transit server, I don't think it will work, because that would be a self-reference: inside the Vault pod, 127.0.0.1 is the pod itself rather than the Pi. I would simply replace lb with the hostname or IP address of the 'external' Vault server you are using for transit.

Does this sort of make sense? This is the second time the question has come up so I do think that the documentation should be updated. I'll do that as part of this issue!
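
As a rough illustration of why the address matters, here is a minimal Python sketch, assuming you generate the seal stanza from a script. The helper names is_self_reference and transit_seal_stanza and the IP 192.168.1.50 are hypothetical; Vault itself only reads the resulting config. The sketch rejects loopback addresses, which inside the Vault pod would point back at the pod rather than at the transit server on the Pi.

```python
import ipaddress
from urllib.parse import urlsplit


def is_self_reference(address: str) -> bool:
    """Return True if the URL's host is a loopback address or 'localhost'.

    Inside the Vault pod, loopback refers to the pod itself, so a transit
    seal pointed there would never reach the external transit server.
    """
    host = urlsplit(address).hostname
    if host is None:
        raise ValueError(f"no host in address: {address!r}")
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (e.g. "lb"): treat it as a resolvable hostname.
        return False


def transit_seal_stanza(address: str, key_name: str = "autounseal",
                        mount_path: str = "transit/") -> str:
    """Render the seal "transit" block from this issue for a given address."""
    if is_self_reference(address):
        raise ValueError("address must point at the external transit Vault server")
    return (
        'seal "transit" {\n'
        f'  address = "{address}"\n'
        '  disable_renewal = "false"\n'
        f'  key_name = "{key_name}"\n'
        f'  mount_path = "{mount_path}"\n'
        '  tls_skip_verify = "true"\n'
        "}\n"
    )


if __name__ == "__main__":
    # Hypothetical IP of the Pi running the transit Vault server.
    print(transit_seal_stanza("http://192.168.1.50:8200"))
```

A hostname such as lb is accepted as-is, since whether it resolves to the Pi depends on your DNS, not on the config.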

onedr0p commented 4 years ago

That makes perfect sense, I actually got vault up and running now, thanks!

If you're interested in running more things on ARM, I also created Docker images for vault-secrets-operator, flux, helm-operator and velero. Build times are pretty slow on the RPis, with some taking longer than 20 minutes. I'm trying to figure out how to build these on my amd64 desktop.

https://github.com/onedr0p/homelab-gitops/tree/master/docker-arm

billimek commented 4 years ago

Fantastic, thank you for sharing this @onedr0p!

As an aside, I'm maintaining a list of workloads that support multi-arch containers for when running a hybrid amd64/arm/arm64 k8s cluster: https://github.com/billimek/k8s-gitops/blob/master/arm-matrix.md

onedr0p commented 4 years ago

It's great that you have this documented; it would be nice if maintainers supported multi-arch as a first-class citizen.

onedr0p commented 4 years ago

One other question: in the chart for vault-secrets-operator,

    vault:
      address: "http://vault:8200"
      authMethod: kubernetes
      kubernetesPath: auth/kubernetes
      kubernetesRole: vault-secrets-operator
      reconciliationTime: "300"

Is the address here also the transit server on the RPi?

billimek commented 4 years ago

No, the vault-secrets-operator is talking to the actual vault server running in kubernetes. The "http://vault:8200" address corresponds to the vault service that was deployed as a helm chart.

In the case where the vault deployment is running in a different namespace than the vault-secrets-operator, you would need to alter the 'address' to something like http://vault.<some other namespace>.svc.cluster.local:8200
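
The rule in the two paragraphs above can be written down as a small Python function. This is a sketch only: the function name vault_address and the "secrets" namespace are hypothetical, and it assumes the default cluster domain cluster.local unless told otherwise.

```python
def vault_address(service: str, service_namespace: str, client_namespace: str,
                  port: int = 8200, cluster_domain: str = "cluster.local") -> str:
    """Return the URL a pod in client_namespace should use to reach Vault."""
    if service_namespace == client_namespace:
        # Same namespace: the short Service name resolves through the
        # pod's DNS search path, e.g. http://vault:8200.
        host = service
    else:
        # Different namespace: use the fully qualified Service name.
        host = f"{service}.{service_namespace}.svc.{cluster_domain}"
    return f"http://{host}:{port}"


# Vault and the operator both in kube-system.
print(vault_address("vault", "kube-system", "kube-system"))  # http://vault:8200
# Operator in a different, hypothetical "secrets" namespace.
print(vault_address("vault", "kube-system", "secrets"))
# http://vault.kube-system.svc.cluster.local:8200
```

If your cluster uses a non-default domain, pass it as cluster_domain instead of relying on cluster.local.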

onedr0p commented 4 years ago

Great, so it looks like vault:8200 should work in my case too since I have them both in the same namespace.

If anyone else lands here and runs them in different namespaces, the command below will show you the cluster domain:

kubectl get configmap coredns -n kube-system -o yaml

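
The cluster domain appears in the Corefile key of that ConfigMap, on the line that enables the kubernetes plugin (for example kubernetes cluster.local in-addr.arpa ip6.arpa {). Here is a minimal Python sketch to pick it out, assuming that usual layout. The function name and the sample Corefile are hypothetical, and a kubernetes line that lists no zones would need different handling.

```python
SAMPLE_COREFILE = """\
.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
       pods insecure
       fallthrough in-addr.arpa ip6.arpa
    }
    forward . /etc/resolv.conf
    cache 30
}
"""


def cluster_domain_from_corefile(corefile: str) -> str:
    """Return the first zone listed on the Corefile's kubernetes plugin line."""
    for line in corefile.splitlines():
        tokens = line.split()
        # Expect: kubernetes <zone> [<zone> ...] {
        if len(tokens) > 1 and tokens[0] == "kubernetes" and tokens[1] != "{":
            return tokens[1]
    raise ValueError("no kubernetes plugin line with an explicit zone found")


print(cluster_domain_from_corefile(SAMPLE_COREFILE))  # cluster.local
```

The Corefile text on its own can be printed with kubectl get configmap coredns -n kube-system -o jsonpath='{.data.Corefile}' and passed to the function.
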
onedr0p commented 4 years ago

I have a little more debugging to do, since something is not working:


{"level":"error","ts":1574519759.7404988,"logger":"cmd","msg":"Could not create API client for Vault","error":"Error making API request.\n\nURL: PUT http://vault:8200/v1/auth/kubernetes/login\nCode: 400. Errors:\n\n* missing client token","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/pkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128\nmain.main\n\t/go/src/github.com/ricoberger/vault-secrets-operator/cmd/manager/main.go:85\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203"}

onedr0p commented 4 years ago

Got past that by following your guide on vault-secrets-operator :man_facepalming:

Now I'm running into an x509 certificate issue in the vault-secrets-operator pod:


{"level":"error","ts":1574521756.814838,"logger":"cmd","msg":"Could not create API client for Vault","error":"Error making API request.\n\nURL: PUT http://vault:8200/v1/auth/kubernetes/login\nCode: 500. Errors:\n\n* Post https://192.168.42.23:6443/apis/authentication.k8s.io/v1/tokenreviews: x509: certificate signed by unknown authority","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/pkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128\nmain.main\n\t/go/src/github.com/ricoberger/vault-secrets-operator/cmd/manager/main.go:85\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203"}

onedr0p commented 4 years ago

Ugh, it seemed to be an issue with the fish shell; I switched to bash and it works.

billimek commented 4 years ago

I think not all is perfect with the transit unseal solution. It looks like the 'unseal token' has a TTL, so a new unwrap token has to be created after some time.

In my setup, I noticed that the k8s Vault server was crash-looping with:

➜ k -n kube-system logs -f vault-0
Error parsing Seal configuration: Error making API request.

URL: PUT http://lb:8200/v1/transit/encrypt/autounseal
Code: 403. Errors:

* permission denied

... and the corresponding transit Vault server on the Pi was showing the following:

2019-10-29T02:18:33.830Z [INFO]  core: vault is unsealed
2019-10-29T02:18:33.850Z [INFO]  expiration: lease restore complete
2019-11-12T18:44:39.938Z [INFO]  expiration: revoked lease: lease_id=auth/token/create/h80b65e4303e34e49e8caca879443557fc2bb03e86062439a605c66ef20928f85
2019-11-26T19:59:08.644Z [INFO]  expiration: revoked lease: lease_id=auth/token/create/h3a5402677bc32cc1b011a6e8a8270217d0880c2d4ab111815a2b511d5ef7f396
2019-11-26T20:00:26.687Z [INFO]  expiration: revoked lease: lease_id=auth/token/create/h0a4b6238a1acbd14ee0041a50ae9c2ec6c678c2e6dbb1ef99aac1523d5b3199e
2019-11-26T20:02:03.436Z [INFO]  expiration: revoked lease: lease_id=auth/token/create/h4daebd25d36919489d20513e2d5ac7a40c81de0859aca7ad842877f5cc0b913b
2019-11-26T20:03:42.521Z [INFO]  expiration: revoked lease: lease_id=auth/token/create/h024e5b0f509ca06ab7335902e5ad64f3f843668c5184bebcee05224320a53747
2019-11-26T20:05:34.387Z [INFO]  expiration: revoked lease: lease_id=auth/token/create/hd8254f3da5a9b1cf5b0b9528fd16a43f2d668cdb1fb6a09fa28f387265b87372
2019-11-26T20:09:12.175Z [INFO]  expiration: revoked lease: lease_id=auth/token/create/hfc66e5b537436e14d86c4cd6186e364a9b038567d8e1652bc42918280e1d4e91
2019-11-26T20:09:50.057Z [INFO]  expiration: revoked lease: lease_id=auth/token/create/hf81e27576d92b79008863ccd38baaebe874c3c45f999eafdf7f6cb4560713c93

This suggests that there is a limit to how long the unseal token lives. Indeed, when creating a new unseal/unwrap token from the transit Vault server, the following info is shown:

pi@lb:~ $ VAULT_ADDR=http://127.0.0.1:8200 VAULT_TOKEN=<redacted> vault unwrap
Key                  Value
---                  -----
token                <redacted>
token_accessor       <redacted>
token_duration       768h
token_renewable      true
token_policies       ["autounseal" "default"]
identity_policies    []
policies             ["autounseal" "default"]

The TTL (token_duration) is 768h, i.e. 32 days, which seems to line up somewhat with the output from the transit Vault server above (the expiration: revoked lease messages).
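
The arithmetic behind that estimate, as a small Python sketch (the function names are hypothetical): 768h is exactly 32 days, so if a token ran its full TTL, one revoked at 2019-11-26T19:59:08.644Z in the log above would have been issued 32 days earlier. Tokens can also be revoked for other reasons, so this only checks whether the numbers are plausible.

```python
from datetime import datetime, timedelta, timezone

# token_duration reported by `vault unwrap` on the transit server
TOKEN_DURATION = timedelta(hours=768)


def parse_vault_ts(ts: str) -> datetime:
    """Parse a timestamp like 2019-11-26T19:59:08.644Z from the Vault log."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)


def issued_if_full_ttl(revoked: datetime, duration: timedelta = TOKEN_DURATION) -> datetime:
    """If a token lived its whole TTL, it was issued `duration` before revocation."""
    return revoked - duration


print(TOKEN_DURATION.days)  # 32
revoked = parse_vault_ts("2019-11-26T19:59:08.644Z")
print(issued_if_full_ttl(revoked).isoformat())  # 2019-10-25T19:59:08.644000+00:00
```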

I will need to investigate further to see what options there are for a non-expiring unwrap token. I seem to remember that the Vault server has a global max TTL of 32 days, and it may not be possible to make this longer. Maybe it's possible to just use the root token for the transit server, since its only purpose is to act as an unsealer.
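
One way to catch the expiry before the seal breaks is to ask the transit server how long the token has left. Below is a minimal Python sketch using Vault's token lookup-self HTTP endpoint (GET /v1/auth/token/lookup-self with the X-Vault-Token header). The seven-day warning window and the helper names are arbitrary assumptions, and this does not make the token any longer-lived; it only warns before it expires.

```python
import json
import urllib.request

WARN_BEFORE_SECONDS = 7 * 24 * 3600  # arbitrary: warn a week before expiry


def lookup_self(vault_addr: str, token: str) -> dict:
    """Return the 'data' object from GET /v1/auth/token/lookup-self."""
    req = urllib.request.Request(
        f"{vault_addr.rstrip('/')}/v1/auth/token/lookup-self",
        headers={"X-Vault-Token": token},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["data"]


def needs_new_token(data: dict, warn_before: int = WARN_BEFORE_SECONDS) -> bool:
    """True if the token expires within warn_before seconds.

    A ttl of 0 means the token never expires (e.g. a root token).
    """
    ttl = int(data["ttl"])
    return 0 < ttl <= warn_before


# Example against the transit server (address and token are placeholders):
# data = lookup_self("http://lb:8200", "<redacted>")
# print(data["ttl"], needs_new_token(data))
```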

It would be so much easier if there were a self-hosted KMS solution, like the ones in Google Cloud or AWS, for easier unsealing.

billimek commented 4 years ago

It looks like this Medium article explores the same issue. It doesn't offer a definitive solution, but at least it confirms this is a problem to be solved.

onedr0p commented 4 years ago

I've decided to go back through your repo's git history to see how you implemented sealed-secrets; it is a much simpler solution for my homelab.

billimek commented 4 years ago

Agreed that the Vault implementation seems overly complex right now. Hoping to find something easier to implement.