rliskunov opened 3 months ago
In general, it seems the problem is not a timeout but the ServiceAccount.
Let's say we have two applications: `api` and `worker`. A Secret is generated for each of them that allows access to Vault. Example for `api`:
```yaml
- apiVersion: v1
  kind: Secret
  type: Opaque
  metadata:
    name: argo-vault-api
    namespace: argocd
  stringData:
    VAULT_ADDR: http://vault.vault.svc.cluster.local:8200
    AVP_TYPE: vault
    AVP_AUTH_TYPE: k8s
    AVP_K8S_ROLE: argocd-api
```
The `argocd-api` role is created in Vault with the following parameters:
- Bound service account names: `argocd-repo-server`
- Bound service account namespaces: `argocd`
- Generated token's policies: `api`
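For reference, a minimal sketch of how such a role might be created with the Vault CLI, assuming the Kubernetes auth method is mounted at the default `kubernetes` path (the mount path and TTL here are assumptions, not taken from the report):

```shell
# Bind the role to the repo-server ServiceAccount in the argocd namespace,
# attaching only the per-application "api" policy.
vault write auth/kubernetes/role/argocd-api \
    bound_service_account_names=argocd-repo-server \
    bound_service_account_namespaces=argocd \
    token_policies=api \
    token_ttl=20m
```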
The `argocd-repo-server` pod uses the ServiceAccount `argocd-repo-server`. When we do a Hard Refresh in ArgoCD for `api`, it is as if the ServiceAccount `argocd-repo-server` latches onto the `argo-vault-api` secret, losing the Vault connection for `argo-vault-worker`. If we restart the `argocd-repo-server` pod and do a Hard Refresh for `worker`, we lose `api` instead.
When we used a universal role that has access to all secrets, we did not encounter this problem.
We are seeing a similar issue, as we have a similar setup.
We have done the troubleshooting inside the avp-helm sidecar container (in our case) that we run as part of the repo-server. It seems to us that when using different AppRoles within the same sidecar, there is an issue with token caching. The concept is briefly discussed here: https://argocd-vault-plugin.readthedocs.io/en/stable/usage/#caching-the-hashicorp-vault-token
We believe there is a race condition: whoever comes first to refresh the token (default lifetime is 20 minutes) gets to execute. This gets some additional randomness from having two repo-server instances, and therefore two sidecars, running at the same time.
This is further supported by our discovery that this never happens for our second avp sidecar, which always uses the same secret, and that we can always reproduce it by running a hard refresh for all of our applications (we use 10+ different AppRoles).
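One way to check this in practice, under the assumption that AVP caches a single Vault token on the sidecar filesystem (the container name and cache path below are assumptions based on the docs linked above, not confirmed for this setup):

```shell
# Inspect the token cache inside the avp sidecar; if only one token is
# stored, every application shares it regardless of which role requested it.
kubectl -n argocd exec deploy/argocd-repo-server -c avp-helm -- \
    cat /home/argocd/.avp/config.json
```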
Describe the bug
Periodically the plugin loses its connection to Vault: after configuration the plugin works correctly, but after 15-20 minutes the connection is lost. A Hard Refresh of the app does not help. However, if you restart `argocd-repo-server` and `argocd-redis`, everything works again; restarting only one of them does not solve the problem. I use Multitenancy with Kubernetes Authentication.
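For context, the restart that recovers the connection is simply recycling both deployments (resource names assume the default ArgoCD Helm chart):

```shell
kubectl -n argocd rollout restart deployment argocd-repo-server
kubectl -n argocd rollout restart deployment argocd-redis
```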
To Reproduce
If you want to reproduce this, you will need the following:
1. Install Vault in a Kubernetes cluster
2. Enable Kubernetes authentication in Vault
3. Add a policy to Vault: `argocd-policy`
4. Add a role to Vault: `argocd-role`, specifying the parameters (a CLI sketch of steps 2-4 follows this list)
5. Add a secret to Kubernetes in values.yaml for the ArgoCD Helm chart
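A minimal sketch of steps 2-4 with the Vault CLI; the policy body and secret paths are assumptions for illustration:

```shell
# Step 2: enable the Kubernetes auth method (default mount path "kubernetes").
vault auth enable kubernetes

# Step 3: write an example policy; actual paths depend on your secret layout.
vault policy write argocd-policy - <<'EOF'
path "secret/data/argocd/*" {
  capabilities = ["read"]
}
EOF

# Step 4: bind the role to the repo-server ServiceAccount, as described above.
vault write auth/kubernetes/role/argocd-role \
    bound_service_account_names=argocd-repo-server \
    bound_service_account_namespaces=argocd \
    token_policies=argocd-policy
```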
Expected behavior
Once a connection to Vault is configured for an application, it should keep working stably.
Screenshots/Verbose output
Example of output
Additional context
If you don't use Multitenancy but instead grant the most permissive (insecure) policy possible, the connection is stable.
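For illustration, a sketch of what such a catch-all policy might look like (assumed HCL, not taken from the report); because every role's token would carry the same broad policy, whichever token ends up cached still works for all applications:

```shell
vault policy write argocd-policy - <<'EOF'
path "secret/*" {
  capabilities = ["read", "list"]
}
EOF
```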