argoproj / argo-cd

Declarative Continuous Deployment for Kubernetes
https://argo-cd.readthedocs.io
Apache License 2.0
17.41k stars 5.29k forks source link

argocd-initial-admin-secret password invalid #10708

Open mgugino-uipath opened 1 year ago

mgugino-uipath commented 1 year ago

Checklist:

Describe the bug

Occassionally, the password in argocd-initial-admin-secret does not allow the admin to authenticate. I created a small python script to validate the password bcrypt.checkpw(passwd, myhash) and it confirmed the password did not match the hash in argocd-secret.

I believe there is a race condition from multiple replicas (2) of argocd-server. I have included startup logs from two pod replicas that indicate both attempt to set the password simultaneously.

To Reproduce

Installed argocd via helm chart v4.10.5. argocd-server configured with two replicas. Attempt to authenticate using password in argocd-initial-admin-secret

It is unclear how to trigger the race condition, probably involves pod-A starting first but having a slower connection to the kubeapi than pod-B.

Code in question: https://github.com/argoproj/argo-cd/blob/v2.2.12/util/settings/settings.go#L1702

Expected behavior

Auth succeeded

Screenshots Version

argocd: v2.2.12+fd74756
  BuildDate: 2022-07-29T14:47:55Z
  GitCommit: fd74756aebe30c9567ff904a1c138a828f533ede
  GitTreeState: clean
  GoVersion: go1.16.11
  Compiler: gc
  Platform: linux/amd64
argocd-server: v2.2.12+fd74756

Logs

[root@server0 bin]# kubectl logs -n argocd argocd-server-756bd7bf85-h88xc
time="2022-09-26T16:08:37Z" level=info msg="Starting configmap/secret informers"
time="2022-09-26T16:08:37Z" level=info msg="Configmap/secret informer synced"
time="2022-09-26T16:08:37Z" level=info msg="Initialized admin password" <---------------
time="2022-09-26T16:08:37Z" level=info msg="Starting configmap/secret informers"
time="2022-09-26T16:08:37Z" level=info msg="secrets informer cancelled"
time="2022-09-26T16:08:37Z" level=info msg="configmap informer cancelled"
time="2022-09-26T16:08:38Z" level=info msg="Configmap/secret informer synced"
time="2022-09-26T16:08:38Z" level=info msg="argocd v2.2.12+fd74756 serving on port 8080 (url: https://alm.sfqamo3015391-cluster.example.com, tls: false, namespace: argocd, sso: false)"
time="2022-09-26T16:08:38Z" level=info msg="0xc000d483c0 subscribed to settings updates"
time="2022-09-26T16:08:38Z" level=info msg="Starting rbac config informer"
time="2022-09-26T16:08:38Z" level=info msg="RBAC ConfigMap 'argocd-rbac-cm' added"

[root@server0 bin]# kubectl logs -n argocd argocd-server-756bd7bf85-sv7lm
time="2022-09-26T16:08:37Z" level=info msg="Starting configmap/secret informers"
time="2022-09-26T16:08:37Z" level=info msg="Configmap/secret informer synced"
time="2022-09-26T16:08:37Z" level=info msg="Initialized admin password" <---------------
time="2022-09-26T16:08:37Z" level=info msg="Starting configmap/secret informers"
time="2022-09-26T16:08:37Z" level=info msg="secrets informer cancelled"
time="2022-09-26T16:08:37Z" level=info msg="configmap informer cancelled"
time="2022-09-26T16:08:37Z" level=info msg="Configmap/secret informer synced"
time="2022-09-26T16:08:37Z" level=info msg="argocd v2.2.12+fd74756 serving on port 8080 (url: https://alm.sfqamo3015391-cluster.example.com, tls: false, namespace: argocd, sso: false)"
time="2022-09-26T16:08:37Z" level=info msg="0xc000a0a3c0 subscribed to settings updates"
time="2022-09-26T16:08:37Z" level=info msg="Starting rbac config informer"
time="2022-09-26T16:08:37Z" level=info msg="RBAC ConfigMap 'argocd-rbac-cm' added"
jessesuen commented 1 year ago

I agree this is likely a race between argocd-server pods. The theoretical race is:

  1. server A and B both starts and sees secret is missing
  2. server A generates password xxxx and stores it into argocd-initial-admin-secret
  3. server B generates password yyyyyyy overrides previous value in argocd-initial-admin-secret
  4. server B stores yyyyyyy into argocd-secret
  5. server A stores xxxx into argocd-secret

argocd-secret contains xxxx, but intial-admin-secret contains yyyyyyy

I think to fix this, CreateOrUpdateSecretField should not create or update, it should only create the field if it didn't exist, but return an error or some indicator that it already was set, so that the caller will choose not to update argocd-secret.

cleverhu commented 1 year ago

I'd like to fix this, can this issue been assigned to me?

jessesuen commented 1 year ago

@cleverhu sure!

cleverhu commented 1 year ago

I tried it in local, found the password has benn inited for twice.

image

And the same as argocd-secret.

cleverhu commented 1 year ago

@jessesuen I pulled a request for this, PTAL.

cleverhu commented 1 year ago

@cleverhu sure!

I tried to fix it but I have encountered some difficulties. Can you give me some advice? Thanks!

wujunwei commented 1 year ago

Maybe,we can use configmap or secret even endpoint for a lock. something like leaderelection

cleverhu commented 1 year ago

Maybe,we can use configmap or secret even endpoint for a lock. something like leaderelection

I will try to do it,thanks.

cleverhu commented 1 year ago

Maybe,we can use configmap or secret even endpoint for a lock. something like leaderelection

I tried to use leaderelection, which is mostly used on the k8s controller. There may be some problems with this as a distributed lock.(For example, this will select a master, which will lead to the failure of initialization in other pods in the case of multiple replicas.) Thank you for your help.

wujunwei commented 1 year ago

Maybe,we can use configmap or secret even endpoint for a lock. something like leaderelection

I tried to use leaderelection, which is mostly used on the k8s controller. There may be some problems with this as a distributed lock.(For example, this will select a master, which will lead to the failure of initialization in other pods in the case of multiple replicas.) Thank you for your help.

I mean you can write your own distributed lock by using configmap just like leaderelection, leaderelection is not applicable here. But its practice can be used for reference。

lypanov commented 1 year ago

We're running into this while trying to setup ArgoCD for the first time. Is there any reason a 4 month old patch hasn't yet been looked into given how bad of an impression this leaves for a first time user trying to get Argo into production?

NgZiming commented 1 year ago

This issue still exists by now. For anyone who encounters this issue, here is a valid workaround:

https://github.com/argoproj/argo-cd/blob/master/docs/faq.md#i-forgot-the-admin-password-how-do-i-reset-it