argoproj / argo-cd

Declarative Continuous Deployment for Kubernetes
https://argo-cd.readthedocs.io
Apache License 2.0

Applications randomly becoming "OutOfSync" #2613

Closed SIGUSR2 closed 5 years ago

SIGUSR2 commented 5 years ago

Describe the bug

Setup: multiple applications in a single private repository, accessed over HTTPS.

Symptoms: at random intervals, applications that do not have syncPolicy: automated{} set become "OutOfSync". In every case, the diff refers to the Argo CD labels. We are NOT setting those labels anywhere in our sources.

Later edit: I noticed that apps with syncPolicy: automated{} are also affected.

===== apps/Deployment default/dante ======
10a11
>     app.kubernetes.io/instance: dante.default
===== /Service default/socks ======
7a8,9
>   labels:
>     app.kubernetes.io/instance: dante.default

This example app consists of:

GROUP  KIND        NAMESPACE  NAME   STATUS  HEALTH   HOOK  MESSAGE
       Service     default    socks  Synced  Healthy        service/socks configured
apps   Deployment  default    dante  Synced  Healthy        deployment.apps/dante configured

After manually syncing:

ID  DATE                           REVISION
0   2019-10-24 20:24:16 +0000 UTC  master (fc01738)
1   2019-10-25 19:34:50 +0000 UTC  master (fc01738)
2   2019-10-28 16:38:10 +0000 UTC  master (fc01738)
3   2019-10-29 15:35:51 +0000 UTC  master (d18cb12)
4   2019-10-30 20:24:52 +0000 UTC  master (7cfa525)
5   2019-10-31 16:37:32 +0000 UTC  master (18ca6dc)

Git log for the source files & application definition:

# git log -- dante-deployment.yaml| egrep 'commit|Date'
commit fdb0f18da63c45c18d25fad1ff6a5c8eaa57544c
Date:   Fri Oct 4 17:49:48 2019 +0000
commit ef43bf73718db1f7a72bc3cd916ad35262ef90e5
Date:   Wed Oct 2 23:33:58 2019 +0200
commit cf9b5754bc1f98ac08610c76b11ee33a47c1bdec
Date:   Wed Oct 2 23:11:45 2019 +0200
commit 6a295b8127e90d1cc9136639971ce44b5d0b022b
Date:   Thu Sep 19 19:44:21 2019 +0200

# git log -- dante-socks-service.yaml| egrep 'commit|Date'
commit ef43bf73718db1f7a72bc3cd916ad35262ef90e5
Date:   Wed Oct 2 23:33:58 2019 +0200
commit cf9b5754bc1f98ac08610c76b11ee33a47c1bdec
Date:   Wed Oct 2 23:11:45 2019 +0200
commit 6a295b8127e90d1cc9136639971ce44b5d0b022b
Date:   Thu Sep 19 19:44:21 2019 +0200

# git log -- dante.default.yaml | egrep 'commit|Date'
commit 850bafcb97d73f1ec6b42de1a48e2226acd4fb83
Date:   Fri Oct 4 17:30:28 2019 +0000
commit 2b77f1d6bedfce2d69c49abea2c0809d6f076ebd
Date:   Fri Oct 4 16:58:12 2019 +0000
commit 6c4d2bb1cf5a377f7d25278ce97c1be5776cb8ec
Date:   Wed Oct 2 21:34:23 2019 +0000
commit f2a7259225c5310c4d4ab3264a248a4a545697dc
Date:   Wed Oct 2 21:21:36 2019 +0000

Since Oct 4 there were other commits in the same repo, on other files, but I was unable to link them to the changes in status of the affected apps.

commit d2f79acac3c61f3b05c519a06564e22e0138a062
Date:   Thu Oct 31 16:39:24 2019 +0000
commit 18ca6dc20fb3e0f38d50e023a590b9bd201fd1fe
Date:   Thu Oct 31 03:55:04 2019 +0000
commit 7cfa5254222465e5f7f2a3b0abbb2759d6c54a86
Date:   Wed Oct 30 20:02:40 2019 +0100
commit eb04b8ef0d375eb22a79ddd49f5257aeee871422
Date:   Tue Oct 29 19:25:03 2019 +0000
commit 76098000f30188aa181766cde25b72c430e29f2c
Date:   Tue Oct 29 19:24:45 2019 +0000
commit ada39c1cc1cfa05576c97e5fb836bf0884489cb2
Date:   Tue Oct 29 18:48:33 2019 +0000

To Reproduce

I was unable to reproduce the issue consistently.

Expected behavior

Application sync status should not change if its source manifests are unchanged.

Version

# argo version
argocd: v1.2.1+a6a394b
  BuildDate: 2019-09-12T17:14:43Z
  GitCommit: a6a394ba93a2a56981a45b9395e245e5210eaa35
  GitTreeState: clean
  GoVersion: go1.12.6
  Compiler: gc
  Platform: linux/amd64
argocd-server: v1.2.5+85f62df
  BuildDate: 2019-10-29T00:06:43Z
  GitCommit: 85f62dff9e3de5a2ee80e9350c897923f59b7e85
  GitTreeState: clean
  GoVersion: go1.12.6
  Compiler: gc
  Platform: linux/amd64
  Ksonnet Version: 0.13.1

It should be mentioned that I'm running argo-rollouts in the same namespace, though at this point it is supposed to do nothing - there are no Rollout objects in the cluster.

# kubectl get pods -n argocd
NAME                                            READY   STATUS    RESTARTS   AGE
argo-rollouts-8665f5755b-hhb6n                  1/1     Running   0          22h
argocd-application-controller-8cdc754fc-dzrg2   1/1     Running   0          20h
argocd-dex-server-b7688f999-d8bs8               1/1     Running   0          22h
argocd-redis-fc585c648-lqs7f                    1/1     Running   0          20h
argocd-repo-server-5975698bf4-4xsqj             1/1     Running   0          22h
argocd-repo-server-5975698bf4-7flq2             1/1     Running   0          22h
argocd-repo-server-5975698bf4-bvhjf             1/1     Running   0          20h
argocd-repo-server-5975698bf4-dlpdm             1/1     Running   0          7h44m
argocd-repo-server-5975698bf4-kxwkw             1/1     Running   0          20h
argocd-repo-server-5975698bf4-m7ftg             1/1     Running   0          20h
argocd-repo-server-5975698bf4-mr8rb             1/1     Running   0          20h
argocd-repo-server-5975698bf4-s7dss             1/1     Running   0          7h44m
argocd-repo-server-5975698bf4-vq25g             1/1     Running   0          20h
argocd-repo-server-5975698bf4-z64xj             1/1     Running   0          20h
argocd-server-56b9ff755d-fvpg6                  1/1     Running   0          20h
argocd-server-56b9ff755d-gzkbm                  1/1     Running   0          22h
argocd-server-56b9ff755d-x2d6g                  1/1     Running   0          20h

Logs

The first log line is me syncing it manually yesterday. The last two are me syncing it manually today.

time="2019-10-30T20:24:52Z" level=info msg="Updated sync status: OutOfSync -> Synced" application=dante.default dest-namespace=default dest-server="https://kubernetes.default.svc" reason=ResourceUpdated type=Normal

time="2019-10-31T14:58:44Z" level=info msg="Updated sync status: Synced -> OutOfSync" application=dante.default dest-namespace=default dest-server="https://kubernetes.default.svc" reason=ResourceUpdated type=Normal

time="2019-10-31T14:58:44Z" level=info msg="Updated sync status: Synced -> OutOfSync" application=dante.default dest-namespace=default dest-server="https://kubernetes.default.svc" reason=ResourceUpdated type=Normal

time="2019-10-31T16:37:32Z" level=info msg="Updated sync status: OutOfSync -> Synced" application=dante.default dest-namespace=default dest-server="https://kubernetes.default.svc" reason=ResourceUpdated type=Normal

time="2019-10-31T16:37:33Z" level=info msg="Updated sync status: OutOfSync -> Synced" application=dante.default dest-namespace=default dest-server="https://kubernetes.default.svc" reason=ResourceUpdated type=Normal

alexec commented 5 years ago

There could be a few issues that underlie this. A recent one is that the repo server becomes unavailable under load.

Can I ask if you're using one repo or many repos?

SIGUSR2 commented 5 years ago

I'm using one repo. However, I do not understand what the connection could be between the repo's availability and the fact that Argo CD considers its own labels a "difference".

SIGUSR2 commented 5 years ago

I've "solved" the mystery. An outside process was overwriting parts of the apps currently under Argo CD control. It wasn't normally supposed to do so, but due to a different problem it was crashing and doing a full repository apply every time it restarted. That erased the Argo CD specific labels, sending the apps into the "OutOfSync" state.
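For anyone who lands here with the same symptom, the mechanism can be sketched roughly as below. This is a simplified model written for illustration, not Argo CD's actual diff code: the helper names `desired_state` and `sync_status` are hypothetical, and the only things taken from this report are the label key `app.kubernetes.io/instance`, the app name `dante.default`, and the `socks` Service.

```python
import copy

# Label key taken from the diff in this report; everything else is a toy model.
TRACKING_LABEL = "app.kubernetes.io/instance"


def desired_state(manifest: dict, app_name: str) -> dict:
    """Return a copy of a Git manifest with the tracking label added.

    Hypothetical helper: models the state Argo CD wants to see in the cluster.
    """
    desired = copy.deepcopy(manifest)
    labels = desired.setdefault("metadata", {}).setdefault("labels", {})
    labels[TRACKING_LABEL] = app_name
    return desired


def sync_status(manifest: dict, live: dict, app_name: str):
    """Compare desired labels with live labels and report what is missing."""
    want = desired_state(manifest, app_name)["metadata"]["labels"]
    have = live.get("metadata", {}).get("labels") or {}
    missing = {k: v for k, v in want.items() if have.get(k) != v}
    return ("OutOfSync" if missing else "Synced"), missing


# The manifest as stored in Git: no Argo CD labels at all.
git_manifest = {"kind": "Service", "metadata": {"name": "socks", "namespace": "default"}}

# Live object right after an Argo CD sync: the tracking label is present.
live_after_argocd_sync = desired_state(git_manifest, "dante.default")

# Live object after the external process re-applied the raw Git manifest:
# the label is gone (assumption: the apply overwrote metadata.labels).
live_after_external_apply = copy.deepcopy(git_manifest)

print(sync_status(git_manifest, live_after_argocd_sync, "dante.default"))
print(sync_status(git_manifest, live_after_external_apply, "dante.default"))
```

In this model the Git sources never change, yet the second comparison reports the tracking label as missing. That matches the diff above, where the only difference was `app.kubernetes.io/instance: dante.default`.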

This one may be closed.