argoproj / argo-cd

Declarative Continuous Deployment for Kubernetes
https://argo-cd.readthedocs.io
Apache License 2.0
18.07k stars 5.53k forks

ArgoCD Application Stuck In Syncing/Terminating State, when status field update exceeds Application CR resource size #8113

Open ravi-cldcvr opened 2 years ago

ravi-cldcvr commented 2 years ago

The Argo CD Application deployment is stuck in an infinite loop of syncing the Application. Initially it works, but after some time it gets stuck in the Syncing state. We tried terminating the sync, but after that it got stuck in the Terminating state.

Can anyone help sort this out? What could be the issue?

Screenshots

[three screenshots omitted]

Version

2.1.7

Logs


time="2022-01-07T16:19:36Z" level=debug msg="Failed to apply normalization: error in remove for path: '/spec/preserveUnknownFields': Unable to remove nonexistent key: preserveUnknownFields: missing value"
time="2022-01-07T16:19:36Z" level=debug msg="Failed to apply normalization: error in remove for path: '/status': Unable to remove nonexistent key: status: missing value"
time="2022-01-07T16:19:36Z" level=debug msg="Failed to apply normalization: error in remove for path: '/spec/preserveUnknownFields': Unable to remove nonexistent key: preserveUnknownFields: missing value"
time="2022-01-07T16:19:36Z" level=debug msg="Failed to apply normalization: error in remove for path: '/status': Unable to remove nonexistent key: status: missing value"
time="2022-01-07T16:19:36Z" level=debug msg="Failed to apply normalization: error in remove for path: '/spec/preserveUnknownFields': Unable to remove nonexistent key: preserveUnknownFields: missing value"
time="2022-01-07T16:19:36Z" level=debug msg="Failed to apply normalization: error in remove for path: '/spec/preserveUnknownFields': Unable to remove nonexistent key: preserveUnknownFields: missing value"
time="2022-01-07T16:19:36Z" level=debug msg="Failed to apply normalization: error in remove for path: '/status': Unable to remove nonexistent key: status: missing value"
time="2022-01-07T16:19:36Z" level=debug msg="Failed to apply normalization: error in remove for path: '/spec/preserveUnknownFields': Unable to remove nonexistent key: preserveUnknownFields: missing value"
time="2022-01-07T16:19:36Z" level=debug msg="patch: {\"status\":{\"reconciledAt\":\"2022-01-07T16:19:36Z\"}}" application=argocd
time="2022-01-07T16:19:36Z" level=info msg="Failed to Update application operation state: etcdserver: request is too large, retrying in 1s"
time="2022-01-07T16:19:36Z" level=info msg="Update successful" application=argocd
time="2022-01-07T16:19:36Z" level=info msg="Reconciliation completed" application=argocd dedup_ms=0 dest-name= dest-namespace=argocd dest-server="https://417901E660A9365B4057207C70C682EE.gr7.ap-south-1.eks.amazonaws.com" diff_ms=174 fields.level=2 git_ms=13 health_ms=3 live_ms=1 settings_ms=0 sync_ms=0 time_ms=249
time="2022-01-07T16:19:36Z" level=info msg="Refreshing app status (controller refresh requested), level (1)" application=argocd
utkarsh-devops commented 2 years ago

time="2022-01-07T18:08:00Z" level=warning msg="finished unary call with code FailedPrecondition" error="rpc error: code = FailedPrecondition desc = another operation is already in progress" grpc.code=FailedPrecondition grpc.method=Sync grpc.service=application.ApplicationService grpc.start_time="2022-01-07T18:08:00Z" grpc.time_ms=484.012 span.kind=server system=grpc

jgwest commented 2 years ago

@ravi-cldcvr can you provide more information on the Application you are attempting to synchronize? For example, can you provide the YAML definition for it? It appears that Argo CD is unable to update the status field of the CR, due to it being too large (for example, the Application has too many child resources).
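To check whether the status field is what pushes the CR over the limit, one option is to measure the serialized size of each top-level status field. The following is a minimal sketch, not part of Argo CD: the helper names are hypothetical, the sample data is synthetic, and the 1.5 MiB figure is etcd's default `--max-request-bytes`, which a given cluster may override. The compact JSON size is only an approximation of what the API server actually sends to etcd.

```python
import json

# etcd's default --max-request-bytes (1.5 MiB); assumption: the cluster uses the default.
ETCD_DEFAULT_MAX_REQUEST_BYTES = 1_572_864


def serialized_size(obj) -> int:
    """Size in bytes of obj when serialized as compact UTF-8 JSON."""
    return len(json.dumps(obj, separators=(",", ":")).encode("utf-8"))


def status_breakdown(app: dict) -> list:
    """Return (field, bytes) pairs for each top-level status field, largest first."""
    status = app.get("status") or {}
    sizes = [(name, serialized_size(value)) for name, value in status.items()]
    return sorted(sizes, key=lambda pair: pair[1], reverse=True)


def summarize(app: dict, limit: int = ETCD_DEFAULT_MAX_REQUEST_BYTES) -> str:
    """Build a short report of the total size and the per-field status sizes."""
    total = serialized_size(app)
    lines = [f"total: {total} bytes ({total / limit:.1%} of {limit})"]
    lines += [f"  status.{name}: {size} bytes" for name, size in status_breakdown(app)]
    return "\n".join(lines)


# Synthetic example (hypothetical data): 600 tracked resources and one history entry.
sample = {
    "kind": "Application",
    "status": {
        "resources": [
            {"kind": "ConfigMap", "name": f"cm-{i}", "status": "Synced"}
            for i in range(600)
        ],
        "history": [{"id": 1, "revision": "f373c34"}],
    },
}
print(summarize(sample))
```

For a real Application, the dict could be loaded from the output of `kubectl get application <name> -n argocd -o json` with `json.load` and passed to `summarize`.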

gouravjoshicldcvr commented 2 years ago

@jgwest Thanks for your response. Yes, you are correct: my application is fairly large. It contains 35 Helm charts, and these charts create more than 600 resources. What should we do in such a case? Any suggestion would be very helpful, as this is a blocker for us.

gouravjoshicldcvr commented 2 years ago

I think that every time Argo CD syncs with Git, the backend creates a revision and appends it to the same Application YAML. If it syncs 5-6 times, it creates more revisions, which are appended to application.yaml as well, until it finally reaches a state where Argo CD is unable to update the YAML and the app gets stuck in the Syncing state.

gouravjoshicldcvr commented 2 years ago

@jgwest Can you suggest something to make this work in our current production environment? All our applications are stuck in the Syncing state, which is very painful.

jgwest commented 2 years ago

I'm not aware of any Argo CD settings that you can tweak here to reduce the size/verbosity of the status field (you could ask on the Argo CD Slack to see if anyone else has hit this or a similar issue). My only suggestion would be to refactor your Argo CD application to use fewer resources.

gouravjoshicldcvr commented 2 years ago

@jgwest Thanks for your response. I cannot remove the resources from the app. Can you please tell me how long it will take the Argo CD team to assign this issue to someone?

jgwest commented 2 years ago

@gouravjoshicldcvr Argo CD planning is not that formally organized. Each bug and feature is evaluated by individual teams/companies/contributors for contribution, so I can't give you such an estimate.

gouravjoshicldcvr commented 2 years ago

@jgwest Can we restrict Argo CD to storing only a certain number of revisions, like 3 or 5, so that this limit is never breached?

jessesuen commented 2 years ago

This issue was discussed in the contributing meeting today.

> If it syncs 5-6 times, it creates more revisions, which are appended to application.yaml as well, until it finally reaches a state where Argo CD is unable to update the YAML and the app gets stuck in the Syncing state.

The history of syncs normally contributes very little to the size. However, it's possible it may get quite large if you are leveraging inlined helm values. Is that the case?

For example, here is what items in the history look like:

"history": [
  {
    "revision": "f373c3409ba9e17a44a01fef0e8ccfb267cb0ddb",
    "deployedAt": "2021-11-02T10:29:43Z",
    "id": 11,
    "source": {
      "repoURL": "https://github.com/xxx/yyy.git",
      "path": "system/argo-cd",
      "targetRevision": "HEAD"
    },
    "deployStartedAt": "2021-11-02T10:29:41Z"
  },
  {
    "revision": "3f274138d93cc0d1579dad461b0bfb2edeb53924",
    "deployedAt": "2021-11-05T18:03:33Z",
    "id": 12,
    "source": {
      "repoURL": "https://github.com/xxx/yyy.git",
      "path": "system/argo-cd",
      "targetRevision": "HEAD"
    },
    "deployStartedAt": "2021-11-05T18:02:20Z"
  }
]
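
Because each history entry carries its own copy of the source, inlined Helm values are repeated once per entry. The sketch below illustrates that effect by comparing the serialized size of one entry with and without inline values. It is an illustration only: the helper names and the placeholder values string are hypothetical, and it assumes the inline values are stored under `source.helm.values`.

```python
import json
from typing import Optional


def entry_size(entry: dict) -> int:
    """Size in bytes of one history entry as compact UTF-8 JSON."""
    return len(json.dumps(entry, separators=(",", ":")).encode("utf-8"))


def make_entry(entry_id: int, inline_values: Optional[str] = None) -> dict:
    """Build a history entry shaped like the example above.

    Assumption: inline Helm values live under source.helm.values.
    """
    source = {
        "repoURL": "https://github.com/xxx/yyy.git",
        "path": "system/argo-cd",
        "targetRevision": "HEAD",
    }
    if inline_values is not None:
        source["helm"] = {"values": inline_values}
    return {
        "revision": "f373c3409ba9e17a44a01fef0e8ccfb267cb0ddb",
        "deployedAt": "2021-11-02T10:29:43Z",
        "id": entry_id,
        "source": source,
        "deployStartedAt": "2021-11-02T10:29:41Z",
    }


# Placeholder standing in for a large values file (hypothetical content).
values_yaml = "replicaCount: 3\n" * 2000

plain = entry_size(make_entry(11))
inlined = entry_size(make_entry(11, values_yaml))
print(f"plain entry: {plain} bytes; with inline values: {inlined} bytes")
```

With inline values, the history's contribution to the CR grows roughly as the number of retained entries times the size of the values.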

> @jgwest Can we restrict Argo CD to storing only a certain number of revisions, like 3 or 5, so that this limit is never breached?

Yes, if history is contributing to your CR size, you can reduce this using:

kind: Application
spec:
  revisionHistoryLimit: 1
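
To illustrate what a history limit does to the stored list, here is a minimal sketch of trimming to the most recent entries. This is not Argo CD's actual implementation: `trim_history` is a hypothetical helper, the entries are assumed to be ordered oldest first as in the example above, and returning an empty list for a limit of zero is a choice made for this sketch.

```python
def trim_history(history: list, limit: int) -> list:
    """Keep only the `limit` most recent entries (entries assumed oldest-first)."""
    if limit <= 0:
        return []
    return history[-limit:]


# Hypothetical history with ids 1..12, oldest first.
history = [{"id": i, "revision": f"rev-{i}"} for i in range(1, 13)]

print([entry["id"] for entry in trim_history(history, 1)])  # [12]
```

With a limit of 1, the history's contribution to the CR size is bounded by the size of a single entry, however many syncs have run.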

Outside of that, the other largest contributors to CRD size are:

However, there is no way to reduce those two in size

gouravjoshicldcvr commented 2 years ago

@jgwest Yes, we are providing values with --values-literal-file, and as per my understanding it appends the values to the application.yaml file only.

gouravjoshicldcvr commented 2 years ago

@jgwest By limiting the revision history, this issue got solved. Thanks.

andrewm-aero commented 2 years ago

In case anyone else ends up here with a similar issue, we ran into this, but we couldn't get the application controller to "terminate" the sync, even after setting the revision limit. The solution ended up being to delete the "status" field via a "kubectl edit" and set the status to an empty object.