argoproj / argo-cd

Declarative Continuous Deployment for Kubernetes
https://argo-cd.readthedocs.io
Apache License 2.0

SideCar Plugin throws error; client stream context error: context canceled #16012

Open toyamagu-2021 opened 11 months ago

toyamagu-2021 commented 11 months ago

Describe the bug

Hi. I'm running Argo CD v2.6.15 and have migrated to a sidecar CMP, and I'm now seeing performance issues with it. argocd-server logs the following error when we run argocd app diff:

failed to execute argocd diff. Exit code: 20. stdout: \"time=\\\"2023-10-18T10:20:49Z\\\" level=fatal msg=\\\"rpc error: code = Unavailable desc = closing transport due to: connection error: desc = \\\\\\\"error reading from server: EOF\\\\\\\", received prior goaway: code: ENHANCE_YOUR_CALM, debug data: \\\\\\\"too_many_pings\\\\\\\"\\\"\\n\", stderr: \"exit status 20\""}

After retrying argocd app diff several times, we eventually got:

argo-cd-argocd-server-66fbc9b975-wp6zq server {"error":"rpc error: code = Unknown desc = Manifest generation error (cached): plugin sidecar failed. error generating manifests in cmp: error sending file to cmp-server: error sending tgz file to cmp-server: client stream context error: context canceled","grpc.code":"Unknown","grpc.method":"GetManifests","grpc.service":"application.ApplicationService","grpc.start_time":"2023-10-18T08:58:05Z","grpc.time_ms":601.657,"level":"error","msg":"finished unary call with code Unknown","span.kind":"server","system":"grpc","time":"2023-10-18T08:58:06Z"}

Our Application Status:

image

To Reproduce

We have ~716 resources under one Application, which might cause this issue.

image

We have tried the following:

  1. Set --repo-server-timeout-seconds=600 on both argocd-server and argocd-application-controller
  2. Excluded unnecessary files matching ".git/*"
    • We use a monorepo whose total size is 247M, of which the .git folder is 185M.
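
A rough way to see how much the exclusion in step 2 can save is to total the file sizes that would be packaged with and without the pattern. The sketch below is only an estimate under stated assumptions: `transfer_size` and its helpers are hypothetical names, Python's `fnmatch` stands in for the matcher Argo CD actually uses (the two do not treat `/` identically), and raw file sizes ignore compression.

```python
import fnmatch
import os


def _relative(rel_dir, name):
    """Build a '/'-separated path relative to the repository root."""
    if rel_dir == ".":
        return name
    return os.path.join(rel_dir, name).replace(os.sep, "/")


def _excluded(rel_path, exclusions):
    """Return True if the relative path matches any exclusion pattern."""
    # Assumption: fnmatchcase approximates the real matcher; here '*' also matches '/'.
    return any(fnmatch.fnmatchcase(rel_path, pattern) for pattern in exclusions)


def transfer_size(root, exclusions=()):
    """Sum the sizes in bytes of files under root that survive the exclusions."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        rel_dir = os.path.relpath(dirpath, root)
        # Prune excluded directories in place so os.walk never descends into them.
        dirnames[:] = [
            d for d in dirnames if not _excluded(_relative(rel_dir, d), exclusions)
        ]
        for name in filenames:
            if not _excluded(_relative(rel_dir, name), exclusions):
                # lstat avoids following (possibly broken) symlinks.
                total += os.lstat(os.path.join(dirpath, name)).st_size
    return total


if __name__ == "__main__":
    repo = "."  # hypothetical path to a local checkout of the monorepo
    print("all files:", transfer_size(repo))
    print("without .git contents:", transfer_size(repo, [".git/*"]))
```

With the sizes quoted above (247M total, 185M under .git), the second figure should come out at roughly 62M before compression.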

Can we get better performance by tuning?

Expected behavior

Manifests are generated by the sidecar plugin.

Version

argocd version
argocd: v2.8.4+c279299.dirty
  BuildDate: 2023-09-13T20:25:25Z
  GitCommit: c27929928104dc37b937764baf65f38b78930e59
  GitTreeState: dirty
  GoVersion: go1.21.1
  Compiler: gc
  Platform: darwin/arm64
argocd-server: v2.6.15+2f7922b

Logs

grpc.method=GenerateManifest grpc.service=plugin.ConfigManagementPluginService grpc.start_time="2023-10-18T09:09:28Z" grpc.time_ms=17897.016 span.kind=server system=grpc
argo-cd-argocd-repo-server-8685c864dd-4qh8c helmfile time="2023-10-18T09:15:58Z" level=error msg="finished streaming call with code Unknown" error="generate manifest error receiving stream: error receiving tgz file: stream context error: context canceled" grpc.code=Unknown grpc.method=GenerateManifest grpc.service=plugin.ConfigManagementPluginService grpc.start_time="2023-10-18T09:15:56Z" grpc.time_ms=2238.227 span.kind=server system=grpc

toyamagu-2021 commented 11 months ago

Additional info:

          generate:
            command: [bash, -c]
            args: |
              helmfile template --include-crds --skip-tests -f helmfile.yaml

Any suggestions will be appreciated.

crenshaw-dev commented 11 months ago

Can you provide more complete sidecar logs, ideally with timestamps? I think there are two options: git transfer taking too long or helmfile taking too long.

toyamagu-2021 commented 11 months ago

I think this might be related to https://github.com/argoproj/argo-cd/pull/15806. Note: ARGOCD_EXEC_TIMEOUT=10m

2023-10-18T19:23:13.987882665Z time="2023-10-18T19:23:13Z" level=info msg="ArgoCD ConfigManagementPlugin Server is starting" built="2023-09-07T17:52:30Z" commit=2f7922be9c8f364fec435eec4860b49279be77da version=v2.6.15+2f7922b
2023-10-18T19:23:13.988468946Z time="2023-10-18T19:23:13Z" level=info msg="argocd-cmp-server v2.6.15+2f7922b serving on /home/argocd/cmp-server/plugins/helmfile.sock"
2023-10-18T19:32:48.519698405Z time="2023-10-18T19:32:48Z" level=info msg="finished streaming call with code OK" grpc.code=OK grpc.method=MatchRepository grpc.service=plugin.ConfigManagementPluginService grpc.start_time="2023-10-18T19:32:20Z" grpc.time_ms=27702.465 span.kind=server system=grpc
2023-10-18T19:32:48.521963964Z time="2023-10-18T19:32:48Z" level=info msg="finished streaming call with code OK" grpc.code=OK grpc.method=MatchRepository grpc.service=plugin.ConfigManagementPluginService grpc.start_time="2023-10-18T19:32:20Z" grpc.time_ms=27689.312 span.kind=server system=grpc
2023-10-18T19:32:56.912033444Z time="2023-10-18T19:32:56Z" level=info msg="Generating manifests with no request-level timeout"
2023-10-18T19:32:56.912176574Z time="2023-10-18T19:32:56Z" level=info msg="<CMP INIT>"" dir=/tmp/_cmp_server/0908e875-ae3a-45cc-8364-d9efb7cb33bd/<MY_GIT_REPO> execID=e461b
2023-10-18T19:32:56.938809700Z time="2023-10-18T19:32:56Z" level=debug duration=26.539585ms execID=e461b
2023-10-18T19:32:56.938892049Z time="2023-10-18T19:32:56Z" level=info msg="<CMP GEN>" dir=/tmp/_cmp_server/0908e875-ae3a-45cc-8364-d9efb7cb33bd/<MY_GIT_REPO> execID=881be
2023-10-18T19:32:57.256927012Z time="2023-10-18T19:32:57Z" level=info msg="finished streaming call with code OK" grpc.code=OK grpc.method=GetParametersAnnouncement grpc.service=plugin.ConfigManagementPluginService grpc.start_time="2023-10-18T19:32:48Z" grpc.time_ms=8736.639 span.kind=server system=grpc
2023-10-18T19:33:13.988750914Z time="2023-10-18T19:33:13Z" level=info msg="Alloc=18402 TotalAlloc=1979284 Sys=46189 NumGC=176 Goroutines=16"
2023-10-18T19:33:47.547344899Z time="2023-10-18T19:33:47Z" level=debug msg="<GENERATED MANIFEST>" duration=50.564412317s execID=881be
2023-10-18T19:33:47.552843084Z time="2023-10-18T19:33:47Z" level=error msg="<CMP GEN> failed signal: killed: <HELM REPO ADD>" execID=881be
2023-10-18T19:33:48.007175582Z time="2023-10-18T19:33:48Z" level=error msg="finished streaming call with code Unknown" error="error generating manifests: <CMP GEN> failed signal: killed: <HELM REPO ADD>" grpc.code=Unknown grpc.method=GenerateManifest grpc.service=plugin.ConfigManagementPluginService grpc.start_time="2023-10-18T19:32:48Z" grpc.time_ms=59484.652 span.kind=server system=grpc

cli command:

$ date -u && time argocd app diff <APP> --revision <BRANCH>
Wed Oct 18 20:27:26 UTC 2023
2023/10/19 05:28:36 ERROR: [transport] Client received GoAway with error code ENHANCE_YOUR_CALM and debug data equal to ASCII "too_many_pings".
FATA[0069] rpc error: code = Unavailable desc = closing transport due to: connection error: desc = "error reading from server: EOF", received prior goaway: code: ENHANCE_YOUR_CALM, debug data: "too_many_pings" 
argocd app diff <APP> --revision <BRANCH>  0.22s user 0.46s system 0% cpu 1:09.40 total
$ date -u && time argocd app diff <APP> --revision <BRANCH>
Wed Oct 18 20:31:25 UTC 2023
2023/10/19 05:32:35 ERROR: [transport] Client received GoAway with error code ENHANCE_YOUR_CALM and debug data equal to ASCII "too_many_pings".
FATA[0069] rpc error: code = Unavailable desc = closing transport due to: connection error: desc = "error reading from server: EOF", received prior goaway: code: ENHANCE_YOUR_CALM, debug data: "too_many_pings" 
argocd app diff <APP> --revision <BRANCH>  0.24s user 0.48s system 1% cpu 1:09.68 total
$ date -u && time argocd app diff <APP> --revision <BRANCH>
Wed Oct 18 20:33:51 UTC 2023
2023/10/19 05:35:00 ERROR: [transport] Client received GoAway with error code ENHANCE_YOUR_CALM and debug data equal to ASCII "too_many_pings".
FATA[0068] rpc error: code = Unavailable desc = closing transport due to: connection error: desc = "error reading from server: EOF", received prior goaway: code: ENHANCE_YOUR_CALM, debug data: "too_many_pings" 
argocd app diff <APP> --revision <BRANCH>  0.20s user 0.44s system 0% cpu 1:08.73 total
$ date -u && time argocd app diff <APP> --revision <BRANCH>
Wed Oct 18 20:35:54 UTC 2023
FATA[0011] rpc error: code = Unknown desc = Manifest generation error (cached): plugin sidecar failed. error generating manifests in cmp: rpc error: code = Canceled desc = context canceled 
argocd app diff <APP> --revision <BRANCH>  0.25s user 0.53s system 6% cpu 11.915 total

cmp-server

time="2023-10-18T20:07:27Z" level=info msg="ArgoCD ConfigManagementPlugin Server is starting" built="2023-09-07T17:52:30Z" commit=2f7922be9c8f364fec435eec4860b49279be77da version=v2.6.15+2f7922b
time="2023-10-18T20:07:27Z" level=info msg="argocd-cmp-server v2.6.15+2f7922b serving on /home/argocd/cmp-server/plugins/helmfile.sock"
time="2023-10-18T20:17:27Z" level=info msg="Alloc=7717 TotalAlloc=16345 Sys=29037 NumGC=7 Goroutines=7"
time="2023-10-18T20:25:36Z" level=info msg="finished streaming call with code OK" grpc.code=OK grpc.method=MatchRepository grpc.service=plugin.ConfigManagementPluginService grpc.start_time="2023-10-18T20:25:12Z" grpc.time_ms=24045.389 span.kind=server system=grpc
time="2023-10-18T20:25:40Z" level=info msg="Generating manifests with no request-level timeout"
time="2023-10-18T20:25:40Z" level=info msg="<CMP_INIT>" dir=/tmp/_cmp_server/280e6cbf-9754-4b0c-ac82-5def36dc1387/<REPO> execID=2c0e7
time="2023-10-18T20:25:40Z" level=debug duration=24.904048ms execID=2c0e7
time="2023-10-18T20:25:40Z" level=info msg="<CMP_GEN>" dir=/tmp/_cmp_server/280e6cbf-9754-4b0c-ac82-5def36dc1387/<REPO> execID=63de1
time="2023-10-18T20:25:44Z" level=debug duration=4.664950709s execID=63de1
time="2023-10-18T20:25:44Z" level=error msg="`<CMP_GEN> failed signal: killed" execID=63de1
time="2023-10-18T20:25:45Z" level=error msg="finished streaming call with code Unknown" error="error generating manifests: <CMP_GEN> failed signal: killed" grpc.code=Unknown grpc.method=GenerateManifest grpc.service=plugin.ConfigManagementPluginService grpc.start_time="2023-10-18T20:25:36Z" grpc.time_ms=8921.55 span.kind=server system=grpc
time="2023-10-18T20:27:27Z" level=info msg="Alloc=13310 TotalAlloc=998446 Sys=42093 NumGC=98 Goroutines=7"
time="2023-10-18T20:27:45Z" level=info msg="finished streaming call with code OK" grpc.code=OK grpc.method=MatchRepository grpc.service=plugin.ConfigManagementPluginService grpc.start_time="2023-10-18T20:27:41Z" grpc.time_ms=4628.62 span.kind=server system=grpc
time="2023-10-18T20:27:49Z" level=info msg="Generating manifests with no request-level timeout"
time="2023-10-18T20:27:49Z" level=info msg="<CMP_INIT>" dir=/tmp/_cmp_server/fe96f095-d4be-4fea-909d-094dcd8a0d83/<REPO> execID=b00ed
time="2023-10-18T20:27:49Z" level=debug duration=6.251293ms execID=b00ed
time="2023-10-18T20:27:49Z" level=info msg="<CMP_GEN>" dir=/tmp/_cmp_server/fe96f095-d4be-4fea-909d-094dcd8a0d83/<REPO> execID=e5a12
time="2023-10-18T20:28:36Z" level=debug msg="<GEN_MANI>" duration=47.054633735s execID=e5a12
time="2023-10-18T20:28:36Z" level=error msg="<CMP_GEN> failed signal: terminated: Adding repo <HELM_REPO>" grpc.code=Unknown grpc.method=GenerateManifest grpc.service=plugin.ConfigManagementPluginService grpc.start_time="2023-10-18T20:27:45Z" grpc.time_ms=51457.832 span.kind=server system=grpc
time="2023-10-18T20:34:08Z" level=info msg="finished streaming call with code OK" grpc.code=OK grpc.method=MatchRepository grpc.service=plugin.ConfigManagementPluginService grpc.start_time="2023-10-18T20:34:03Z" grpc.time_ms=4604.318 span.kind=server system=grpc
time="2023-10-18T20:34:12Z" level=info msg="Generating manifests with no request-level timeout"
time="2023-10-18T20:34:12Z" level=info msg="<CMP_INI>" dir=/tmp/_cmp_server/4865f27b-464b-431d-84b7-b57cc34ac632/<REPO> execID=1da8c
time="2023-10-18T20:34:12Z" level=debug duration=3.541361ms execID=1da8c
time="2023-10-18T20:34:12Z" level=info msg="<CMP_GEN>" dir=/tmp/_cmp_server/4865f27b-464b-431d-84b7-b57cc34ac632/<REPO> execID=1fb4e
time="2023-10-18T20:35:01Z" level=debug msg="<GENERATED_MANIFEST>" duration=48.507043932s execID=1fb4e
time="2023-10-18T20:35:01Z" level=error msg="error generating manifests: <CMP_GEN> failed signal: terminated: Adding repo <HELM_REPO>" execID=1fb4e
time="2023-10-18T20:35:01Z" level=error msg="finished streaming call with code Unknown" error="error generating manifests: <CMP_GEN> failed signal: terminated: <HELM_REPO>" grpc.code=Unknown grpc.method=GenerateManifest grpc.service=plugin.ConfigManagementPluginService grpc.start_time="2023-10-18T20:34:08Z" grpc.time_ms=53340.32 span.kind=server system=grpc
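
One way to tell apart the two candidates raised earlier in the thread (repository transfer versus helmfile runtime) is to subtract the command durations from the total time of the GenerateManifest call. The sketch below does that for three lines copied from the last attempt in the log above. The function names are hypothetical, and reading the remainder as transfer time is an assumption: it covers everything outside the plugin commands, including receiving and unpacking the repository archive.

```python
import re
import shlex

# Seconds per Go duration unit, as printed in the "duration=" fields.
_UNIT_SECONDS = {"ns": 1e-9, "us": 1e-6, "µs": 1e-6, "ms": 1e-3,
                 "s": 1.0, "m": 60.0, "h": 3600.0}

SAMPLE = [
    'time="2023-10-18T20:34:12Z" level=debug duration=3.541361ms execID=1da8c',
    'time="2023-10-18T20:35:01Z" level=debug msg="<GENERATED_MANIFEST>" '
    'duration=48.507043932s execID=1fb4e',
    'time="2023-10-18T20:35:01Z" level=error msg="finished streaming call with code Unknown" '
    'error="error generating manifests: <CMP_GEN> failed signal: terminated: <HELM_REPO>" '
    'grpc.code=Unknown grpc.method=GenerateManifest '
    'grpc.service=plugin.ConfigManagementPluginService '
    'grpc.start_time="2023-10-18T20:34:08Z" grpc.time_ms=53340.32 span.kind=server system=grpc',
]


def parse_go_duration(text):
    """Convert a Go duration string such as '48.5s' or '1m2.5s' to seconds."""
    parts = re.findall(r"(\d+(?:\.\d+)?)(ns|us|µs|ms|s|m|h)", text)
    return sum(float(value) * _UNIT_SECONDS[unit] for value, unit in parts)


def parse_logfmt(line):
    """Split a key=value log line into a dict, honouring double-quoted values."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def split_call_time(lines):
    """Return (command_seconds, call_seconds, remainder) per GenerateManifest call."""
    results = []
    command_seconds = 0.0
    for line in lines:
        fields = parse_logfmt(line)
        if "duration" in fields and "execID" in fields:
            # Init and generate commands both log a duration; add them up.
            command_seconds += parse_go_duration(fields["duration"])
        elif fields.get("grpc.method") == "GenerateManifest" and "grpc.time_ms" in fields:
            call_seconds = float(fields["grpc.time_ms"]) / 1000.0
            results.append((command_seconds, call_seconds, call_seconds - command_seconds))
            command_seconds = 0.0
    return results


if __name__ == "__main__":
    for commands, call, rest in split_call_time(SAMPLE):
        print(f"commands={commands:.1f}s call={call:.1f}s other={rest:.1f}s")
```

For this attempt the commands account for about 48.5 s of the 53.3 s call, which leaves about 4.8 s outside the plugin commands.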