argoproj / argo-cd

Declarative Continuous Deployment for Kubernetes
https://argo-cd.readthedocs.io
Apache License 2.0

Application stuck in refreshing #20785

Open ivan-cai opened 22 hours ago

ivan-cai commented 22 hours ago

**Describe the bug**

I have 3000 Applications. Sometimes an Application gets stuck in refreshing, and only restarting the application-controller or the repo-server resolves it. This happens about 2 to 3 times per day. Application syncs are triggered by a GitLab webhook.

I captured a goroutine profile from the repo-server: [image]

My Argo CD version is 2.12.4.

Some of my config:

  controller.operation.processors: "100"
  controller.repo.server.timeout.seconds: "120"
  controller.status.processors: "300"
  reposerver.git.attempts.count: "5"
  reposerver.parallelism.limit: "30"
  server.grpc.max.size.mb: "200"
  server.k8sclient.retry.base.backoff: "200"
  server.webhook.parallelism.limit: "50"

**Expected behavior**

Application not stuck in refreshing

**Version**
The v2.12.4 tag plus this commit: https://github.com/argoproj/argo-cd/commit/95be90b5f9f5acebca46e3dcc3df9355307f6285

**Logs**

The application controller is stuck at "Comparing app state" and cannot get the generated manifests from the repo-server.

ivan-cai commented 22 hours ago

/assign @alexmt @crenshaw-dev

andrii-korotkov-verkada commented 20 hours ago

Try upgrading to 2.13. There have been major performance improvements to refresh times, which in my case reduced refresh times for some applications from 30-60 min on a medium cluster to < 1 min.

andrii-korotkov-verkada commented 7 hours ago

Please, let us know the results in 2.13.

ivan-cai commented 6 hours ago

I have found why the repo-server hangs: a git fetch hangs, and the goroutine running it holds the mutex the whole time.

goroutine 10514931 [chan receive, 17 minutes]:
github.com/argoproj/pkg/exec.RunCommandExt(0xc00170e420, {0x14f46b0400, 0x0, {0xf, 0x1}, 0x0, 0x0})
        /go/pkg/mod/github.com/argoproj/pkg@v0.13.7-0.20230626144333-d56162821bd1/exec/exec.go:139 +0xd5d
github.com/argoproj/argo-cd/v2/util/exec.RunWithExecRunOpts(0xc00170e420, {0x0?, {0x0?, 0x0?}, 0x40?, 0x1b?})
        /go/src/github.com/argoproj/argo-cd/util/exec/exec.go:59 +0x7d5
github.com/argoproj/argo-cd/v2/util/git.(*nativeGitClient).runCmdOutput(0xc000897ab0, 0xc00170e420, {0x8?, 0x8c?})
        /go/src/github.com/argoproj/argo-cd/util/git/client.go:887 +0x5f5
github.com/argoproj/argo-cd/v2/util/git.(*nativeGitClient).runCredentialedCmd(0xc000897ab0, {0xc001dd8c08, 0x5, 0x5})
        /go/src/github.com/argoproj/argo-cd/util/git/client.go:843 +0x413
github.com/argoproj/argo-cd/v2/util/git.(*nativeGitClient).fetch(0xc000cc6340?, {0x0?, 0x0?})
        /go/src/github.com/argoproj/argo-cd/util/git/client.go:356 +0x192
github.com/argoproj/argo-cd/v2/util/git.(*nativeGitClient).Fetch(0xc000897ab0, {0x0?, 0x0?})
        /go/src/github.com/argoproj/argo-cd/util/git/client.go:383 +0x9b
github.com/argoproj/argo-cd/v2/reposerver/repository.checkoutRevision({0x55af400, 0xc000897ab0}, {0xc001629d40, 0x28}, 0x1)
        /go/src/github.com/argoproj/argo-cd/reposerver/repository/repository.go:2440 +0x222
github.com/argoproj/argo-cd/v2/reposerver/repository.(*Service).checkoutRevision(0xc000b70b40, {0x55af400, 0xc000897ab0}, {0xc001629d40, 0x28}, 0x1)
        /go/src/github.com/argoproj/argo-cd/reposerver/repository/repository.go:2418 +0x75
github.com/argoproj/argo-cd/v2/reposerver/repository.(*Service).GetGitDirectories.func1()
        /go/src/github.com/argoproj/argo-cd/reposerver/repository/repository.go:2670 +0x3d
github.com/argoproj/argo-cd/v2/reposerver/repository.(*repositoryLock).Lock(0xc001045060, {0xc001336300, 0x36}, {0xc001629d40, 0x28}, 0x1, 0xc001dd9208)
        /go/src/github.com/argoproj/argo-cd/reposerver/repository/lock.go:55 +0x2e5
github.com/argoproj/argo-cd/v2/reposerver/repository.(*Service).GetGitDirectories(0xc000b70b40, {0x3612ec0?, 0x554f420?}, 0xc000b2d770)
        /go/src/github.com/argoproj/argo-cd/reposerver/repository/repository.go:2669 +0x40f
github.com/argoproj/argo-cd/v2/reposerver/apiclient._RepoServerService_GetGitDirectories_Handler.func1({0x558f168?, 0xc002381a40?}, {0x3dc1120?, 0xc000b2d770?})
        /go/src/github.com/argoproj/argo-cd/reposerver/apiclient/repository.pb.go:3085 +0xcb
github.com/argoproj/argo-cd/v2/reposerver.NewServer.ErrorSanitizerUnaryServerInterceptor.func3({0x558f168, 0xc002381a10}, {0x3dc1120, 0xc000b2d770}, 0x0?, 0xc0022f76e0)
        /go/src/github.com/argoproj/argo-cd/util/grpc/sanitizer.go:24 +0x71
github.com/argoproj/argo-cd/v2/reposerver.NewServer.ChainUnaryServer.func5.1({0x558f168?, 0xc002381a10?}, {0x3dc1120?, 0xc000b2d770?})
        /go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.4.0/chain.go:48 +0x45
github.com/argoproj/argo-cd/v2/reposerver.NewServer.PanicLoggerUnaryServerInterceptor.func2({0x558f168?, 0xc002381a10?}, {0x3dc1120?, 0xc000b2d770?}, 0x4009594?, 0x11?)
        /go/src/github.com/argoproj/argo-cd/util/grpc/grpc.go:33 +0x8c
github.com/argoproj/argo-cd/v2/reposerver.NewServer.ChainUnaryServer.func5.1({0x558f168?, 0xc002381a10?}, {0x3dc1120?, 0xc000b2d770?})
        /go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.4.0/chain.go:48 +0x45
github.com/grpc-ecosystem/go-grpc-prometheus.init.(*ServerMetrics).UnaryServerInterceptor.func2({0x558f168, 0xc002381a10}, {0x3dc1120, 0xc000b2d770}, 0x0?, 0xc0023a0040)
        /go/pkg/mod/github.com/grpc-ecosystem/go-grpc-prometheus@v1.2.0/server_metrics.go:107 +0x7d
github.com/argoproj/argo-cd/v2/reposerver.NewServer.ChainUnaryServer.func5.1({0x558f168?, 0xc002381a10?}, {0x3dc1120?, 0xc000b2d770?})
        /go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.4.0/chain.go:48 +0x45
github.com/grpc-ecosystem/go-grpc-middleware/logging/logrus.UnaryServerInterceptor.func1({0x558f168, 0xc002381920}, {0x3dc1120, 0xc000b2d770}, 0xc001c5e060, 0xc0023a0080)
        /go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.4.0/logging/logrus/server_interceptors.go:31 +0xfe
github.com/argoproj/argo-cd/v2/reposerver.NewServer.ChainUnaryServer.func5.1({0x558f168?, 0xc002381920?}, {0x3dc1120?, 0xc000b2d770?})
        /go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.4.0/chain.go:48 +0x45
go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc.UnaryServerInterceptor.func1({0x558f168, 0xc002381860}, {0x3dc1120, 0xc000b2d770}, 0xc001c5e060, 0xc0023a00c0)
        /go/pkg/mod/go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc@v0.46.1/interceptor.go:326 +0x5a4
github.com/argoproj/argo-cd/v2/reposerver.NewServer.ChainUnaryServer.func5({0x558f168, 0xc002381860}, {0x3dc1120, 0xc000b2d770}, 0xc001c5e060, 0x78?)
        /go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.4.0/chain.go:53 +0x123
github.com/argoproj/argo-cd/v2/reposerver/apiclient._RepoServerService_GetGitDirectories_Handler({0x3e19820, 0xc000b70b40}, {0x558f168, 0xc002381860}, 0xc00126b280, 0xc001297140)
        /go/src/github.com/argoproj/argo-cd/reposerver/apiclient/repository.pb.go:3087 +0x143
google.golang.org/grpc.(*Server).processUnaryRPC(0xc0003bc5a0, {0x558f168, 0xc0023817a0}, {0x55a2260, 0xc000d83a00}, 0xc0005e2a20, 0xc001297440, 0x79837e8, 0x0)
        /go/pkg/mod/google.golang.org/grpc@v1.59.0/server.go:1343 +0xdd1
google.golang.org/grpc.(*Server).handleStream(0xc0003bc5a0, {0x55a2260, 0xc000d83a00}, 0xc0005e2a20)
        /go/pkg/mod/google.golang.org/grpc@v1.59.0/server.go:1737 +0xc47
google.golang.org/grpc.(*Server).serveStreams.func1.1()
        /go/pkg/mod/google.golang.org/grpc@v1.59.0/server.go:986 +0x86
created by google.golang.org/grpc.(*Server).serveStreams.func1 in goroutine 10514930
        /go/pkg/mod/google.golang.org/grpc@v1.59.0/server.go:997 +0x136

ivan-cai commented 6 hours ago

> Please, let us know the results in 2.13.

Thanks, I will give 2.13 a try.

andrii-korotkov-verkada commented 5 hours ago

There should be an exec timeout after which it should terminate. Waiting several minutes for a git fetch is sometimes unavoidable, though it should rarely happen.

ivan-cai commented 5 hours ago

> There should be an exec timeout after which it should terminate. Waiting several minutes for a git fetch is sometimes unavoidable, though it should rarely happen.

I agree. This should be exposed as a parameter that users can configure.

andrii-korotkov-verkada commented 5 hours ago

You can configure it using an environment variable in the repo-server manifest, e.g.:

          env:
            - name: ARGOCD_EXEC_TIMEOUT
              value: "5m"

The default is 1m30s.