argoproj / argo-cd

Declarative Continuous Deployment for Kubernetes
https://argo-cd.readthedocs.io
Apache License 2.0

Consistently getting 504 errors on GetResource API calls for Thanos application #20383

Open jonathan-firefly opened 1 month ago

jonathan-firefly commented 1 month ago


Describe the bug

I consistently get a 504 error when performing a GetResource request on a PVC (specifically, for a Thanos application).

To Reproduce

  1. Create a Thanos application.
  2. Try to get the PVC resource manifest (through the UI, the CLI, or a Go application).
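For the Go-application path in step 2, a request might look roughly like the sketch below. It is not the reporter's code. It assumes the Argo CD REST endpoint `GET /api/v1/applications/{name}/resource` with the `namespace`, `resourceName`, `group`, `version` and `kind` query parameters, bearer-token auth read from a hypothetical `ARGOCD_TOKEN` variable, and a JSON response carrying a `manifest` field. The namespace `thanos` and PVC name `data-thanos-store-0` are hypothetical placeholders.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
	"time"
)

// buildResourceURL assembles a GetResource REST URL for a core/v1 PVC.
// Assumption: path and query parameter names follow the Argo CD REST API.
func buildResourceURL(server, app, namespace, pvc string) string {
	q := url.Values{}
	q.Set("namespace", namespace)
	q.Set("resourceName", pvc)
	q.Set("group", "") // PVCs live in the core (empty) API group
	q.Set("version", "v1")
	q.Set("kind", "PersistentVolumeClaim")
	return fmt.Sprintf("%s/api/v1/applications/%s/resource?%s",
		server, url.PathEscape(app), q.Encode())
}

func main() {
	// Hypothetical namespace and PVC name: replace with real values.
	u := buildResourceURL("https://argocd.firefly.ai", "stag-infra-thanos",
		"thanos", "data-thanos-store-0")

	// Allow more than the 60 s seen in the logs, so a failure here points
	// at the server or proxy rather than at this client's own timeout.
	ctx, cancel := context.WithTimeout(context.Background(), 90*time.Second)
	defer cancel()

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, u, nil)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	req.Header.Set("Authorization", "Bearer "+os.Getenv("ARGOCD_TOKEN"))

	start := time.Now()
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		fmt.Fprintln(os.Stderr, "request failed:", err)
		os.Exit(1)
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	fmt.Printf("status %d after %s\n", resp.StatusCode, time.Since(start).Round(time.Millisecond))

	// On success, print only the manifest; otherwise print the raw body.
	var out struct {
		Manifest string `json:"manifest"`
	}
	if resp.StatusCode == http.StatusOK && json.Unmarshal(body, &out) == nil {
		fmt.Println(out.Manifest)
	} else {
		fmt.Println(string(body))
	}
}
```

Printing the elapsed time alongside the status makes it easy to see whether the failure lands at the same 60 s mark as the CLI and server logs.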

Expected behavior

Receive the PVC resource manifest.

Screenshots

When executing the same request from the Argo CD UI, I get the same error code (screenshot not preserved).

Version

argocd: v2.12.4+27d1e64
  BuildDate: 2024-09-26T09:31:42Z
  GitCommit: 27d1e641b6ea99d9f4bf788c032aeaeefd782910
  GitTreeState: clean
  GoVersion: go1.23.1
  Compiler: gc
  Platform: darwin/arm64
argocd-server: v2.11.1+9f40df0
  BuildDate: 2024-05-21T13:55:56Z
  GitCommit: 9f40df0c29eca7e45a73f802f033dfd1ed0068e3
  GitTreeState: clean
  GoVersion: go1.21.9
  Compiler: gc
  Platform: linux/amd64
  Kustomize Version: v5.2.1 2023-10-19T20:13:51Z
  Helm Version: v3.14.4+g81c902a
  Kubectl Version: v0.26.11
  Jsonnet Version: v0.20.0

Logs

jonathan@Jonathans-MacBook-Pro ~ % argocd app resources stag-infra-thanos --loglevel debug --grpc-web
FATA[0060] rpc error: code = Unknown desc = POST https://argocd.firefly.ai/application.ApplicationService/ResourceTree failed with status code 504 

Server side logs:

time="2024-10-16T08:08:58Z" level=error msg="finished unary call with code Unknown" error="error getting app resources: error getting cached app resource tree: error getting application by query: application refresh deadline exceeded" grpc.code=Unknown grpc.method=GetResource grpc.service=application.ApplicationService grpc.start_time="2024-10-16T08:07:58Z" grpc.time_ms=60000.562 span.kind=server system=grpc
agaudreault commented 1 month ago

504 (Gateway Timeout) errors are usually returned by the Ingress serving https://argocd.firefly.ai/, and we do not have a way to investigate those.

You can try port-forwarding to the argocd-server pod directly, so that requests bypass your Ingress and we can rule it out.

However, "application refresh deadline exceeded" leads me to think that the controller might not be able to refresh your application, and the Get call times out because it wants to return an up-to-date manifest. You should also check that the Argo CD control plane (application controller and repo-server) is healthy and running, and that Argo CD can refresh your app when you click the Refresh button in the UI.