argoproj / argo-cd

Declarative Continuous Deployment for Kubernetes
https://argo-cd.readthedocs.io
Apache License 2.0

If metrics are enabled, `argocd` commands return `FATA[0000] cannot find pod with selector: [app.kubernetes.io/name=argocd-repo-server-metrics]` when using --core #18472

Status: Open (jessebot opened this issue 1 month ago)

jessebot commented 1 month ago

Checklist:

Describe the bug

All argocd CLI commands run with --core return:

FATA[0000] cannot find pod with selector: [app.kubernetes.io/name=argocd-repo-server-metrics] - use the --{component}-name flag in this command or set the environmental variable (Refer to https://argo-cd.readthedocs.io/en/stable/user-guide/environment-variables), to change the Argo CD component name in the CLI

To Reproduce

This used to work:

$ argocd app list --core
FATA[0000] cannot find pod with selector: [app.kubernetes.io/name=argocd-repo-server-metrics] - use the --{component}-name flag in this command or set the environmental variable (Refer to https://argo-cd.readthedocs.io/en/stable/user-guide/environment-variables), to change the Argo CD component name in the CLI

I also tried to log out and back in:

$ argocd logout kubernetes
Logged out from 'kubernetes'
$ argocd login --core
Context 'kubernetes' updated
$ argocd app list
FATA[0000] cannot find pod with selector: [app.kubernetes.io/name=argocd-repo-server-metrics] - use the --{component}-name flag in this command or set the environmental variable (Refer to https://argo-cd.readthedocs.io/en/stable/user-guide/environment-variables), to change the Argo CD component name in the CLI

Here's my argo config at /home/myuser/.config/argocd/config:

contexts:
- name: kubernetes
  server: kubernetes
  user: kubernetes
current-context: kubernetes
servers:
- core: true
  grpc-web-root-path: ""
  server: kubernetes
users:
- name: kubernetes

Expected behavior

I expected the server version to be printed when running argocd version.

Version

From argocd version:

argocd: v2.11.2+25f7504
  BuildDate: 2024-05-23T15:29:58Z
  GitCommit: 25f7504ecc198e7d7fdc055fdb83ae50eee5edd0
  GitTreeState: clean
  GoVersion: go1.22.3
  Compiler: gc
  Platform: linux/amd64
FATA[0000] cannot find pod with selector: [app.kubernetes.io/name=argocd-repo-server-metrics] - use the --{component}-name flag in this command or set the environmental variable (Refer to https://argo-cd.readthedocs.io/en/stable/user-guide/environment-variables), to change the Argo CD component name in the CLI

From brew info argocd:

==> argocd: stable 2.11.2 (bottled)
GitOps Continuous Delivery for Kubernetes
https://argoproj.github.io/cd
Installed
/home/linuxbrew/.linuxbrew/Cellar/argocd/2.11.2 (9 files, 158.0MB) *
  Poured from bottle using the formulae.brew.sh API on 2024-06-02 at 08:53:23
From: https://github.com/Homebrew/homebrew-core/blob/HEAD/Formula/a/argocd.rb
License: Apache-2.0
==> Dependencies
Build: go ✔, node ✔, yarn ✘
==> Caveats
Bash completion has been installed to:
  /home/linuxbrew/.linuxbrew/etc/bash_completion.d
==> Analytics
install: 14,804 (30 days), 44,467 (90 days), 149,806 (365 days)
install-on-request: 14,785 (30 days), 44,420 (90 days), 149,643 (365 days)
build-error: 2 (30 days)

Argo CD is installed via the official Helm chart (version 7.1.1). Here's the JSON from the dashboard:

{
    "Version": "v2.11.2+25f7504",
    "BuildDate": "2024-05-23T13:32:13Z",
    "GitCommit": "25f7504ecc198e7d7fdc055fdb83ae50eee5edd0",
    "GitTreeState": "clean",
    "GoVersion": "go1.21.9",
    "Compiler": "gc",
    "Platform": "linux/amd64",
    "KustomizeVersion": "v5.2.1 2023-10-19T20:13:51Z",
    "HelmVersion": "v3.14.4+g81c902a",
    "KubectlVersion": "v0.26.11",
    "JsonnetVersion": "v0.20.0"
}

Here's the Argo CD ApplicationSet that manages Argo CD itself: https://github.com/small-hack/argocd-apps/blob/cd1df56d7c994024f677f83b7f1c63db7e85115d/argocd/app_of_apps/argocd_appset.yaml

Logs

I'm not sure which logs would be helpful, so let me know which ones you'd like me to get. I don't see any errors anywhere. I did turn on debug logging and checked argo-cd-application-controller-0, but I don't know if this output is normal. The long run of numbers is kinda weird:

{"level":"debug","msg":"Generating Manifest for source {https://argoproj.github.io/argo-helm  7.1.1 \u0026ApplicationSourceHelm{ValueFiles:[],Parameters:[]HelmParameter{},ReleaseName:argo-cd,Values:,FileParameters:[]HelmFileParameter{},Version:,PassCredentials:false,IgnoreMissingValueFiles:false,SkipCrds:false,ValuesObject:\u0026runtime.RawExtension{Raw:*[123 34 97 112 112 108 105 99 97 116 105 111 110 83 101 116 34 58 123 34 99 101 114 116 105 102 105 99 97 116 101 34 58 123 34 97 100 100 105 116 105 111 110 97 108 72 111 115 116 115 34 58 91 93 44 34 100 111 109 97 105 110 34 58 34 97 114 103 111 45 99 100 46 118... truncated for brevity ...  117 115 34 44 34 116 108 115 67 111 110 102 105 103 34 58 123 125 125 125 44 34 112 100 98 34 58 123 34 97 110 110 111 116 97 116 105 111 110 115 34 58 123 125 44 34 101 110 97 98 108 101 100 34 58 102 97 108 115 101 44 34 108 97 98 101 108 115 34 58 123 125 44 34 109 97 120 85 110 97 118 97 105 108 97 98 108 101 34 58 34 34 44 34 109 105 110 65 118 97 105 108 97 98 108 101 34 58 34 34 125 44 34 114 101 115 111 117 114 99 101 115 34 58 123 125 125 125],},} nil nil nil argo-cd } revision 7.1.1","time":"2024-06-02T07:18:59Z"}
{"application":"argocd/argo-cd","level":"info","msg":"Refreshing app status (controller refresh requested), level (0)","time":"2024-06-02T07:18:59Z"}
{"application":"argocd/argo-cd","level":"info","msg":"Update successful","time":"2024-06-02T07:18:59Z"}
{"application":"argocd/argo-cd","dest-name":"","dest-namespace":"argocd","dest-server":"https://kubernetes.default.svc","fields.level":0,"level":"info","msg":"Reconciliation completed","patch_ms":7,"setop_ms":0,"time":"2024-06-02T07:18:59Z","time_ms":11}
{"application":"argocd/argocd-helm","build_options_ms":0,"helm_ms":0,"level":"info","msg":"GetRepoObjs stats","plugins_ms":0,"repo_ms":0,"time":"2024-06-02T07:18:59Z","time_ms":19,"unmarshal_ms":19,"version_ms":0}
{"application":"argocd/argocd-helm","level":"debug","msg":"Retrieved live manifests","time":"2024-06-02T07:18:59Z"}
{"application":"argocd/argocd-helm","level":"debug","msg":"specChanged","time":"2024-06-02T07:18:59Z","useDiffCache":"false"}
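The run of numbers in the first debug line appears to be the Helm `ValuesObject` printed as raw decimal byte values (Go's `%v` formatting renders a `[]byte` as space-separated decimals). A minimal Go sketch, assuming that interpretation, decodes the leading bytes copied from that line back into text; `decodeByteList` and `valuesPrefix` are hypothetical names used only for illustration:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// valuesPrefix holds the first bytes of the ValuesObject copied from the debug log above.
const valuesPrefix = "123 34 97 112 112 108 105 99 97 116 105 111 110 83 101 116 34 58 123 34 99 101 114 116 105 102 105 99 97 116 101 34 58 123 34 97 100 100 105 116 105 111 110 97 108 72 111 115 116 115 34 58 91 93 44 34 100 111 109 97 105 110 34 58 34 97 114 103 111 45 99 100 46 118"

// decodeByteList converts a space-separated list of decimal byte values
// (as printed by %v for a []byte) back into the string those bytes encode.
func decodeByteList(s string) (string, error) {
	fields := strings.Fields(s)
	buf := make([]byte, 0, len(fields))
	for _, f := range fields {
		// bitSize 8 rejects anything outside 0..255.
		n, err := strconv.ParseUint(f, 10, 8)
		if err != nil {
			return "", fmt.Errorf("invalid byte %q: %w", f, err)
		}
		buf = append(buf, byte(n))
	}
	return string(buf), nil
}

func main() {
	decoded, err := decodeByteList(valuesPrefix)
	if err != nil {
		panic(err)
	}
	fmt.Println(decoded)
}
```

Run as-is, it prints `{"applicationSet":{"certificate":{"additionalHosts":[],"domain":"argo-cd.v`, which suggests the numbers are just the chart values serialized as JSON, not corrupted data.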

Additional context

All of the pods look healthy :shrug:

$ kubectl get pods
NAME                                                 READY   STATUS      RESTARTS   AGE
argo-cd-redis-secret-init-xhlf8                      0/1     Completed   0          153m
argo-cd-notifications-controller-5bcb89dcc4-96698    1/1     Running     0          153m
argo-cd-applicationset-controller-6555fb4984-7d6lm   1/1     Running     0          153m
argo-cd-redis-766b8bc9d-mkczk                        1/1     Running     0          153m
argo-cd-application-controller-0                     1/1     Running     0          153m
argo-cd-repo-server-97cc8879d-4p27n                  1/1     Running     0          153m
argo-cd-server-6d44d48499-49888                      1/1     Running     0          153m

I'm not actually sure when this broke; I probably haven't used my server in a few weeks. Maybe it was when I upgraded to Helm chart version 7.x? You can see the updates here: https://github.com/small-hack/argocd-apps/commits/main/argocd?author=renovate%5Bbot%5D

(Also, thank you for any help you can provide, and the work everyone here in the org and wider community do here :pray: )

jessebot commented 1 month ago

Tried updating to use ServiceMonitors to see if that helps: https://github.com/small-hack/argocd-apps/commit/fe8434172a1cca99d10e57ea3c8e42ac2bb50ad8

It doesn't make a difference. The weird thing is that there is no metrics pod. Should there be? The only resource in the Helm chart's metrics.yaml is a Service: https://github.com/argoproj/argo-helm/blob/main/charts/argo-cd/templates/argocd-repo-server/metrics.yaml

And the Service seems OK? Here it is:

apiVersion: v1
kind: Service
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"labels":{"app.kubernetes.io/component":"repo-server","app.kubernetes.io/instance":"argo-cd","app.kubernetes.io/managed-by":"Helm","app.kubernetes.io/name":"argocd-repo-server-metrics","app.kubernetes.io/part-of":"argocd","app.kubernetes.io/version":"v2.11.2","argocd.argoproj.io/instance":"argocd-helm","helm.sh/chart":"argo-cd-7.1.1"},"name":"argo-cd-repo-server-metrics","namespace":"argocd"},"spec":{"ports":[{"name":"http-metrics","port":8084,"protocol":"TCP","targetPort":"metrics"}],"selector":{"app.kubernetes.io/instance":"argo-cd","app.kubernetes.io/name":"argocd-repo-server"},"type":"ClusterIP"}}
  creationTimestamp: "2024-04-12T15:14:40Z"
  labels:
    app.kubernetes.io/component: repo-server
    app.kubernetes.io/instance: argo-cd
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: argocd-repo-server-metrics
    app.kubernetes.io/part-of: argocd
    app.kubernetes.io/version: v2.11.2
    argocd.argoproj.io/instance: argocd-helm
    helm.sh/chart: argo-cd-7.1.1
  name: argo-cd-repo-server-metrics
  namespace: argocd
  resourceVersion: "28300640"
  uid: baa19b08-ea6f-48f3-a41e-29e1456e5479
spec:
  clusterIP: 10.43.93.2
  clusterIPs:
  - 10.43.93.2
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: http-metrics
    port: 8084
    protocol: TCP
    targetPort: metrics
  selector:
    app.kubernetes.io/instance: argo-cd
    app.kubernetes.io/name: argocd-repo-server
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
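Note that the Service above carries `app.kubernetes.io/name: argocd-repo-server-metrics` in its own `metadata.labels`, while its `spec.selector` targets pods labeled `app.kubernetes.io/name: argocd-repo-server`. The error message says the CLI is looking for a *pod* with `app.kubernetes.io/name=argocd-repo-server-metrics`. A minimal Go sketch of equality-based label matching illustrates why such a lookup could come back empty; the label maps are copied (abridged) from the YAML, except `repoServerPodLabels`, which is a hypothetical pod assumed to carry exactly the labels the Service selects on:

```go
package main

import "fmt"

// matches reports whether labels satisfies an equality-based selector:
// every key in selector must be present in labels with the same value.
func matches(selector, labels map[string]string) bool {
	for k, want := range selector {
		if got, ok := labels[k]; !ok || got != want {
			return false
		}
	}
	return true
}

// Labels on the argo-cd-repo-server-metrics Service object itself (abridged from the YAML above).
var serviceLabels = map[string]string{
	"app.kubernetes.io/component": "repo-server",
	"app.kubernetes.io/instance":  "argo-cd",
	"app.kubernetes.io/name":      "argocd-repo-server-metrics",
	"app.kubernetes.io/part-of":   "argocd",
}

// The Service's spec.selector: the pods it routes traffic to.
var serviceSelector = map[string]string{
	"app.kubernetes.io/instance": "argo-cd",
	"app.kubernetes.io/name":     "argocd-repo-server",
}

// Hypothetical repo-server pod labels (assumed to match the Service's selector).
var repoServerPodLabels = map[string]string{
	"app.kubernetes.io/instance": "argo-cd",
	"app.kubernetes.io/name":     "argocd-repo-server",
}

// The selector quoted in the CLI error message.
var cliSelector = map[string]string{
	"app.kubernetes.io/name": "argocd-repo-server-metrics",
}

func main() {
	fmt.Println("CLI selector matches repo-server pod:", matches(cliSelector, repoServerPodLabels))
	fmt.Println("Service selector matches repo-server pod:", matches(serviceSelector, repoServerPodLabels))
	fmt.Println("CLI selector matches the Service's own labels:", matches(cliSelector, serviceLabels))
}
```

This prints `false`, `true`, `true`: under these assumptions the selector in the error message describes the metrics Service object rather than any pod, which is consistent with `kubectl get pods` showing no metrics pod.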

I'm kinda stumped, because argocd app list --help shows no --repo-server-metrics-name flag. The environment variables page that the error tells us to check also doesn't have anything for the repo-server-metrics Service: https://argo-cd.readthedocs.io/en/stable/user-guide/environment-variables/

Perhaps this is related to https://github.com/argoproj/argo-cd/issues/10200? That was fixed by https://github.com/argoproj/argo-cd/pull/14605/files, but the fix doesn't include anything related to metrics.

I asked here to confirm that there isn't supposed to be a metrics pod: https://github.com/argoproj/argo-helm/discussions/2738

jessebot commented 1 month ago

Disabling all metrics is a workaround that gets the CLI working again: https://github.com/small-hack/argocd-apps/commit/cfca63174ffd05b49f3eb8d76a58e17977d88715