DataDog / integrations-core

Core integrations of the Datadog Agent
BSD 3-Clause "New" or "Revised" License

Argo CD check failing to collect certain argocd.appset_controller metrics #17968

Open brandon-berg opened 2 months ago

brandon-berg commented 2 months ago

Steps to reproduce the issue:

  1. Configure Argo CD integration to collect metrics from Argo CD Application Set Controller

Describe the results you received: These metrics are not collected:

Describe the results you expected: These metrics are not collected, although these are collected, as expected:

Additional information you deem important (e.g. issue happens only occasionally): This is caused by the erroneous inclusion of the counter suffix _total in the list of metrics to be collected from the Argo CD ApplicationSet Controller here. As discussed in the documentation, the "_total" suffix must be removed when specifying the name of counter metrics to be collected. As a result, these metrics cannot be collected.
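The suffix behaviour described above can be illustrated with a minimal sketch. This is a simplified model of the matching rule, not the Agent's actual implementation: `family_name` and `is_collected` are hypothetical helpers, and the exposed sample names are inferred by adding `_total` back onto the names used in the workaround.

```python
def family_name(sample_name: str, metric_type: str) -> str:
    """Return the name a config entry must use to match this sample.

    Simplified model (an assumption, not Agent code): for counters, the
    `_total` suffix is treated as part of the exposition format rather
    than part of the metric name.
    """
    suffix = "_total"
    if metric_type == "counter" and sample_name.endswith(suffix):
        return sample_name[: -len(suffix)]
    return sample_name


def is_collected(configured_names, sample_name: str, metric_type: str) -> bool:
    """True if any configured name equals the sample's family name."""
    return family_name(sample_name, metric_type) in set(configured_names)


# Sample names as they would appear on the /metrics endpoint (inferred).
exposed = [
    ("controller_runtime_reconcile_errors_total", "counter"),
    ("controller_runtime_reconcile_total", "counter"),
]

# Names listed with the suffix (the situation this issue describes).
broken_config = [
    "controller_runtime_reconcile_errors_total",
    "controller_runtime_reconcile_total",
]

# Names listed without the suffix (what the workaround supplies).
fixed_config = [
    "controller_runtime_reconcile_errors",
    "controller_runtime_reconcile",
]

for name, kind in exposed:
    print(
        name,
        "| with suffix:", is_collected(broken_config, name, kind),
        "| without suffix:", is_collected(fixed_config, name, kind),
    )
```

Under this model, a configured name that keeps `_total` never equals the stripped family name, which is consistent with the affected metrics silently going missing.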

Workaround: in the Argo CD integration config, add the correct metric definitions as extra_metrics:

extra_metrics:
  - controller_runtime_reconcile_errors: "reconcile.errors"
  - controller_runtime_reconcile: "runtime.reconcile"

ericblackburn commented 2 months ago

On our Argo CD instance, we were able to get the metrics by using the workaround this issue notes, which tells Datadog the correct metric names:

argo-cd:
  applicationSet:
    podAnnotations:
      ad.datadoghq.com/applicationset-controller.logs: '[{"service":"argocd","source":"argocd"}]'
      ad.datadoghq.com/applicationset-controller.checks: |
        {
          "argocd": {
            "init_config": {"service": "argocd"},
            "instances": [
              {
                "appset_controller_endpoint": "http://%%host%%:8080/metrics",
                "extra_metrics": [
                   {"controller_runtime_reconcile_errors": "reconcile.errors"},
                   {"controller_runtime_reconcile": "runtime.reconcile"}
                ]
              }
            ]
          }
        }
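A quick sanity check on an annotation like the one above is to parse it and flag any `extra_metrics` names that still end in `_total`. The sketch below is a hypothetical helper, not part of the Datadog Agent or the Argo CD chart. It only flags names for review, since a non-counter metric could legitimately end in `_total`.

```python
import json


def names_ending_in_total(checks_annotation: str, check_name: str = "argocd"):
    """Return extra_metrics names that end in `_total` (hypothetical lint)."""
    config = json.loads(checks_annotation)
    flagged = []
    for instance in config.get(check_name, {}).get("instances", []):
        for entry in instance.get("extra_metrics", []):
            # Entries in this thread are {raw_name: renamed} mappings;
            # a bare string is handled defensively as well.
            names = list(entry) if isinstance(entry, dict) else [entry]
            flagged.extend(n for n in names if n.endswith("_total"))
    return flagged


# The checks annotation from this comment, with %%host%% left as a template.
annotation = """
{
  "argocd": {
    "init_config": {"service": "argocd"},
    "instances": [
      {
        "appset_controller_endpoint": "http://%%host%%:8080/metrics",
        "extra_metrics": [
          {"controller_runtime_reconcile_errors": "reconcile.errors"},
          {"controller_runtime_reconcile": "runtime.reconcile"}
        ]
      }
    ]
  }
}
"""

print(names_ending_in_total(annotation))
```

For the annotation shown here the helper should report nothing, because both names already omit the suffix.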

ericblackburn commented 2 months ago

@brandon-berg, were you wanting to make a PR for this, or should I?

brandon-berg commented 2 months ago

I'm not 100% sure what the actual intended behavior was, so I'd like to leave it up to Datadog, or at least hear from them about what they actually want before submitting a PR.

ericblackburn commented 2 months ago

Note this is related to https://github.com/DataDog/integrations-core/pull/15308. Left a comment on the original PR issue.

steveny91 commented 1 month ago

Hello 👋 Thanks for flagging! I'll put up a PR to fix this. In the meantime, although inconvenient, your proposed workarounds are what I would have recommended. Apologies there! 🙇