DataDog / helm-charts

Helm charts for Datadog products
Apache License 2.0

datadog metrics issue when used in hpa #815

Open bellondr opened 1 year ago

bellondr commented 1 year ago

Describe what happened: In an AWS EKS environment, when we use a DatadogMetric in an HPA, we see this error:

cmd: kubectl describe HorizontalPodAutoscaler car-service

Warning FailedGetExternalMetric 4m31s (x10889 over 2d5h) horizontal-pod-autoscaler unable to get external metric car-service/datadogmetric@car-service:network-requests/nil: unable to fetch metrics from external metrics API: Internal error occurred: DatadogMetric is invalid, err: Invalid metric (from backend), query: sum:aws.applicationelb.request_count_per_target{service:car-service,stage:stage}.as_count().rollup(sum, 60)

And in the cluster agent we found this error log: 2022-11-17 06:34:37 UTC | CLUSTER | ERROR | (pkg/clusteragent/externalmetrics/provider.go:106 in GetExternalMetric) | ExternalMetric query failed with error: DatadogMetric is invalid, err: Invalid metric (from backend), query: sum:aws.applicationelb.request_count_per_target{service:car-service,stage:stage}.as_count().rollup(sum, 60)

In the DatadogMetric: kubectl describe datadogmetric network-requests

  Max Age:  900s
  Query:    sum:aws.applicationelb.request_count_per_target{service:car-service,stage:stage}.as_count().rollup(sum, 60)
Status:
  Autoscaler References:  hpa:car-service/car-service
  Conditions:
    Last Transition Time:  2022-11-15T00:59:04Z
    Last Update Time:      2022-11-17T06:37:21Z
    Status:                True
    Type:                  Active
    Last Transition Time:  2022-11-15T00:59:04Z
    Last Update Time:      2022-11-17T06:37:21Z
    Status:                False
    Type:                  Valid
    Last Transition Time:  2022-11-15T00:59:04Z
    Last Update Time:      2022-11-17T06:37:21Z
    Status:                True
    Type:                  Updated
    Last Transition Time:  2022-11-15T00:59:35Z
    Last Update Time:      2022-11-17T06:37:21Z
    Message:               Invalid metric (from backend), query: sum:aws.applicationelb.request_count_per_target{service:car-service,stage:stage}.as_count().rollup(sum, 60)
    Reason:                Unable to fetch data from Datadog
    Status:                True
    Type:                  Error
  Current Value:           0
Events:                    <none>

But the HPA still works.

Describe what you expected: No error log, and the DatadogMetric status reported as valid.

Steps to reproduce the issue:

Additional environment details (Operating System, Cloud provider, etc):

clamoriniere commented 1 year ago

Hi @bellondr

This issue doesn't seem to be related to the datadog chart, but rather to the cluster-agent component, which is part of the https://github.com/datadog/datadog-agent repository.

It would be better to reopen the issue in the https://github.com/datadog/datadog-agent repository and contact our support with a cluster-agent flare.

bellondr commented 1 year ago

@clamoriniere Thanks for your reply. Maybe it is a cluster-agent issue. But if we set the env in deployment.yaml, we can fix it. So maybe the helm chart needs to expose these two env variables in values.yaml.

clamoriniere commented 1 year ago

Hello @bellondr

Sorry if I didn't understand your initial message.

njay4928 commented 1 year ago

Hi, I'm facing the same issue. @bellondr were you able to fix it?

bellondr commented 5 months ago

@njay4928 Sorry for the late reply. Set

  - name: DD_EXTERNAL_METRICS_PROVIDER_MAX_AGE
    value: "900"
  - name: DD_EXTERNAL_METRICS_PROVIDER_BUCKET_SIZE
    value: "900"

in the env of the datadog-cluster-agent deployment.
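
For anyone installing through the chart rather than editing the deployment by hand, the same two variables can probably be passed through values.yaml. This is a minimal sketch, assuming the chart version in use exposes a `clusterAgent.env` list (check the chart's values.yaml to confirm); the value "900" is simply the one reported above and matches the 900s Max Age shown in the DatadogMetric, so it may need tuning for other queries.

```yaml
# values.yaml (sketch): assumes the chart exposes clusterAgent.env
clusterAgent:
  env:
    # Variable names and values are taken from the comment above
    - name: DD_EXTERNAL_METRICS_PROVIDER_MAX_AGE
      value: "900"
    - name: DD_EXTERNAL_METRICS_PROVIDER_BUCKET_SIZE
      value: "900"
```

After a `helm upgrade` with these values, the Valid condition in `kubectl describe datadogmetric network-requests` should show whether the change took effect.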