DataDog / datadog-agent

Main repository for Datadog Agent
https://docs.datadoghq.com/
Apache License 2.0
2.9k stars 1.21k forks source link

Custom metrics provider should convert metrics / tags names into Kubernetes format. #2548

Open renaudguerin opened 6 years ago

renaudguerin commented 6 years ago

Hello,

I'm using the Datadog Helm chart (datadog-1.5.1) with agent 6.5.2 and cluster-agent 0.10.0 on GKE (k8s 1.10.7) I've opened a support ticket about this, but since this is clearly a missing feature / bug in the custom metrics provider, I thought it would be worth opening an issue too.

I’m trying to use the Cluster Agent’s external metrics provider for Kubernetes autoscaling, following this doc : https://github.com/DataDog/datadog-agent/blob/master/docs/cluster-agent/CUSTOM_METRICS_SERVER.md

The use case is to autoscale a workers deployment based on the number of jobs in a Redis-based queue (we use the popular Bull package for NodeJS as our queue https://github.com/OptimalBits/bull ).

We've set up the redisdb check in the agent to monitor the length of a key named bull:tasks:wait (this is hardcoded in the Bull library). This ends up as a Datadog metric named redis.key.length with a bull:tasks:wait tag.

The doc says the metric/labels provided in the Kubernetes HPA manifest should be in Datadog format. So I did this :

---
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: bullworker
  namespace: production
spec:
  maxReplicas: 100
  minReplicas: 3
  metrics:
  - type: External
    external:
      metricName: redis.key.length
      metricSelector:
        matchLabels:
            redis_host: redis
            key: "bull:tasks:wait"
            cluster-name: production
      targetAverageValue: 5
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 100
  scaleTargetRef:
    apiVersion: extensions/v1beta1
    kind: Deployment
    name: bullworker

Unfortunately, the label/tag I need to filter on contains a colon “:”, and it seems like this isn’t allowed by Kubernetes :

Warning FailedGetExternalMetric 32s (x4 over 2m) horizontal-pod-autoscaler invalid label value: "bull:tasks:wait": a valid label must be an empty string or consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyValue', or 'myvalue', or '12345', regex used for validation is '(([A-Za-z0-9][-A-Za-z0-9.])?[A-Za-z0-9])?') Warning FailedComputeMetricsReplicas 32s (x4 over 2m) horizontal-pod-autoscaler failed to get redis.key.length external metric: invalid label value: "bull:tasks:wait": a valid label must be an empty string or consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyValue', or 'myvalue', or '12345', regex used for validation is '(([A-Za-z0-9][-A-Za-z0-9.])?[A-Za-z0-9])?')

One possibility would be to remove illegal characters at the Redis check level. But it seems to me the right place to do this conversion is actually in the Custom Metrics provider, because this format requirement is specific to Kubernetes.

Can you help ? Thanks !

CharlyF commented 6 years ago

Hello @renaudguerin ! Thanks for opening this issue. Indeed, this currently a limitation that we have. I have added a task in our backlog to investigate how to best solve this problem. I can't give you a timeline at the moment but we will get back to you as soon as possible.

Best,

marchelbling commented 4 years ago

Any update on this issue?

CharlyF commented 4 years ago

No update on this, this falls in the bucket of supporting custom queries. @marchelbling: Do you have the same use case (tags that contain forbidden characters) ?

andrewreineke commented 4 years ago

I've also just come across this. Our use case is environment specification. For example, in metric explorer we filter over values like:

acastro2 commented 8 months ago

Came across this for kafka consumer groups too.

plumdog commented 4 months ago

@CharlyF is there a known workaround for this issue? Or is it just not possible to use a metric for scaling if we need to filter using a label that contains characters Kubernetes doesn't allow in a label? Either a workaround Kubernetes-side or in the custom metrics provider or in the metrics section of Datadog itself?

I am also using the redis integration and also have a key that contains :. I can't see a way of asking the redis integration to munge these keys (though this is clearly not an ideal solution anyway, so I don't really expect it to) but rather that the Datadog custom metrics provider ought to have a way of translating to any legal Datadog metric tag from the labels allowed by Kubernetes.

The Kubernetes requirement is:

a valid label must be an empty string or consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyValue', or 'myvalue', or '12345', regex used for validation is '(([A-Za-z0-9][-A-Za-z0-9.]*)?[A-Za-z0-9])?')

So, for example, something like "base64 encode it, then replace = with _ and add some bookends" would be a bit weird but I think would work, then:

selector:
  matchLabels:
    key: b64_bXk6dmFsdWU__b64

(tbh I hope there are much neater ways of handling this that I haven't thought of)

The Datadog custom metrics provider could then undo this encoding (remove bookends to get bXk6dmFsdWU_, replace _ with = to get bXk6dmFsdWU=, then base64 decode) to work out that the value for key should be my:value and use that for querying Datadog.

Edit: maybe URL encoding and replace %, maybe hexadecimal?

plumdog commented 4 months ago

Or, best of all, would be if this issue were to get fixed in Kubernetes https://github.com/kubernetes/kubernetes/issues/102056, which would sidestep all of the above.