DataDog / datadog-agent

Main repository for Datadog Agent
https://docs.datadoghq.com/
Apache License 2.0

dd.internal.entity_id isn't respected for UDS metrics #3870

Open danopia opened 5 years ago

danopia commented 5 years ago

Describe what happened:

I am using a Kubernetes agent with UDS (Unix domain sockets). I have C# applications, but the C# dogstatsd library lacks UDS support. I added a socat sidecar container to my application pods and pointed the application at 127.0.0.1:8125.

The metrics make it through to Datadog. However, they are tagged with the socat container's tags (for example image_name:alpine/socat). I configured DD_ENTITY_ID which had no effect on the socat tags. It seems the UDS origin tags merge with (and possibly overwrite) DD_ENTITY_ID origin tags.

Describe what you expected:

I expected the entity_id to override the UDS origin information if it was provided. I did not expect to see my application metrics origin-tagged as the socat proxy.

I also expected entity_id to provide container-level granularity (image version would be quite nice) though that may deserve a separate ticket.

Steps to reproduce the issue:

Deploy a UDS Kubernetes agent. Configure a UDP-dogstatsd pod with a socat proxy, and also an entity ID:

        - name: app
          args:
            - dotnet
            - MyApp.dll
          env:
            - name: ASPNETCORE_ENVIRONMENT
              value: Staging
            - name: DD_AGENT_HOST
              value: 127.0.0.1
            - name: DD_ENTITY_ID
              valueFrom:
                fieldRef:
                  fieldPath: metadata.uid
          image: ....
          ......
        - name: udppipe
          args:
            - '-s'
            - '-u'
            - 'UDP-RECV:8125'
            - 'UNIX-SENDTO:/var/run/datadog/dsd.socket'
          image: alpine/socat
          volumeMounts:
            - mountPath: /var/run/datadog
              name: dsdsocket
              readOnly: true

Workaround

I added a different socat proxy container to the actual datadog agents:

      - args:
        - '-s'
        - '-u'
        - 'UNIX-RECV:/var/run/datadog/dsd-anon.socket'
        - 'UDP-SENDTO:127.0.0.1:8125'
        image: alpine/socat
        name: udppipe
        volumeMounts:
          - mountPath: /var/run/datadog
            name: dsdsocket

This listens on a second 'anonymous' socket and forwards packets over UDP to the agent on localhost. I point my UDP applications to that alternative dsd-anon.socket, which prevents the Datadog Agent from reading the socat UDS origin. Now only the DD_ENTITY_ID origin tags are used.

ogaca-dd commented 5 years ago

Hello,

Thank you for opening the issue. We’ll take the proposal into consideration.

hkaj commented 5 years ago

Hi @danopia Thanks for the detailed issue!

Some context here: UDS and DD_ENTITY_ID are different ways to achieve the same purpose: origin detection (and thus, origin tagging). DD_ENTITY_ID is only used if you rely on the UDP port to send dogstatsd packets. In the UDS path, we rely on the pid of the process that sent the packet over the socket.

This explains why you see the tags from the container where the socat proxy lives.
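For illustration, here is a minimal Python sketch of what a client does with DD_ENTITY_ID: it appends a `dd.internal.entity_id` tag to each datagram before sending it over UDP. The function names (`build_datagram`, `send_udp`) and the sample metric name are made up for this example; this is not the code of any Datadog client library.

```python
import os
import socket

# Tag name used to carry the entity ID inside a DogStatsD datagram.
ENTITY_ID_TAG = "dd.internal.entity_id"


def build_datagram(name, value, metric_type="c", tags=(), entity_id=None):
    """Format one DogStatsD datagram: <name>:<value>|<type>|#<tag>,<tag>.

    Hypothetical helper: if an entity ID is given, it is appended as an
    extra tag so the receiver can use it for origin detection.
    """
    all_tags = list(tags)
    if entity_id:
        all_tags.append(f"{ENTITY_ID_TAG}:{entity_id}")
    payload = f"{name}:{value}|{metric_type}"
    if all_tags:
        payload += "|#" + ",".join(all_tags)
    return payload


def send_udp(payload, host="127.0.0.1", port=8125):
    """Send one datagram over UDP (fire and forget, no reply expected)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload.encode("utf-8"), (host, port))


if __name__ == "__main__":
    # DD_ENTITY_ID is expected to hold the pod UID from the downward API.
    datagram = build_datagram(
        "myapp.requests", 1, tags=["env:staging"],
        entity_id=os.environ.get("DD_ENTITY_ID"),
    )
    send_udp(datagram)
```

When the same datagram passes through a socat sidecar onto the Unix socket, the tag is still in the payload, but the pid seen on the socket belongs to socat.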

If you want to use DD_ENTITY_ID (and the pod uid from the downward API) you will need to switch to UDP instead of the Unix socket. The cool thing about this is that you can remove the host volume needed for the socket, and the hostPid option, since we don't need to inspect the source pid.

I hope that makes sense, please let me know if this is not the behavior you're expecting and would like to change it.

danopia commented 5 years ago

I actually tried using pure UDP with hostPort originally and the metrics just didn't show up in Datadog. Some research indicated that I needed some portmap module added to my EKS cluster. I use the amazon-vpc-cni-k8s project for networking and it's not really clear how to configure it for reliable hostPort UDP :(

I definitely prefer UDS inside a Kubernetes environment and the UDP->UDS sidecar has been doing the trick. My complaint is that all these static tags (docker_image:alpine/socat:latest, image_name:alpine/socat, image_tag:latest, short_image:socat, kube_container_name:udppipe) aren't useful to the application developers in this case.

Overall, I expected that sending dd.internal.entity_id over UDS would opt that datagram out of the UDS origin tags and only use the Entity ID for that specific datagram. I use UDS origin tagging for other applications so the hostPid feature is quite useful overall. It's just these C# pods that are confused by UDS until https://github.com/DataDog/dogstatsd-csharp-client/issues/85 gets attention.
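To make the expectation concrete, here is a rough Python sketch of the precedence I have in mind. It is only an illustration of the requested behavior, not the Agent's actual code; `resolve_origin` and the `("entity_id", ...)` / `("uds", ...)` return shapes are hypothetical.

```python
ENTITY_ID_PREFIX = "dd.internal.entity_id:"


def resolve_origin(tags, uds_origin=None):
    """Pick the origin for one datagram and strip the internal tag.

    Hypothetical precedence: an entity ID carried in the datagram wins
    over the origin detected from the socket peer; otherwise fall back
    to the UDS origin (which may be None).
    """
    entity_id = None
    remaining = []
    for tag in tags:
        if tag.startswith(ENTITY_ID_PREFIX):
            entity_id = tag[len(ENTITY_ID_PREFIX):]
        else:
            remaining.append(tag)

    if entity_id:
        return ("entity_id", entity_id), remaining
    return ("uds", uds_origin), remaining


if __name__ == "__main__":
    origin, tags = resolve_origin(
        ["env:staging", "dd.internal.entity_id:abc-123"],
        uds_origin="socat-sidecar-container",
    )
    print(origin, tags)
```

With this rule, datagrams from applications that do not set an entity ID would keep their UDS origin tags as they do today.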

If you think everything is working as intended here I'll concede and keep using my workaround for C# 😏

hkaj commented 5 years ago

Thanks for the context, I agree we're missing a piece of the puzzle here. We should either add UDS support to the C# lib, or honor the entity_id tag in the UDS origin detection code path (it's only considered in the UDP path so far).

I'll check with the team, and we'll schedule some work around that. In the meantime if you know how to add UDS support to the C# lib and have time to do it, you're more than welcome 😄