DataDog / dd-trace-py

Datadog Python APM Client
https://ddtrace.readthedocs.io/

Celery task producer (celery.apply) span in APM triggers Datadog service inference into creating a new service for the producer's own hostname #11491

Open patrys opened 1 day ago

patrys commented 1 day ago

We've enabled service inference, and our service list is now filled with spurious services named after every pod in every k8s service that publishes tasks. After running for just an hour, we already have 150 of those.

All of the fake services report only a single source of data, celery.apply. Visiting traces for celery.apply confirms that celery.hostname appears to be converted into peer.hostname, even though this span does not actually make a client connection anywhere.

Under each celery.apply span we see the expected sqs.sendmessage span for the actual delivery of the task. That second span has the correct peer tags and is paired with the expected queue service.

Agents are deployed using Helm chart version 3.73.0. The code is traced using dd-trace-py version 2.11.2.
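
For reference, a minimal producer along these lines is enough to emit the celery.apply span in question; the broker URL, app name, and task are placeholders, not our actual code:

```python
# Minimal sketch of the producer side (broker URL, app name, and task are
# placeholders). ddtrace's Celery integration wraps apply_async() in a
# "celery.apply" span, which is the span that service inference picks up.
from ddtrace import patch

patch(celery=True)  # enable the Celery integration before creating the app

from celery import Celery

app = Celery("producer", broker="sqs://")  # SQS broker, matching the sqs.sendmessage child span

@app.task
def process_order(order_id):
    ...

# Publishing a task emits a celery.apply span whose celery.hostname tag ends up
# being this pod's own hostname, which is what gets inferred as a "service".
process_order.apply_async(args=[123])
```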

patrys commented 11 hours ago

After some investigation: the signal handler calls set_tags_from_context(span, kwargs["headers"]), and the comment suggests it is there specifically to set celery.hostname. The hostname in question is the pod name of the task producer.
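
Roughly, that tagging step amounts to the following; this is a simplified sketch of the behavior as I read it (key list abbreviated), not the actual dd-trace-py code:

```python
# Simplified sketch of the behavior described above, not the actual dd-trace-py
# implementation. The handler copies known keys from the Celery message
# headers/context onto the span as celery.* tags.
def set_tags_from_context(span, context):
    for key in ("hostname", "id", "retries", "routing_key", "delivery_info"):  # abbreviated
        value = context.get(key)
        if value is not None:
            span.set_tag("celery." + key, value)

# On the producer side, the "hostname" entry describes the process publishing the
# task (our pod), so celery.hostname records where the task came from, not where
# it is going.
```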

Then, the trace_after_publish signal handler extracts the celery.hostname tag and uses its value to set out.host, which is wrong, as the hostname is the origin of the task, not its target.

out.host is then transformed (by the agent, I assume) to peer.hostname, which is expected behavior, but because of the above, the value is incorrect.
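
Putting it together, the problematic flow looks roughly like this; a sketch of the behavior as described above, not the library's actual code (the real handler is connected to Celery's after_task_publish signal and does not take the span as an argument):

```python
# Sketch of the after-publish step as described above, not the actual dd-trace-py
# code. It illustrates how the origin hostname leaks into out.host.
def after_publish_sketch(span):
    producer_hostname = span.get_tag("celery.hostname")  # the publishing pod's own name
    if producer_hostname:
        # Reported bug: out.host should describe the target of the call
        # (broker/worker), but here it is filled with the origin of the task.
        span.set_tag("out.host", producer_hostname)
    span.finish()

# The agent then maps out.host to peer.hostname, and service inference turns every
# distinct peer.hostname (i.e. every producer pod) into its own "service".
```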