DataDog / dd-trace-php

Datadog PHP Clients
https://docs.datadoghq.com/tracing/setup/php

[BUG] x-datadog-sampling-priority seems to be ignored #1292

Closed Vesquen closed 2 years ago

Vesquen commented 3 years ago

Bug description

We have a long-running PHP process that reads AMQP messages and traces the handling of each message when a span context is present on it. Tracing works as expected, but when x-datadog-sampling-priority is set to 0, the process still gets traced. All other (non-PHP) processes in the distributed chain seem to respect this value.

PHP version

PHP 7.4.21 (cli) (built: Jul 22 2021 03:08:33) ( NTS )
Copyright (c) The PHP Group
Zend Engine v3.4.0, Copyright (c) Zend Technologies
    with Zend OPcache v7.4.21, Copyright (c), by Zend Technologies
    with ddtrace v0.62.1, Copyright Datadog, by Datadog
    with blackfire v1.60.0~linux-x64-non_zts74, https://blackfire.io, by Blackfire

Installed extensions

[PHP Modules] apcu bcmath blackfire Core ctype curl date ddtrace dom fileinfo filter ftp gd grpc hash iconv intl json libxml mbstring mongodb mysqli mysqlnd OAuth openssl pcntl pcre PDO pdo_mysql pdo_sqlite Phar posix protobuf readline redis Reflection session SimpleXML soap sockets sodium SPL sqlite3 standard tokenizer xml xmlreader xmlwriter Zend OPcache zip zlib

[Zend Modules] Zend OPcache blackfire ddtrace

OS info

PRETTY_NAME="Debian GNU/Linux 10 (buster)"
NAME="Debian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster

Diagnostics and configuration

Output of phpinfo() (ddtrace >= 0.47.0)

Datadog PHP tracer extension
For help, check out the documentation at https://docs.datadoghq.com/tracing/languages/php/
(c) Datadog 2020

Datadog tracing support => enabled
Version => 0.62.1
DATADOG TRACER CONFIGURATION => {"date":"2021-08-12T14:04:24Z","os_name":"Linux 4.9.0-12-amd64 SMP Debian 4.9.210-1 (2020-01-20) x86_64","os_version":"4.9.0-12-amd64","version":"0.62.1","lang":"php","lang_version":"7.4.21","env":"dev-1","enabled":true,"service":null,"enabled_cli":true,"agent_url":"","debug":true,"analytics_enabled":false,"sample_rate":1.000000,"sampling_rules":null,"tags":null,"service_mapping":null,"distributed_tracing_enabled":true,"priority_sampling_enabled":true,"architecture":"x86_64","sapi":"cli","ddtrace.request_init_hook":"/opt/datadog-php/dd-trace-sources/bridge/dd_wrap_autoloader.php","open_basedir_configured":false,"uri_fragment_regex":null,"uri_mapping_incoming":null,"uri_mapping_outgoing":null,"auto_flush_enabled":true,"generate_root_span":false,"http_client_split_by_domain":false,"measure_compile_time":true,"report_hostname_on_root_span":false,"traced_internal_functions":null,"auto_prepend_file_configured":false,"integrations_disabled":null,"enabled_from_env":true,"opcache.file_cache":null}

Diagnostics

Diagnostic checks => passed

Directive => Local Value => Master Value
ddtrace.disable => Off => Off
ddtrace.request_init_hook => /opt/datadog-php/dd-trace-sources/bridge/dd_wrap_autoloader.php => /opt/datadog-php/dd-trace-sources/bridge/dd_wrap_autoloader.php
ddtrace.cgroup_file => /proc/self/cgroup => /proc/self/cgroup

labbati commented 3 years ago

Hello. Since we do not offer an AMQP-specific integration, what do you mean by "x-datadog-sampling-priority is set to 0"? Is it metadata in the message?

Also, since it is a long-running process, you must be doing some manual tracing to create the "root span" of the trace for each message that is processed. Can you provide a code snippet of that part?

Most likely, there is some work that we need to do.

Vesquen commented 3 years ago

Hello @labbati.

We have a chain of "workers" (long-running processes) which consume and produce messages. Most of them are written in Go, except for one, which is written in PHP.

The tracer on the first worker, where the message initially comes in, decides (by a configured sample rate) whether the message is going to be traced. If the tracer decides not to trace it, the span context is still passed along in the message (we use the tracer's provided inject function with a text map carrier), but with x-datadog-sampling-priority set to 0. All following workers extract this span context and do their thing, and they seem to respect the x-datadog-sampling-priority that was set, except for the PHP one. Whatever value is set on x-datadog-sampling-priority in the span context, the span always shows up in Datadog APM (without the parent spans of the previous workers).

We do indeed trace manually in those long-running processes, with DD_TRACE_GENERATE_ROOT_SPAN=0 and DD_TRACE_AUTO_FLUSH_ENABLED=1, and we extract the span context from the message using a text map carrier.
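
For readers unfamiliar with the flow described above, here is a minimal sketch of the behavior the reporter expects. It is written in plain Python for illustration only and does not use any Datadog tracer API: SpanContext, inject, extract and should_keep are hypothetical helpers. The header names are the Datadog propagation keys; treating a missing priority as "keep" is an assumption of this sketch.

```python
from typing import Dict, NamedTuple, Optional

# Datadog text map propagation keys.
TRACE_ID_KEY = "x-datadog-trace-id"
PARENT_ID_KEY = "x-datadog-parent-id"
PRIORITY_KEY = "x-datadog-sampling-priority"


class SpanContext(NamedTuple):
    """Hypothetical stand-in for a tracer's span context."""
    trace_id: int
    span_id: int
    sampling_priority: Optional[int]  # 0 = drop, 1 = keep; None = undecided


def inject(ctx: SpanContext, carrier: Dict[str, str]) -> None:
    """Producer side: copy the context into the message metadata."""
    carrier[TRACE_ID_KEY] = str(ctx.trace_id)
    carrier[PARENT_ID_KEY] = str(ctx.span_id)
    if ctx.sampling_priority is not None:
        carrier[PRIORITY_KEY] = str(ctx.sampling_priority)


def extract(carrier: Dict[str, str]) -> Optional[SpanContext]:
    """Consumer side: rebuild the context, or None if the IDs are missing."""
    if TRACE_ID_KEY not in carrier or PARENT_ID_KEY not in carrier:
        return None
    raw_priority = carrier.get(PRIORITY_KEY)
    priority = int(raw_priority) if raw_priority is not None else None
    return SpanContext(
        int(carrier[TRACE_ID_KEY]), int(carrier[PARENT_ID_KEY]), priority
    )


def should_keep(ctx: SpanContext) -> bool:
    """Honor the upstream decision; a missing priority is kept (assumption)."""
    return ctx.sampling_priority is None or ctx.sampling_priority > 0


# The first worker decided not to sample, but still propagates the context.
headers: Dict[str, str] = {}
inject(SpanContext(trace_id=123, span_id=456, sampling_priority=0), headers)

# A downstream worker extracts it and should drop the trace.
ctx = extract(headers)
print(should_keep(ctx))  # False
```

In this sketch the downstream worker never makes its own sampling decision when a priority arrives with the message, which is the behavior the Go workers show and the PHP worker does not.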

bwoebi commented 2 years ago

The primary issue is that, starting with 0.62.0, we broke manual propagation of the distributed trace ID and distributed parent ID (automatic propagation still worked for simple web requests). Our apologies for not connecting the sampling-priority report to the broken manual distributed tracing sooner: the sampling-priority handling itself still worked, but the broken distributed tracing produces the behavior you are seeing.

We are currently working on a fix for this (see #1484).
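
One way to read this explanation, sketched in plain Python under stated assumptions (this is not the tracer's code, and start_consumer_span is a hypothetical helper): if the trace ID and parent ID are lost during manual extraction, the consumer starts a fresh root trace and makes its own sampling decision. With the reported "sample_rate":1.000000, that decision is always "keep", so the span appears in APM without its parents regardless of the propagated priority.

```python
import random
from typing import Callable, Dict, Optional

TRACE_ID_KEY = "x-datadog-trace-id"
PARENT_ID_KEY = "x-datadog-parent-id"
PRIORITY_KEY = "x-datadog-sampling-priority"


def start_consumer_span(
    carrier: Dict[str, str],
    local_sample_rate: float = 1.0,
    rng: Callable[[], float] = random.random,
) -> Dict[str, Optional[object]]:
    """Hypothetical consumer: continue the upstream trace if possible."""
    trace_id = carrier.get(TRACE_ID_KEY)
    parent_id = carrier.get(PARENT_ID_KEY)

    if trace_id is None or parent_id is None:
        # Propagation is broken: nothing links this span to the upstream
        # trace, so a new root trace is started and sampled locally.
        return {
            "trace_id": random.getrandbits(63),
            "parent_id": None,
            "kept": rng() < local_sample_rate,
        }

    # Propagation works: reuse the IDs and honor the upstream decision.
    # Defaulting a missing priority to 1 (keep) is an assumption.
    priority = int(carrier.get(PRIORITY_KEY, "1"))
    return {
        "trace_id": int(trace_id),
        "parent_id": int(parent_id),
        "kept": priority > 0,
    }


message = {
    TRACE_ID_KEY: "123",
    PARENT_ID_KEY: "456",
    PRIORITY_KEY: "0",
}

# Working propagation: the upstream "drop" decision is respected.
print(start_consumer_span(message)["kept"])  # False

# Simulated breakage: the IDs never reach the consumer.
broken = {PRIORITY_KEY: "0"}
span = start_consumer_span(broken)
print(span["kept"], span["parent_id"])  # True None
```

The second call matches the reported symptom: a kept span with no parent, even though the message carried a priority of 0.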

labbati commented 2 years ago

0.70.0 has been released and includes a fix for this issue.