DataDog / dd-trace-php

Datadog PHP Clients
https://docs.datadoghq.com/tracing/setup/php

[Bug]: Upgrading to 1.4.0 breaks sending traces #2889

Closed jc-beyer-tqgg closed 1 month ago

jc-beyer-tqgg commented 1 month ago

Bug report

Hey everyone! After updating the dd_trace extension from 1.3.2 to 1.4.0, traces are not being sent anymore.

I can see the following errors in Datadog:

Oct 11 09:57:56.392
XXX:production_myservice
production_myservice
[ddtrace] [error] Failed signaling lifecycle end: Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }

Oct 11 09:57:56.392
XXX:production_myservice
production_myservice
[ddtrace] [error] Failed flushing service data: Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }

Oct 11 09:57:56.392
XXX:production_myservice
production_myservice
[ddtrace] [error] Failed flushing telemetry buffer: Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }

Oct 11 09:57:56.388
XXX:production_myservice
production_myservice
[ddtrace] [error] Failed sending traces to the sidecar: Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }

My PHP settings are:

extension=ddtrace.so
datadog.trace.request_init_hook=/opt/datadog-php/dd-trace-sources/bridge/dd_wrap_autoloader.php
datadog.trace.cli_enabled=On
datadog.trace.generate_root_span=Off
datadog.trace.auto_flush_enabled=On

Edit: My services are all executed as Lambdas in AWS!

PHP version

8.3.12

Tracer or profiler version

1.4.0

Installed extensions

No response

Output of phpinfo()

No response

Upgrading from

1.3.2

bwoebi commented 1 month ago

Hey @jc-beyer-tqgg,

Yes, we are rolling out a new trace sender with 1.4.0, so it seems that unfortunately doesn't work for you.

We would be very interested in reproducing this behaviour. An strace (strace -fTtts 1000 <php executable invocation here>, with no dd-ipc-helper process living when the process is launched), along with trace logs (DD_TRACE_LOG_LEVEL=trace DD_TRACE_LOG_FILE=/tmp/helper.log), would help a lot. If you want to provide them, please contact support, mention this ticket, and ask for it to be routed directly to me, thanks a lot!
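Put together, the capture could look roughly like this (the PHP invocation is a placeholder for however your service is started, and the pkill line is just one way to make sure no leftover dd-ipc-helper process is running first):

# make sure no dd-ipc-helper process is alive, then capture syscalls
# together with the tracer's own trace-level log
pkill -f dd-ipc-helper || true
DD_TRACE_LOG_LEVEL=trace DD_TRACE_LOG_FILE=/tmp/helper.log \
  strace -fTtts 1000 <php executable invocation here>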

In case you just want it working, you can set datadog.trace.sidecar_trace_sender=0.
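That can go into php.ini next to the settings from the report above, or be set through the equivalent environment variable (the env form is what others in this thread ended up using):

; php.ini - revert to the pre-1.4 trace sender
datadog.trace.sidecar_trace_sender=0

# or as an environment variable
DD_TRACE_SIDECAR_TRACE_SENDER=0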

jc-beyer-tqgg commented 1 month ago

Hey @bwoebi! I'm happy to help and provide any useful information, but my services are all executed as Lambdas, so it's not easy to run strace or save a log file 😓

Is there anything I can do in a Lambda context that would help you?

bwoebi commented 1 month ago

The fact that you're in a Lambda context is possibly already helpful to know. I don't know very much about Lambda; we'll try to reproduce it there soon, and if we don't manage to, we'll come back to you - thanks for offering to help!

So yeah, for now please just revert to the old sender with the ini config mentioned before.

j-fulbright commented 1 month ago

Seeing the same issue here with our Lambdas (using container images / Docker).

j-fulbright commented 1 month ago

> The fact that you're in a Lambda context is possibly already helpful to know. I don't know very much about Lambda; we'll try to reproduce it there soon, and if we don't manage to, we'll come back to you - thanks for offering to help!
>
> So yeah, for now please just revert to the old sender with the ini config mentioned before.

We added the ini setting at the Docker level when installing the extension, in the same place we add some other settings, and it didn't seem to help at all. I'm now trying DD_SIDECAR_TRACE_SENDER_DEFAULT=false in our ENV.
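For reference, the two variants tried here look roughly like this at the image level (the conf.d path and file name are assumptions based on the official PHP base images, not something confirmed in this thread):

# Dockerfile sketch
RUN echo "datadog.trace.sidecar_trace_sender=0" > /usr/local/etc/php/conf.d/99-disable-sidecar-sender.ini
ENV DD_SIDECAR_TRACE_SENDER_DEFAULT=false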

rquinaud commented 1 month ago

@j-fulbright Hi, I'm not in the same context as you are (AWS Lambda) but I'm having similar issues. My context is GCP GKE execution in a cronjob, PHP 8.3.8, dd-library 1.4.0 or 1.4.1 (both failing).

After hours of research, the "sidecar" feature seems to be the culprit. By the way, I sadly did not find any relevant documentation about it (maybe my bad).

I tried setting DD_SIDECAR_TRACE_SENDER_DEFAULT as well, with no result.

BUT:

DD_TRACE_SIDECAR_TRACE_SENDER: "0"

Setting this variable makes traces upload again, just as they did with dd-library 1.3.1.
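For anyone else in a GKE cronjob context, a minimal sketch of where that variable can go on the CronJob container (all names and the schedule are placeholders, not from my setup):

apiVersion: batch/v1
kind: CronJob
metadata:
  name: my-php-cron
spec:
  schedule: "*/5 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: app
              image: my-php-image:latest
              env:
                - name: DD_TRACE_SIDECAR_TRACE_SENDER
                  value: "0"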

Regards

j-fulbright commented 1 month ago

We ended up rolling back to the older tracer for the time being, as the new one was filling up server disk space with core dumps.

bwoebi commented 1 month ago

We're going to release 1.4.2 on Monday, which will detect Lambda and disable the sidecar trace sender there by default.

bwoebi commented 1 month ago

1.4.2 has been released; it now looks for the AWS_LAMBDA_FUNCTION_NAME env variable and auto-disables the sidecar in that case for now (until we can properly fix this).
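Roughly, the behaviour is the equivalent of the following userland check (illustration only; the real logic lives inside the extension):

<?php
// Illustration of the 1.4.2 behaviour described above: if
// AWS_LAMBDA_FUNCTION_NAME is present we are considered to be running in
// Lambda, and the sidecar trace sender is then disabled by default
// (equivalent to datadog.trace.sidecar_trace_sender=0).
$inLambda = getenv('AWS_LAMBDA_FUNCTION_NAME') !== false;
var_dump($inLambda);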

j-fulbright commented 1 month ago

This doesn't seem to be reliable, or it has other issues, as we're still seeing Broken pipe errors (Docker image running on Lambda):

[ddtrace] [error] Failed flushing telemetry buffer: Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[ddtrace] [error] Failed flushing service data: Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[ddtrace] [error] Failed signaling lifecycle end: Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }

bwoebi commented 1 month ago

We're looking into fully fixing this, but these errors have been present for a long time (they were just silently discarded before). However, tracing itself should work again by now. You may ignore these Broken pipe errors for now.

matthew-mcmullin commented 1 week ago

With the broken pipe errors now being reported, we are being billed for log usage/indexing. We are still seeing this problem in our PHP images. What is the timeline for fixing this so that the errors aren't reported?

bwoebi commented 1 week ago

@matthew-mcmullin The fix (disabling the faulty telemetry sending in Lambda environments) is in #2948. We intend to release it early next week.