DataDog / dd-trace-php

Datadog PHP Clients
https://docs.datadoghq.com/tracing/setup/php

[Bug]: Errors during telemetry data flush - broken pipe #2915

Open kn0x1c opened 3 weeks ago

kn0x1c commented 3 weeks ago

Bug report

Hello,

I am using an integration of your tracer that is currently not supported, but it worked fine with previous versions.

My integration: AWS Lambda with Vapor PHP 8.2 Laravel 10.x

My issue is that, starting with tracer version 1.4.0, I get the following errors:

NOTICE: PHP message: [ddtrace] [error] Failed signaling lifecycle end: Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
NOTICE: PHP message: [ddtrace] [error] Failed flushing service data: Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
NOTICE: PHP message: [ddtrace] [error] Failed flushing telemetry buffer: Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
NOTICE: PHP message: [ddtrace] [error] Failed sending traces to the sidecar: Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }

The fun part is that it still sends data: metrics, traces, etc. all appear in the UI.

I spent a couple of hours debugging it and tried the following: upgrading/downgrading the AWS Lambda layer to v60-alpine and v65-alpine; upgrading PHP from 8.2 to 8.3; upgrading/downgrading the tracer to v1.0.0, v1.3.0, v1.3.2, v1.4.0, and v1.4.2.

So far I have been able to narrow it down: the issue is present only in tracer versions starting from v1.4.0 (v1.4.X), regardless of whether PHP is 8.2 or 8.3 and regardless of which AWS Lambda layer (agent) version I am using.

I reviewed the changes and saw that you switched to the sidecar trace sender (I assume also for trace metrics) in tracer v1.4.0, but only for PHP 8.3. However, I also get the issue on 8.2, so I am not sure whether that is the problem, but it would explain a lot.

Also, the application seems to be MUCH slower, 2-3x.

Latest working configuration: PHP 8.2/8.3, AWS Lambda layer v65-alpine

Question

Is there something I can do, for example enable something in the agent (AWS Lambda layer) such as socket-based communication, so that I can get the new tracer running in Lambda? I know you are not supporting it yet, but it worked flawlessly in previous versions. We love using Datadog and even rolled out the AWS Lambda integration we built to production, with no issues!

Tech Stuff

Dockerfile (not working using tracer 1.4.0)

FROM laravelphp/vapor:php82

# Add DataDog APM tracer
RUN apk add tar gzip libgcc \
&&  curl -LO --retry 3 https://github.com/DataDog/dd-trace-php/releases/download/1.4.0/datadog-setup.php \
&&  ln -s /sbin/ldconfig /usr/local/bin/ldconfig \
&&  php datadog-setup.php --php-bin=all \
&&  rm -f datadog-setup.php

# Load datadog agent layer
COPY --from=public.ecr.aws/datadog/lambda-extension:65-alpine /opt/. /opt/

COPY . /var/task

PHP version

8.2

Tracer or profiler version

1.4.0

Installed extensions

No response

Output of phpinfo()

No response

Upgrading from

Tracer 1.3.2

kn0x1c commented 3 weeks ago

Update: I saw that you already identified such an issue with Lambda: https://github.com/DataDog/dd-trace-php/pull/2904

However, I tried the following combination: PHP 8.2 / 8.3, agent Lambda layer 65-alpine, tracer 1.4.2 (this one should include the disabled check from the above PR).

=> It still throws the same errors, even though it is running in an AWS Lambda function with the env variable AWS_LAMBDA_FUNCTION_NAME set

[ddtrace] [error] Failed flushing telemetry buffer: Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[ddtrace] [error] Failed flushing service data: Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[ddtrace] [error] Failed signaling lifecycle end: Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }

But: it seems that the trace error is gone. Is it possible to apply the same conditional for sending via the sidecar to telemetry, service, and lifecycle data as well?

kn0x1c commented 3 weeks ago

Update: We rolled out to production, and our performance consistently dropped by a factor of 3-5x.

Tracer is now fixed to 1.3.2
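
For reference, a minimal sketch of what that pin could look like, assuming the same Dockerfile as in the report with only the tracer version changed. The `DD_TRACE_VERSION` build argument is a hypothetical name introduced here for illustration and is not part of the original Dockerfile; the download URL follows the same release pattern as the 1.4.0 one above.

```dockerfile
FROM laravelphp/vapor:php82

# Hypothetical build argument (not in the original Dockerfile): pins the tracer release.
ARG DD_TRACE_VERSION=1.3.2

# Install the Datadog APM tracer at the pinned version
RUN apk add tar gzip libgcc \
 && curl -LO --retry 3 "https://github.com/DataDog/dd-trace-php/releases/download/${DD_TRACE_VERSION}/datadog-setup.php" \
 && ln -s /sbin/ldconfig /usr/local/bin/ldconfig \
 && php datadog-setup.php --php-bin=all \
 && rm -f datadog-setup.php

# Load the Datadog agent layer, unchanged from the report
COPY --from=public.ecr.aws/datadog/lambda-extension:65-alpine /opt/. /opt/

COPY . /var/task
```

Keeping the version in a build argument would make it possible to try a newer tracer later with `docker build --build-arg DD_TRACE_VERSION=...` without editing the file.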

TophrC-dd commented 2 weeks ago

Hey @kn0x1c -- My name is Topher, I am a senior escalations engineer who specializes in serverless here at Datadog. I am sorry you're seeing this performance issue. Would it be possible for you to provide a bare-bones Vapor project for us to work with to further investigate?

Best, ~Topher