DataDog / dd-trace-php

Datadog PHP Clients
https://docs.datadoghq.com/tracing/setup/php

[BUG] Unexpected verbose logging of curl requests to PUT /v0.4/traces #1073

Open gornostal opened 4 years ago

gornostal commented 4 years ago

Bug description

The PHP project I'm working on formats all log entries as JSON and sends them to the container's stdout. But quite often I see unformatted logs like the ones below in the container's output. With our production load, I see the issue about once every 5 minutes (roughly once every 1-2k requests). I haven't been able to find steps to reproduce this issue in a non-production environment.

I found only two places in the dd-trace-php code where a curl request is made: one in dd-trace/src/ext/coms.c and one in dd-trace/src/DDTrace/Transport/Http.php. In both cases, verbose logging should be enabled only if the env var DD_TRACE_DEBUG is set to true. I'm sure this variable is unset in the containers these logs came from.
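
For reference, here is a simplified sketch of the gating I would expect around the curl handle in the transport (this is my own illustration, not the actual dd-trace-php source):

```php
<?php
// Simplified, hypothetical sketch of the expected gating; not the actual
// dd-trace-php transport code.
function sendTraces(string $agentUrl, string $payload): void
{
    $handle = curl_init($agentUrl);
    curl_setopt($handle, CURLOPT_CUSTOMREQUEST, 'PUT');
    curl_setopt($handle, CURLOPT_POSTFIELDS, $payload);
    curl_setopt($handle, CURLOPT_HTTPHEADER, ['Content-Type: application/msgpack']);
    curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);

    // The verbose "* ..." / "> PUT ..." lines should only ever be produced
    // when debugging is explicitly enabled via the environment.
    $debug = getenv('DD_TRACE_DEBUG');
    if ($debug === 'true' || $debug === '1') {
        curl_setopt($handle, CURLOPT_VERBOSE, true);
    }

    curl_exec($handle);
    curl_close($handle);
}
```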

* Expire in 0 ms for 6 (transfer 0x7f566c040dc0)
* Expire in 5000 ms for 8 (transfer 0x7f566c040dc0)
* Expire in 2000 ms for 2 (transfer 0x7f566c040dc0)
* Connection 435 seems to be dead!
* Closing connection 435
* Hostname 192.168.51.224 was found in DNS cache
*   Trying 192.168.51.224...
* TCP_NODELAY set
* Expire in 200 ms for 4 (transfer 0x7f566c040dc0)
* Connected to 192.168.51.224 (192.168.51.224) port 8126 (#436)
> PUT /v0.4/traces HTTP/1.1
Host: 192.168.51.224:8126
Accept: */*
Datadog-Meta-Lang: php
Datadog-Meta-Lang-Version: 7.4.11
Datadog-Meta-Lang-Interpreter: fpm-fcgi
Datadog-Meta-Tracer-Version: 0.49.0
Datadog-Container-Id: 26e70fdbeda797702ed0b2b0d50cb44551763368ccf729de9c39b57f083e198d
Transfer-Encoding: chunked
Content-Type: application/msgpack
X-Datadog-Trace-Count: 2

* Signaling end of chunked upload via terminating chunk.
< HTTP/1.1 200 OK
< Content-Type: application/json
< Date: Tue, 03 Nov 2020 06:26:20 GMT
< Content-Length: 296
<
* Connection #436 to host 192.168.51.224 left intact 

PHP version

PHP 7.4.11 (cli) (built: Oct 18 2020 19:44:48) ( NTS )
Copyright (c) The PHP Group
Zend Engine v3.4.0, Copyright (c) Zend Technologies
    with Zend OPcache v7.4.11, Copyright (c), by Zend Technologies
    with ddtrace v0.49.0, Copyright Datadog, by Datadog

Installed extensions


```
[PHP Modules]
bcmath calendar Core ctype curl date ddtrace dom exif FFI fileinfo filter ftp
gd gettext gmp hash iconv igbinary intl json libxml mbstring mysqli mysqlnd
openssl pcntl pcre PDO pdo_mysql Phar posix rdkafka readline redis Reflection
session shmop SimpleXML sockets sodium SPL ssh2 standard sysvmsg sysvsem
sysvshm tokenizer xdebug xml xmlreader xmlwriter xsl Zend OPcache zip zlib

[Zend Modules]
Zend OPcache
ddtrace
```

OS info

PRETTY_NAME="Debian GNU/Linux 10 (buster)"
NAME="Debian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster

Diagnostics and configuration

Output of phpinfo() (ddtrace >= 0.47.0)

[screenshot of the phpinfo() diagnostics output]

Upgrading info

I haven't seen this issue with PHP 5.6 and ddtrace 0.35. Then we migrated to PHP 7 and ddtrace v0.44.1 -- to be honest, I don't remember whether I saw it there. I definitely see it with v0.49.0.

gornostal commented 4 years ago

BTW, I don't see a clear correlation with #1067. I can only tell that all containers that have this issue also log a SIGSEGV, sometimes minutes and sometimes hours later.

gornostal commented 4 years ago

Actually, this is the only place in the code where curl logging can be set to verbose. I'm trying to imagine a scenario where the code can reach that line while the DD_TRACE_DEBUG env var is unset :thinking:

gornostal commented 3 years ago

Hi @labbati, I'm looking for some ideas on how to find the cause of this issue. Perhaps you or your team have some suggestions.

I eliminated three possible scenarios where curl verbose logging could be enabled, and I'm totally confused now. Because the issue occurs only in production, I cannot set a breakpoint on curl_exec; I can only add more logging in the PHP code.

Three places in the code where the curl logs may be set to verbose:

  1. src/ext/coms.c: it shouldn't happen there, because in the containers' stdout I see headers such as Datadog-Meta-Lang-Version (see the description of this issue) that are not used in the C code.
  2. src/DDTrace/Transport/Http.php in the vendor dir: vendor/datadog/dd-trace/src/DDTrace/Transport/Http.php is not used at all, because the Http class is loaded from /opt/datadog-php/dd-trace-sources/bridge/_generated.php.
  3. The autogenerated Http class in /opt/datadog-php/dd-trace-sources/bridge/_generated.php: that's where I added a lot of logging around the code (roughly as sketched below). I was pretty sure I would catch the issue there, but to my surprise the code never enters the lines below this if statement.
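
The instrumentation looked roughly like this; a simplified, hypothetical sketch rather than the exact generated code:

```php
<?php
// Hypothetical instrumentation around the verbose-logging branch of the
// generated Http class; names and structure are simplified for illustration.
$handle = curl_init('http://192.168.51.224:8126/v0.4/traces');

$debugEnv = getenv('DD_TRACE_DEBUG');
// error_log() bypasses the application's JSON logger, so these markers are
// easy to spot in the container output.
error_log('[dd-debug] DD_TRACE_DEBUG=' . var_export($debugEnv, true));

if ($debugEnv === 'true' || $debugEnv === '1') {
    // This marker never showed up in the logs, yet the verbose curl
    // output still appeared in the container's stdout.
    error_log('[dd-debug] enabling CURLOPT_VERBOSE');
    curl_setopt($handle, CURLOPT_VERBOSE, true);
}
```
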
SammyK commented 3 years ago

👋 @gornostal That's strange that verbose logging is only happening in production. If you explicitly set DD_TRACE_AGENT_DEBUG_VERBOSE_CURL=0 in production, does that make any difference? You can also try setting DD_TRACE_DEBUG_CURL_OUTPUT=0. That should be the default value for both.
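
As a sanity check, it is also worth confirming that those variables actually reach the php-fpm workers, since php-fpm clears the environment by default; a rough, hypothetical check script (served through FPM, not the CLI) could look like this:

```php
<?php
// Hypothetical sanity-check script, served through php-fpm, to confirm
// which values the worker process actually sees. php-fpm clears the
// environment by default (clear_env = yes), so values set on the
// container may not reach the workers.
header('Content-Type: application/json');
echo json_encode([
    'DD_TRACE_DEBUG'                    => getenv('DD_TRACE_DEBUG'),
    'DD_TRACE_DEBUG_CURL_OUTPUT'        => getenv('DD_TRACE_DEBUG_CURL_OUTPUT'),
    'DD_TRACE_AGENT_DEBUG_VERBOSE_CURL' => getenv('DD_TRACE_AGENT_DEBUG_VERBOSE_CURL'),
], JSON_PRETTY_PRINT);
```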

gornostal commented 3 years ago

Thanks for the suggestion, @SammyK. Unfortunately, that didn't help; I set both variables to 0.

labbati commented 3 years ago

Hey @gornostal. I wanted to let you know that we were able to recreate it only once, and never again since. This is super strange, and at the moment I don't have an explanation. We have to look into it more in depth and, more importantly, find (and understand) a reliable way to recreate it.

I have two questions:

  1. How disruptive is this for you?
  2. Could you share your configuration files with us?

gornostal commented 3 years ago

Hey @labbati. It's a relief to know the issue can be reproduced (although not reliably) in another environment.

how disruptive is this for you?

I haven't found any workaround. It's not super disruptive -- a lot of people are complaining about noisy logs in Datadog, and our error metrics are not as reliable as before. Other than that, it doesn't cause any problems. However, if we don't find the cause soon (2-4 weeks), I'm going to try downgrading the ddtrace version again, hoping to find a version between 0.35 and 0.44 that doesn't have this issue.

I'll share the config files today.

gornostal commented 3 years ago

@labbati I sent the configs privately to your @datadoghq.com email.

labbati commented 3 years ago

A quick update on this topic as we keep investigating.

The initial * Expire in ... lines are informational output logged by mistake in the curl source code as of 7.64.0; this was reported as a bug and fixed in curl 7.64.1, which, however, is not available on Debian 10. As a confirmation, if you force the installation of curl 7.72, those lines disappear. There is nothing we can do to prevent those messages from being dumped, other than recommending a more recent version of curl.
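
For anyone checking their own images, a quick way to see which libcurl the PHP curl extension links against (assuming, as is typical, that it is the same shared libcurl the tracer uses) is something like:

```php
<?php
// Report the libcurl version the PHP curl extension is linked against.
// Assumption: on these images this is the same shared libcurl the tracer
// uses, so it tells you whether the 7.64.0 "Expire in ..." bug applies.
$info = curl_version();
printf("libcurl %s (ssl: %s)\n", $info['version'], $info['ssl_version']);
```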

We have not yet been able to determine why, and under which conditions, the remaining part of the logged message gets printed - from * Connection 435 seems to be dead! onward. I can confirm it does not come from our code base (or at least not in a trivial way), but since we have not found exactly when it gets printed, I cannot tell whether there is something we can do to prevent it (e.g. a flag or similar).

Will keep you posted, but it will require some extra time.

gornostal commented 3 years ago

Ah, so that was a bug in curl. I hadn't thought to check that. Installing a more recent version won't be a problem for us.

As for the logs from * Connection 435 seems to be dead! onward, how likely do you think it is that this is also a bug in curl?

And a big thanks for the update! :+1:

damiencriado commented 3 years ago

I have a similar problem; it happens randomly, at least once a day:

* Connection #29 to host dockerhost left intact
<
< Content-Length: 160
"< Date: Thu, 04 Feb 2021 08:36:48 GMT"
< Content-Type: application/json
< HTTP/1.1 200 OK
* Mark bundle as not supporting multiuse
* Signaling end of chunked upload via terminating chunk.
X-Datadog-Trace-Count: 8
Content-Type: application/msgpack
Transfer-Encoding: chunked
Datadog-Container-Id: c2618a9c2d07f48c8ebc67fca36ed7c6e9d197e30e4c8a8def24387dc077e44f
Datadog-Meta-Tracer-Version: 0.54.0
Datadog-Meta-Lang-Interpreter: fpm-fcgi
Datadog-Meta-Lang-Version: 7.4.14
Datadog-Meta-Lang: php
Accept: */*
Host: dockerhost:8126
> PUT /v0.4/traces HTTP/1.1
* Connected to dockerhost (172.17.0.1) port 8126 (#22)
* Re-using existing connection! (#22) with host dockerhost
"* Can not multiplex, even if we wanted to!"
* Found bundle for host dockerhost: 0x7fccd6a81a50 [serially]
ephrin commented 3 years ago

@damiencriado we have this nasty verbosity as well. Have you found any solution? We've updated libcurl and are using the latest dd-trace extension, but we still have to deal with this spam in the logs. We also couldn't find the reason for it.

damiencriado commented 3 years ago

@damiencriado we have this nasty verbosity as well. Have you found any solution? We've updated libcurl and are using the latest dd-trace extension, but we still have to deal with this spam in the logs. We also couldn't find the reason for it.

Hey @ephrin, I didn't find any solution. I'm using an Alpine image with the latest libcurl version and the problem is still there.

chrisivens commented 3 years ago

Hey @ephrin, I didn't find any solution. I'm using an Alpine image with the latest libcurl version and the problem is still there.

This is happening for me too with the php:7.4-fpm-alpine Docker image from Docker Hub, which seems to be running curl 7.76.1. We're seeing the same kind of output in our logs. I'd rather not have to add pattern-matched exclusions to our alerting triggers for this.

ephrin commented 3 years ago

@chrisivens we had to compile libcurl with verbose output disabled in order to get rid of it. It seems the spam is caused by an intersection of threads: one enables verbosity but sends nothing special, while another does the spamming in the meantime (not 100% sure, but close to it).

ubiuser commented 3 years ago

  1. src/ext/coms.c: it shouldn't happen there, because in the containers' stdout I see headers such as Datadog-Meta-Lang-Version (see the description of this issue) that are not used in the C code.

If I'm correct, the C code also prints these headers: follow the calls _dd_curl_send_stack -> _dd_curl_set_headers, where the headers are set in a for loop using dd_agent_curl_headers, which is meant to be initialised by dd_agent_headers_alloc.

llwt commented 2 years ago

Sorry to bump an old thread, but has anyone found a working solution here? We just started seeing this in our production containers after a recent update to PHP 7.4.

bwoebi commented 2 years ago

It's on our roadmap to investigate splitting the background sending off into a separate process, to prevent such issues. I cannot promise anything yet, but we want to tackle this as well.