gornostal opened this issue 4 years ago
BTW, I don't see a clear correlation with #1067. I can only tell that all containers that have this issue also log SIGSEGV, sometimes minutes and sometimes hours later.
Actually, this is the only place in the code where curl logging can be set to verbose.
I'm trying to imagine a scenario where the code can reach that line while the DD_TRACE_DEBUG
env var is unset :thinking:
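For readers skimming the thread, here is a minimal PHP sketch of the kind of guard being discussed (illustrative only; the URL is an assumption and the real check lives in the tracer's own code, not here):

```php
<?php
// Illustrative sketch only, not the actual dd-trace-php source: the guard under
// discussion is conceptually of this shape -- verbose curl diagnostics are only
// switched on when the debug env var is truthy.
$ch = curl_init('http://localhost:8126/v0.4/traces');

if (filter_var(getenv('DD_TRACE_DEBUG'), FILTER_VALIDATE_BOOLEAN)) {
    // With CURLOPT_VERBOSE on and no CURLOPT_STDERR override, libcurl writes its
    // "* ...", "> ..." and "< ..." lines to the process's stderr, which a container
    // runtime captures together with the application's own log stream.
    curl_setopt($ch, CURLOPT_VERBOSE, true);
}
```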
Hi @labbati, I'm looking for some ideas on how to find the cause of this issue. Perhaps you or your team have any suggestions.
I eliminated three possible scenarios where curl verbose logging could be enabled, and I'm totally confused now. Because the issue occurs only in production, I cannot set a breakpoint on curl_exec; I can only add more logging in PHP code.
Three places in the code where curl logging may be set to verbose:

1. `src/ext/coms.c`: It shouldn't happen there, because in the containers' stdout I see headers such as `Datadog-Meta-Lang-Version` (see the description of this issue) that are not used in the C code.
2. `src/DDTrace/Transport/Http.php` in the vendor dir: `vendor/datadog/dd-trace/src/DDTrace/Transport/Http.php` is not used at all, because the `Http` class is loaded from `/opt/datadog-php/dd-trace-sources/bridge/_generated.php`.
3. The `Http` class in `/opt/datadog-php/dd-trace-sources/bridge/_generated.php`: That's where I put a lot of logging around the code. I was pretty sure I would catch the issue there, but to my surprise the code doesn't enter the lines below this if statement.

👋 @gornostal That's strange that verbose logging is only happening in production. If you explicitly set DD_TRACE_AGENT_DEBUG_VERBOSE_CURL=0 in production, does that make any difference? You can also try setting DD_TRACE_DEBUG_CURL_OUTPUT=0. That should be the default value for both.
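If it helps with narrowing this down, here is a quick sanity check (just a sketch to drop into any request served by the same FPM pool) to confirm what the workers actually see for those variables:

```php
<?php
// Dump the effective values as seen by this FPM worker; false means "not set".
var_dump(
    getenv('DD_TRACE_DEBUG'),
    getenv('DD_TRACE_AGENT_DEBUG_VERBOSE_CURL'),
    getenv('DD_TRACE_DEBUG_CURL_OUTPUT')
);
```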
Thanks for the suggestion, @SammyK. Unfortunately that didn't help. I set both variables to 0.
Hey @gornostal. I wanted to let you know that we were able to recreate it only once, and never again. This is super strange, and at the moment I don't have an explanation. We have to look into it more in depth and, more importantly, we have to find (and understand) a way to recreate it.
I have two questions:

1. How disruptive is this for you?
2. Could you share your php-fpm.conf and www.conf, so we can try to apply your exact logging configuration?

Hey @labbati, it's a relief to know the issue can be (although not reliably) reproduced in another environment.

> how disruptive is this for you?

I haven't found any workaround. It's not super disruptive -- a lot of people are complaining about noisy logs in Datadog. Plus, our error metrics are not as reliable as before. Other than that, it doesn't cause any problems. However, if we don't find the cause soon (2-4 weeks), I'm going to try to downgrade the DD version again, hoping to find a version between 0.35 and 0.44 that doesn't have this issue.
I'll share the config files today.
@labbati I shared the configs privately via your @datadoghq.com email
A quick update on this topic as we keep investigating.
The initial `* Expire in ...`
lines are info logged by mistake in curl 7.64.0; this has been reported as a bug and fixed in curl 7.64.1, which is not available on Debian 10, though. As confirmation, if you force the installation of curl 7.72 those lines disappear. There is nothing we can do to prevent those messages from being dumped, other than recommending installing a more recent version of curl.
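For anyone who wants to verify which libcurl their PHP workers are actually linked against, here is a small sketch (note this reports the library the curl extension uses, which can differ from the curl CLI installed in the image; the 0x074001 threshold corresponds to 7.64.1 as mentioned above):

```php
<?php
// curl_version() reports the libcurl the PHP curl extension is linked against.
// 0x074001 == 7.64.1, the release that fixed the stray "* Expire in ..." lines.
$info = curl_version();
echo 'libcurl ', $info['version'], PHP_EOL;
echo $info['version_number'] >= 0x074001
    ? "not affected by the Expire-in bug\n"
    : "affected by the Expire-in bug\n";
```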
We have not yet been able to see why, and under which conditions, the remaining part of the logged message gets printed, from `* Connection 435 seems to be dead! ...`
onward. I can confirm it is not in our code base (or at least not in a trivial way), but since we haven't found exactly when it gets printed, I cannot tell whether there is something we can do to prevent that message from being printed (e.g. a flag or similar).
Will keep you posted, but it will require some extra time.
Ah, so that was a bug in curl. I hadn't thought to check that. Installing a more recent version won't be a problem for us.
As for the logs from `* Connection 435 seems to be dead! ...`
onward, how likely do you think it is that this is also a bug in curl?
And a big thanks for the update! :+1:
I have a similar problem; it happens randomly, at least once a day:
* Connection #29 to host dockerhost left intact
<
< Content-Length: 160
"< Date: Thu, 04 Feb 2021 08:36:48 GMT"
< Content-Type: application/json
< HTTP/1.1 200 OK
* Mark bundle as not supporting multiuse
* Signaling end of chunked upload via terminating chunk.
X-Datadog-Trace-Count: 8
Content-Type: application/msgpack
Transfer-Encoding: chunked
Datadog-Container-Id: c2618a9c2d07f48c8ebc67fca36ed7c6e9d197e30e4c8a8def24387dc077e44f
Datadog-Meta-Tracer-Version: 0.54.0
Datadog-Meta-Lang-Interpreter: fpm-fcgi
Datadog-Meta-Lang-Version: 7.4.14
Datadog-Meta-Lang: php
Accept: */*
Host: dockerhost:8126
> PUT /v0.4/traces HTTP/1.1
* Connected to dockerhost (172.17.0.1) port 8126 (#22)
* Re-using existing connection! (#22) with host dockerhost
"* Can not multiplex, even if we wanted to!"
* Found bundle for host dockerhost: 0x7fccd6a81a50 [serially]
@damiencriado we have such nasty verbosity as well. Have you found any solution to this? We've updated the curl lib and have the latest dd-trace extension in use, and we still have to deal with the spam in the logs. We also couldn't find the reason for it.
Hey @ephrin, I didn't find any solution. I'm using an Alpine image with the latest curl lib version and the problem is still there.
This is happening for me too with the php:7.4-fpm-alpine Docker image from Docker Hub, which seems to be running curl 7.76.1. We're seeing the same kind of output in our logs. I'd rather not have to pattern-match exclusions in our alerting triggers for this.
@chrisivens we had to compile libcurl with verbose off in order to get rid of it. It seems the spam is caused by the intersection of threads: one of them enables verbosity but sends nothing special, while another one does the spamming in the meanwhile (not 100% sure, but close to it).
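For what it's worth, here is a sketch of that containment idea for curl handles you control from userland (assumption: the noise is libcurl's verbose diagnostics going to the shared stderr; the extension's background sender itself isn't reachable from PHP code, so this only covers handles created in the application):

```php
<?php
// Force verbose off for this handle, or redirect its diagnostics away from the
// process's stderr so they can no longer interleave with the JSON log stream.
$ch = curl_init('http://dockerhost:8126/v0.4/traces');
curl_setopt($ch, CURLOPT_VERBOSE, false);
// Alternatively, if verbose output is needed, send it somewhere harmless:
curl_setopt($ch, CURLOPT_STDERR, fopen('/dev/null', 'w'));
```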
> `src/ext/coms.c`: It shouldn't happen there, because in the containers' stdout I see headers such as `Datadog-Meta-Lang-Version` (see the description of this issue) that are not used in the C code

If I'm correct, the C code also sets those headers: follow the calls `_dd_curl_send_stack` -> `_dd_curl_set_headers`, where the headers are set in the for loop using `dd_agent_curl_headers`, which is meant to be initialised by `dd_agent_headers_alloc`.
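To make the point concrete, here is a PHP-level sketch of the same request shape (the real sender is the C extension; header values are taken from the log excerpt earlier in the thread): since both code paths attach the Datadog-Meta-* headers, seeing them in the verbose output does not by itself rule out coms.c.

```php
<?php
// PHP-level equivalent of the trace-agent request; the C sender attaches the
// same Datadog-Meta-* headers, so they are not specific to the PHP transport.
$ch = curl_init('http://dockerhost:8126/v0.4/traces');
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'PUT');
curl_setopt($ch, CURLOPT_HTTPHEADER, [
    'Content-Type: application/msgpack',
    'Datadog-Meta-Lang: php',
    'Datadog-Meta-Lang-Version: 7.4.14',
    'Datadog-Meta-Lang-Interpreter: fpm-fcgi',
    'Datadog-Meta-Tracer-Version: 0.54.0',
    'X-Datadog-Trace-Count: 8',
]);
```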
Sorry to bump an old thread, but has anyone found a working solution here? We just started seeing this in our production containers after a recent update to PHP 7.4.
We have it on our roadmap to investigate splitting the background sending off into a separate process, to prevent such issues. I cannot promise anything yet, but we want to tackle this as well.
Bug description
The PHP project that I'm working on formats all log entries as JSON and sends them to the container's stdout. But quite often I see unformatted logs like the ones below in the container's output. With our production load I see the issue once every 5 minutes (or once every 1-2k requests). I haven't been able to find steps to reproduce this issue in a non-production environment.
I found only two places in the dd-trace-php code where there's a curl request: one in `dd-trace/src/ext/coms.c` and one in `dd-trace/src/DDTrace/Transport/Http.php`. In both cases verbose logging should be enabled only if the env var DD_TRACE_DEBUG is set to true. I'm sure this variable is unset in the containers where I got the logs from.

PHP version
Installed extensions
```
[PHP Modules]
bcmath calendar Core ctype curl date ddtrace dom exif FFI fileinfo filter ftp gd
gettext gmp hash iconv igbinary intl json libxml mbstring mysqli mysqlnd openssl
pcntl pcre PDO pdo_mysql Phar posix rdkafka readline redis Reflection session
shmop SimpleXML sockets sodium SPL ssh2 standard sysvmsg sysvsem sysvshm tokenizer
xdebug xml xmlreader xmlwriter xsl Zend OPcache zip zlib

[Zend Modules]
Zend OPcache
ddtrace
```
OS info
Diagnostics and configuration
Output of phpinfo() (ddtrace >= 0.47.0)
Upgrading info
I haven't seen this issue with PHP 5.6 and DD 0.35. Then we migrated to PHP 7 and DD v0.44.1 -- tbh, I don't remember if I saw it there. I definitely see it with v0.49.0.