DataDog / dd-trace-php

Datadog PHP Clients
https://docs.datadoghq.com/tracing/setup/php
Other
490 stars 152 forks source link

[BUG] segfault in php-fpm after several page hits with ddtrace various versions #1797

Open gjcarrette opened 1 year ago

gjcarrette commented 1 year ago

Bug description

In a production environment with nginx using fastcgi to talk to php-fpm there are different segfault SEGV manifestations depending on the version of ddtrace-trace-php. In all cases the SEGV as recorded by systemd coredumpctl is not the first page hit, but a later page hit. The page hit can be determined because the auto_prepend_file uses file_get_contents and file_put_contents on /proc/self/comm in order to keep a page hit counter that will be recorded by the kernel segfault handler.

Output of coredumpctl info will be attached.

There probably won't be sufficient information in this report to reproduce or fix a bug, but further investigation using valgrind and gdb will be forthcoming and the issue will be updated. coredumpctl-info-ddtrace-080.txt coredumpctl-info-ddtrace-071.txt

The SEGV with ddtrace 0.71.0 is pretty consistent inside zm_activate_ddtrace But with 0.80.0 it appears that the stack has been corrupted with the SEGV inside some memory management subroutine.

Tests will be run with ddtrace 0.81 and from a source build, to be determined.

PHP version

PHP 8.1.10 (cli) (built: Aug 30 2022 16:09:36) (NTS gcc x86_64) Copyright (c) The PHP Group Zend Engine v4.1.10, Copyright (c) Zend Technologies with Zend OPcache v8.1.10, Copyright (c), by Zend Technologies

The PHP itself and all the extensions are from the Remi build, except for datadog-php-tracer-0.71.0-1.x86_64 php81-pdo-sqlrelay-1.8.1-4.x86_64

The pdo_sqlrelay is from https://github.com/wayfair-incubator/pdo_sqlrelay_builder

Tracer version

0.71.0

Installed extensions

[PHP Modules] amqp apcu bcmath bz2 calendar Core ctype curl date dba ddtrace dom exif fileinfo filter ftp gd gettext gmp gnupg hash http iconv igbinary imagick imap intl json ldap libxml lz4 mbstring memcache memcached msgpack mustache mysqli mysqlnd odbc openssl pcntl pcre PDO pdo_dblib pdo_mysql PDO_ODBC pdo_sqlite pdo_sqlrelay pdo_sqlsrv Phar posix raphf rdkafka readline redis Reflection session shmop SimpleXML soap sockets sodium SPL sqlite3 sqlsrv standard sysvmsg sysvsem sysvshm tokenizer xml xmlreader xmlrpc xmlwriter xsl yaml Zend OPcache zip zlib

[Zend Modules] Zend OPcache ddtrace

OS info

NAME="Rocky Linux" VERSION="8.6 (Green Obsidian)" VERSION_ID="8.6" PRETTY_NAME="Rocky Linux 8.6 (Green Obsidian)" CPE_NAME="cpe:/o:rocky:rocky:8:GA" ROCKY_SUPPORT_PRODUCT_VERSION="8" REDHAT_SUPPORT_PRODUCT_VERSION="8"

Diagnostics and configuration

Output of phpinfo() (ddtrace >= 0.47.0)

{"date":"2022-11-16T00:34:23Z","os_name":"Linux admin8-reserve-iad1-txp4.c.wf-gcp-us-adminhome-prod.internal 4.18.0-372.19.1.el8_6.cloud.x86_64 #1 SMP Tue Aug 9 18:27:58 UTC 2022 x86_64","os_version":"4.18.0-372.19.1.el8_6.cloud.x86_64","version":"0.71.0","lang":"php","lang_version":"8.1.0RC1","env":"prod","enabled":true,"service":"admin8","enabled_cli":false,"agent_url":"http:\/\/localhost:8126","debug":false,"analytics_enabled":false,"sample_rate":1,"sampling_rules":[],"tags":{"datacenter":"iad1","wf_hostname":"admin8-reserve-iad1-txp4.c.wf-gcp-us-adminhome-prod.internal"},"service_mapping":[],"distributed_tracing_enabled":true,"priority_sampling_enabled":true,"dd_version":"94037fe84549072c865","architecture":"x86_64","sapi":"fpm-fcgi","datadog.trace.request_init_hook":"\/opt\/datadog-php\/dd-trace-sources\/bridge\/dd_wrap_autoloader.php","open_basedir_configured":false,"uri_fragment_regex":null,"uri_mapping_incoming":null,"uri_mapping_outgoing":null,"auto_flush_enabled":false,"generate_root_span":true,"http_client_split_by_domain":false,"measure_compile_time":true,"report_hostname_on_root_span":false,"traced_internal_functions":null,"auto_prepend_file_configured":true,"integrations_disabled":"default","enabled_from_env":true,"opcache.file_cache":null}

bwoebi commented 1 year ago

Hey @gjcarrette,

thanks for the report! A gdb/valgrind trace would indeed be very appreciated. It would be great, if the crashes can be reproduced on the latest version of dd-trace-php. (0.71.0 is quite old and core structures have been changed since then)

bwoebi commented 1 year ago

Ah, just seeing the crash dump for 0.80.0 has been added. Given the source function (_emalloc and zend_mm_gc mostly), this is most likely some use-after-free issue, for which a valgrind report would be most useful (with USE_ZEND_ALLOC=0 set).

gjcarrette commented 1 year ago

The Foundational Platform Operations team wanted to try out the 0.81 release, and it exhibited a slightly different pattern of segfault revealed by coredumpctl info.

Getting some valgrind and other tests from the Software Development Environment (SDE) is still on the todo list.

bwoebi commented 1 year ago

Hey, thanks for the update!

I wonder whether you could open a dump (of the 0.81 crashes) with gdb and get exact line numbers (gdb php-fpm -c path/to/corefile, then bt for backtrace)? It seems to me like coredumpctl is not able to.

Given the location of that crash, are you using preloading?

gjcarrette commented 1 year ago

This being in the middle of CYBER-5, I wanted to show that there was some activity to get a valgrind going, in our Software Development Environment (SDE) because I'm not touching the production environment right now. Running valgrind with php-fpm is a bit uglier (having done it) so I am starting out by capturing the fastcgi protocol and using my decoder to turn it into the set of environment variables and stdin content for running tests using the php-cgi command line, that being a close easy test plan for a single page hit. The page hit under test is fairly vanilla idempotent GET request returning html in an internally facing customer support web application.

The only ddtrace interesting thing I found so far is this, using --tool=memcheck with --leak-check=full and just about this only thing this shows is that datadog-php-tracer-0.71.0-1.x86_64 had a leak. I will have more next week, with more symbols loaded etc.

416 bytes in 13 blocks are definitely lost in loss record 99 of 130 at 0x4C378C3: operator new(unsigned long) (vg_replace_malloc.c:422) by 0x22949DF2: ??? by 0x177C226F: ??? by 0x227106DD: zif_ddtrace_internal_function_handler (engine_hooks.c:815) by 0x96929A4: ??? by 0x20C273: ??? (in /usr/bin/php-cgi) by 0x43FAC3: execute_ex (in /usr/bin/php-cgi) by 0x96920D9: ??? by 0x20C1DE: ??? (in /usr/bin/php-cgi) by 0x43FAC3: execute_ex (in /usr/bin/php-cgi) by 0x96920D9: ??? by 0x20C1DE: ??? (in /usr/bin/php-cgi)

856 bytes in 101 blocks are definitely lost in loss record 105 of 130 at 0x4C37135: malloc (vg_replace_malloc.c:381) by 0x3A2D7C: __zend_malloc (in /usr/bin/php-cgi) by 0x3A7DE2: _estrndup (in /usr/bin/php-cgi) by 0x177BF823: ??? by 0x22949C7F: ??? by 0x177C226F: ??? by 0x227106DD: zif_ddtrace_internal_function_handler (engine_hooks.c:815) by 0x96929A4: ??? by 0x20C273: ??? (in /usr/bin/php-cgi) by 0x43FAC3: execute_ex (in /usr/bin/php-cgi) by 0x96920D9: ??? by 0x20C1DE: ??? (in /usr/bin/php-cgi)

bwoebi commented 1 year ago

@gjcarrette Did you have a chance to look further into it?