DataDog / dd-trace-php

Datadog PHP Clients
https://docs.datadoghq.com/tracing/setup/php
Other
484 stars 151 forks source link

[Bug]: ddappsec startup warnings #2501

Open amberovsky opened 7 months ago

amberovsky commented 7 months ago

Bug report

Ubuntu 20.04 in Docker

PHP Warning: PHP Startup: [ddappsec] Failed to retrieve root span meta in Unknown on line 0 PHP Warning: PHP Startup: [ddappsec] Failed to add tag _dd.appsec.event_rules.version in Unknown on line 0 PHP Warning: PHP Startup: [ddappsec] Failed to retrieve root span meta in Unknown on line 0 PHP Warning: PHP Startup: [ddappsec] Failed to add _dd.runtime_family to root span in Unknown on line 0

PHP version

PHP 8.2.15 (cli) (built: Jan 20 2024 14:16:39) (NTS)

Tracer or profiler version

0.97.0

Installed extensions

[PHP Modules] bcmath calendar Core ctype curl date dom exif FFI fileinfo filter ftp gettext hash iconv intl json libxml mbstring mongodb openssl pcntl pcre PDO Phar posix random readline Reflection session shmop SimpleXML sockets sodium SPL standard sysvmsg sysvsem sysvshm tokenizer xml xmlreader xmlwriter xsl Zend OPcache zlib

[Zend Modules] Zend OPcache

Output of phpinfo()

Unfortunately I'm unable to provide either phpinfo() (internal service without direct http access) nor CLI SAPI (disabled)

There are no changes to the default config apart from these:

extension = ddtrace.so extension = datadog-profiling.so extension = ddappsec.so datadog.appsec.enabled = On

Upgrading from

No response

Anilm3 commented 7 months ago

Hello @amberovsky ,

This typically happens when ddtrace is not available:

On the other hand, anything you could let us know about your setup could be useful for us to reproduce the issue, for example are you using apache or fpm? Any specific framework?

amberovsky commented 7 months ago

Hi @Anilm3 ,

Symfony 6.4, nginx, php-fpm, OpCache

[28-Jan-2024 19:55:39] WARNING: [pool www] child 26668 exited on signal 11 (SIGSEGV) after 5197.846840 seconds from start
[29-Jan-2024 11:41:57] WARNING: [pool www] child 26718 exited on signal 11 (SIGSEGV) after 8.092824 seconds from start
[29-Jan-2024 20:23:59] WARNING: [pool www] child 16897 exited on signal 11 (SIGSEGV) after 596.541010 seconds from start
[29-Jan-2024 20:24:06] WARNING: [pool www] child 17184 exited on signal 11 (SIGSEGV) after 7.301436 seconds from start
[29-Jan-2024 22:08:50] WARNING: [pool www] child 16032 exited on signal 11 (SIGSEGV) after 7546.904623 seconds from start
[30-Jan-2024 13:54:52] WARNING: [pool www] child 102 exited on signal 11 (SIGSEGV) after 417.388427 seconds from start
[30-Jan-2024 15:00:33] WARNING: [pool www] child 98 exited on signal 11 (SIGSEGV) after 4357.623340 seconds from start
[30-Jan-2024 15:00:33] WARNING: [pool www] child 95 exited on signal 11 (SIGSEGV) after 4358.157075 seconds from start
[30-Jan-2024 15:13:45] WARNING: [pool www] child 113 exited on signal 11 (SIGSEGV) after 5149.703371 seconds from start
[30-Jan-2024 21:52:23] WARNING: [pool www] child 22142 exited on signal 11 (SIGSEGV) after 153.467050 seconds from start
[31-Jan-2024 07:10:49] WARNING: [pool www] child 28520 exited on signal 11 (SIGSEGV) after 25816.944905 seconds from start
[31-Jan-2024 08:38:40] WARNING: [pool www] child 28202 exited on signal 11 (SIGSEGV) after 31568.360108 seconds from start

And

Jan 28 19:55:39 kernel: php-fpm8.2[14862]: segfault at 56160000030e ip 00007fe3982d6438 sp 00007ffd4c8deb80 error 4 in datadog-profiling.so[7fe397f98000+4f9000]
Jan 29 11:41:57 kernel: traps: php-fpm8.2[18537] general protection fault ip:561634482f76 sp:7ffd4c8dedd0 error:0 in zero (deleted)[561634400000+6400000]
Jan 29 20:23:59 kernel: traps: php-fpm8.2[23992] general protection fault ip:561634482f76 sp:7ffd4c8dedd0 error:0 in zero (deleted)[561634400000+6400000]
...

UPD: Just saw the segfault is related to datadog-profiling.so

Anilm3 commented 7 months ago

I'm not sure if the segfaults are related at all, but perhaps that's a more pressing problem.

@realFlowControl can you have a look?

bwoebi commented 7 months ago

@amberovsky Is there a possibility that you could get core dumps and post us at least the stacktraces? That would be probably helpful in triaging. Thanks!

amberovsky commented 6 months ago

@bwoebi @Anilm3 does this help?

(gdb) thread apply all bt full

Thread 4 (Thread 0x7f0f5adfd700 (LWP 328)):
#0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
No locals.
#1  0x00007f0f5c54d985 in std::sys::unix::futex::futex_wait () at library/std/src/sys/unix/futex.rs:62
No locals.
#2  std::sys_common::thread_parking::futex::Parker::park () at library/std/src/sys_common/thread_parking/futex.rs:52
No locals.
#3  std::thread::park () at library/std/src/thread/mod.rs:978
No locals.
#4  0x00007f0f5c310e3d in crossbeam_channel::context::Context::wait_until () at /rust/cargo/registry/src/index.crates.io-6f17d22bba15001f/crossbeam-channel-0.5.8/src/context.rs:177
No locals.
#5  0x00007f0f5c311102 in crossbeam_channel::flavors::array::Channel<T>::recv::{{closure}} () at /rust/cargo/registry/src/index.crates.io-6f17d22bba15001f/crossbeam-channel-0.5.8/src/flavors/array.rs:431
No locals.
#6  crossbeam_channel::context::Context::with::{{closure}} () at /rust/cargo/registry/src/index.crates.io-6f17d22bba15001f/crossbeam-channel-0.5.8/src/context.rs:52
No locals.
#7  0x00007f0f5c311c17 in crossbeam_channel::context::Context::with::{{closure}} () at /rust/cargo/registry/src/index.crates.io-6f17d22bba15001f/crossbeam-channel-0.5.8/src/context.rs:60
No locals.
#8  std::thread::local::LocalKey<T>::try_with () at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/thread/local.rs:270
No locals.
#9  crossbeam_channel::context::Context::with () at /rust/cargo/registry/src/index.crates.io-6f17d22bba15001f/crossbeam-channel-0.5.8/src/context.rs:55
No locals.
#10 crossbeam_channel::flavors::array::Channel<T>::recv () at /rust/cargo/registry/src/index.crates.io-6f17d22bba15001f/crossbeam-channel-0.5.8/src/flavors/array.rs:420
No locals.
#11 0x00007f0f5c343caa in crossbeam_channel::channel::Receiver<T>::recv () at /rust/cargo/registry/src/index.crates.io-6f17d22bba15001f/crossbeam-channel-0.5.8/src/channel.rs:814
No locals.
#12 0x00007f0f5c34e811 in datadog_php_profiling::profiling::uploader::Uploader::run () at profiling/src/profiling/uploader.rs:101
No locals.
#13 0x00007f0f5c31bf07 in datadog_php_profiling::profiling::Profiler::new::{{closure}} () at profiling/src/profiling/mod.rs:527
No locals.
#14 datadog_php_profiling::profiling::thread_utils::spawn::{{closure}} () at profiling/src/profiling/thread_utils.rs:38
No locals.
#15 std::sys_common::backtrace::__rust_begin_short_backtrace () at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/sys_common/backtrace.rs:135
No locals.
#16 0x00007f0f5c33019d in std::thread::Builder::spawn_unchecked_::{{closure}}::{{closure}} () at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/thread/mod.rs:529
No locals.
#17 <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once () at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/panic/unwind_safe.rs:271
No locals.
#18 std::panicking::try::do_call () at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/panicking.rs:500
No locals.
#19 std::panicking::try () at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/panicking.rs:464
No locals.
#20 std::panic::catch_unwind () at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/panic.rs:142
--Type <RET> for more, q to quit, c to continue without paging--
No locals.
#21 std::thread::Builder::spawn_unchecked_::{{closure}} () at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/thread/mod.rs:528
No locals.
#22 core::ops::function::FnOnce::call_once{{vtable-shim}} () at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/ops/function.rs:250
No locals.
#23 0x00007f0f5c557c66 in <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once () at library/alloc/src/boxed.rs:1985
No locals.
#24 <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once () at library/alloc/src/boxed.rs:1985
No locals.
#25 std::sys::unix::thread::Thread::new::thread_start () at library/std/src/sys/unix/thread.rs:108
No locals.
#26 0x00007f0f6167f609 in start_thread (arg=<optimized out>) at pthread_create.c:477
        ret = <optimized out>
        pd = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139703925856000, -8450623212753878221, 140726012776414, 140726012776415, 140726012776608, 139703925853440, 8405950328496882483, 8405854604155192115}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = 0
#27 0x00007f0f617b9353 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
No locals.

Thread 3 (Thread 0x7f0f5affe700 (LWP 327)):
#0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
No locals.
#1  0x00007f0f5c5573a4 in std::sys::unix::futex::futex_wait () at library/std/src/sys/unix/futex.rs:62
No locals.
#2  0x00007f0f5c54d9fc in std::sys_common::thread_parking::futex::Parker::park_timeout () at library/std/src/sys_common/thread_parking/futex.rs:72
No locals.
#3  std::thread::park_timeout () at library/std/src/thread/mod.rs:1045
No locals.
#4  0x00007f0f5c30a0ce in crossbeam_channel::context::Context::wait_until () at src/context.rs:168
No locals.
#5  crossbeam_channel::select::run_select::{{closure}} () at src/select.rs:260
No locals.
#6  crossbeam_channel::context::Context::with::{{closure}} () at src/context.rs:52
No locals.
#7  0x00007f0f5c30a420 in crossbeam_channel::context::Context::with::{{closure}} () at src/context.rs:60
No locals.
#8  std::thread::local::LocalKey<T>::try_with () at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/thread/local.rs:270
No locals.
#9  crossbeam_channel::context::Context::with () at src/context.rs:55
No locals.
#10 crossbeam_channel::select::run_select () at src/select.rs:212
No locals.
--Type <RET> for more, q to quit, c to continue without paging--
#11 0x00007f0f5c334e89 in crossbeam_channel::select::select () at /rust/cargo/registry/src/index.crates.io-6f17d22bba15001f/crossbeam-channel-0.5.8/src/select.rs:474
No locals.
#12 datadog_php_profiling::profiling::TimeCollector::run () at profiling/src/profiling/mod.rs:430
No locals.
#13 0x00007f0f5c31c127 in datadog_php_profiling::profiling::Profiler::new::{{closure}} () at profiling/src/profiling/mod.rs:523
No locals.
#14 datadog_php_profiling::profiling::thread_utils::spawn::{{closure}} () at profiling/src/profiling/thread_utils.rs:38
No locals.
#15 std::sys_common::backtrace::__rust_begin_short_backtrace () at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/sys_common/backtrace.rs:135
No locals.
#16 0x00007f0f5c33046d in std::thread::Builder::spawn_unchecked_::{{closure}}::{{closure}} () at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/thread/mod.rs:529
No locals.
#17 <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once () at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/panic/unwind_safe.rs:271
No locals.
#18 std::panicking::try::do_call () at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/panicking.rs:500
No locals.
#19 std::panicking::try () at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/panicking.rs:464
No locals.
#20 std::panic::catch_unwind () at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/panic.rs:142
No locals.
#21 std::thread::Builder::spawn_unchecked_::{{closure}} () at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/thread/mod.rs:528
No locals.
#22 core::ops::function::FnOnce::call_once{{vtable-shim}} () at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/ops/function.rs:250
No locals.
#23 0x00007f0f5c557c66 in <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once () at library/alloc/src/boxed.rs:1985
No locals.
#24 <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once () at library/alloc/src/boxed.rs:1985
No locals.
#25 std::sys::unix::thread::Thread::new::thread_start () at library/std/src/sys/unix/thread.rs:108
No locals.
#26 0x00007f0f6167f609 in start_thread (arg=<optimized out>) at pthread_create.c:477
        ret = <optimized out>
        pd = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139703927957248, -8450623212753878221, 140726012776398, 140726012776399, 140726012776592, 139703927954688, 8405950054155846451, 8405854604155192115}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = 0
#27 0x00007f0f617b9353 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
No locals.

Thread 2 (Thread 0x7f0f5b9ff700 (LWP 326)):
#0  futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0x7f0f5b9fea90, clockid=<optimized out>, expected=0, futex_word=0x562c80d34010) at ../sysdeps/nptl/futex-internal.h:320
        __ret = -516
        clockbit = <optimized out>
--Type <RET> for more, q to quit, c to continue without paging--
        op = <optimized out>
        __ret = <optimized out>
        oldtype = 0
        err = <optimized out>
        oldtype = <optimized out>
        err = <optimized out>
        __ret = <optimized out>
        clockbit = <optimized out>
        op = <optimized out>
        __ret = <optimized out>
        resultvar = <optimized out>
        __arg6 = <optimized out>
        __arg5 = <optimized out>
        __arg4 = <optimized out>
        __arg3 = <optimized out>
        __arg2 = <optimized out>
        __arg1 = <optimized out>
        _a6 = <optimized out>
        _a5 = <optimized out>
        _a4 = <optimized out>
        _a3 = <optimized out>
        _a2 = <optimized out>
        _a1 = <optimized out>
#1  __pthread_cond_wait_common (abstime=0x7f0f5b9fea90, clockid=<optimized out>, mutex=0x562c80d33f18, cond=0x562c80d33fe8) at pthread_cond_wait.c:520
        spin = 0
        buffer = {__routine = 0x7f0f61686050 <__condvar_cleanup_waiting>, __arg = 0x7f0f5b9fe9d0, __canceltype = 1590350032, __prev = 0x0}
        cbuffer = {wseq = 334, cond = 0x562c80d33fe8, mutex = 0x562c80d33f18, private = 0}
        err = <optimized out>
        g = 0
        flags = <optimized out>
        g1_start = <optimized out>
        maxspin = 0
        signals = <optimized out>
        result = 0
        wseq = 334
        seq = 167
        private = <optimized out>
        maxspin = <optimized out>
        err = <optimized out>
        result = <optimized out>
        wseq = <optimized out>
        g = <optimized out>
        seq = <optimized out>
--Type <RET> for more, q to quit, c to continue without paging--
        flags = <optimized out>
        private = <optimized out>
        signals = <optimized out>
        g1_start = <optimized out>
        spin = <optimized out>
        buffer = <optimized out>
        cbuffer = <optimized out>
        s = <optimized out>
#2  __pthread_cond_timedwait (cond=0x562c80d33fe8, mutex=0x562c80d33f18, abstime=0x7f0f5b9fea90) at pthread_cond_wait.c:665
        flags = <optimized out>
        clockid = <optimized out>
#3  0x00007f0f5ca51178 in _dd_writer_loop (_=<optimized out>) at /home/circleci/datadog/tmp/build_extension/ext/coms.c:986
        wait_deadline = {tv_sec = 1707909696, tv_nsec = 526932000}
        interval = <optimized out>
        stack = <optimized out>
        processed_stacks = <optimized out>
        curl = <optimized out>
        running = true
        __cancel_buf = {__cancel_jmp_buf = {{__cancel_jmp_buf = {139703938443912, 8405948680481440563, 140726012777774, 140726012777775, 140726012777776, 139703938444544, 8405948680504509235, 8405937626107642675}, __mask_was_saved = 0}}, __pad = {0x7f0f5b9fec50, 0x0, 0x0, 0x0}}
        __cancel_routine = 0x7f0f5ca4fe40 <_dd_writer_loop_cleanup>
        __cancel_arg = <optimized out>
        __not_first_call = <optimized out>
        sigset = {__val = {67127815, 0 <repeats 15 times>}}
        writer = 0x7f0f5d0d1620 <global_writer>
        caphdrp = {version = 537396514, pid = 0}
        capdatap = {{effective = 0, permitted = 0, inheritable = 0}, {effective = 0, permitted = 0, inheritable = 0}}
#4  0x00007f0f6167f609 in start_thread (arg=<optimized out>) at pthread_create.c:477
        ret = <optimized out>
        pd = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139703938447104, -8450623212753878221, 140726012777774, 140726012777775, 140726012777776, 139703938444544, 8405948680303182643, 8405854604155192115}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = 0
#5  0x00007f0f617b9353 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
No locals.

Thread 1 (Thread 0x7f0f5f6375c0 (LWP 191)):
#0  0x0000562c7f4fb5cb in ?? ()
No symbol table info available.
#1  0x00007f0f5f5de480 in ?? ()
No symbol table info available.
#2  0x00007f0f58c33178 in ?? ()
No symbol table info available.
--Type <RET> for more, q to quit, c to continue without paging--
#3  0x0000000000000308 in ?? ()
No symbol table info available.
#4  0x0000562c82366968 in ?? ()
No symbol table info available.
#5  0x00007f0f58c33178 in ?? ()
No symbol table info available.
#6  0x0000562c910a5d2c in ?? ()
No symbol table info available.
#7  0x00007f0f58a82a10 in ?? ()
No symbol table info available.
#8  0x0000000000000001 in ?? ()
No symbol table info available.
#9  0x000000035f402eb0 in ?? ()
No symbol table info available.
#10 0x00007ffd540074b0 in ?? ()
No symbol table info available.
#11 0x00007f0f58c583a0 in ?? ()
No symbol table info available.
#12 0x0000000000000001 in ?? ()
No symbol table info available.
#13 0x00007f0f5f5de480 in ?? ()
No symbol table info available.
#14 0x00007f0f00000001 in ?? ()
No symbol table info available.
#15 0x0000000000000002 in ?? ()
No symbol table info available.
#16 0x0000000000000001 in ?? ()
No symbol table info available.
#17 0x00007f0f5f4980d8 in ?? ()
No symbol table info available.
#18 0x0000562c7f582c81 in ?? ()
No symbol table info available.
#19 0x00007f0f5f413020 in ?? ()
No symbol table info available.
#20 0x9cddec2256084100 in ?? ()
No symbol table info available.
#21 0x00007f0f5f413020 in ?? ()
No symbol table info available.
#22 0x00007f0f5f466a00 in ?? ()
No symbol table info available.
#23 0x00007f0f5f413020 in ?? ()
No symbol table info available.
#24 0x0000562c7f776900 in ?? ()
--Type <RET> for more, q to quit, c to continue without paging--
No symbol table info available.
#25 0x0000000000000000 in ?? ()
No symbol table info available.
(gdb)
amberovsky commented 6 months ago

I can upload the coredump itself if there is a secure way of doing so (E.g. if I can msg you the s3 path?)

bwoebi commented 6 months ago

Hm, the crashing thread is thread #1 and it's full of question marks. There are three options: a) you don't have debug symbols for PHP itself b) it's JIT code and it crashes in JIT c) It's a crash in the rust code.

For c) you could try https://output.circle-artifacts.com/output/job/280b1883-9ac5-42e7-b9fb-6c19b6101509/artifacts/0/datadog-setup.php - we have now debug symbols in our rust code too. If a), you may try to install them. If b) it's going to be hard...

realFlowControl commented 4 months ago

Hey there @amberovsky,

did you have a chance to try the suggestions?

You may also reach out to us in private via https://chat.datadoghq.com/ and find me as https://datadoghq.slack.com/team/U04MJBH48A2 so we can try and help you debug the crash

Kind regards Florian

amberovsky commented 3 months ago

Hi @realFlowControl , I'm testing on php 8.3 and the latest plugin version now.