Open morrisonlevi opened 2 weeks ago
Benchmark execution time: 2024-06-28 02:26:52
Comparing candidate commit 6d1ae69cf96f4d31fbcdb0ecf5e04988af578504 in PR branch levi/vm_stack_top
with baseline commit 44a07ba98de2062986a33cc1592a2b87533441a0 in branch master
.
Found 0 performance improvements and 0 performance regressions! Performance is the same for 27 metrics, 9 unstable metrics.
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 79.52%. Comparing base (
44a07ba
) to head (6d1ae69
).
It took quite a few randomized test runs but I managed to get a crash. Here's the top of the stack:
#0 datadog_php_profiling::profiling::stack_walking::detail::StringCache::get_or_insert (slot=0, f=..., self=<optimized out>)
at profiling/src/profiling/stack_walking.rs:103
#1 datadog_php_profiling::profiling::stack_walking::detail::handle_function_cache_slot (func=0x7fbf16da3600, string_cache=<optimized out>)
at profiling/src/profiling/stack_walking.rs:284
#2 datadog_php_profiling::profiling::stack_walking::detail::collect_call_frame (string_set=0x565364b5dba8, execute_data=<optimized out>)
at profiling/src/profiling/stack_walking.rs:242
#3 datadog_php_profiling::profiling::stack_walking::detail::collect_stack_sample::{{closure}} (cell=<optimized out>) at profiling/src/profiling/stack_walking.rs:202
#4 std::thread::local::LocalKey<T>::try_with (self=<optimized out>, f=...) at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/thread/local.rs:270
#5 std::thread::local::LocalKey<T>::with (f=...) at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/thread/local.rs:246
#6 datadog_php_profiling::profiling::stack_walking::detail::collect_stack_sample (top_execute_data=<optimized out>) at profiling/src/profiling/stack_walking.rs:195
#7 0x00007fbf1d960303 in datadog_php_profiling::profiling::Profiler::collect_time (self=0x7fbf1deee698 <datadog_php_profiling::PROFILER+8>,
execute_data=0x23880000, interrupt_count=537535084) at profiling/src/profiling/mod.rs:693
#8 0x00007fbf1d961a1c in datadog_php_profiling::wall_time::ddog_php_prof_interrupt_function::{{closure}} (cell=<optimized out>) at profiling/src/wall_time.rs:76
#9 std::thread::local::LocalKey<T>::try_with (self=<optimized out>, f=...) at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/thread/local.rs:270
#10 std::thread::local::LocalKey<T>::with (f=...) at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/thread/local.rs:246
#11 datadog_php_profiling::wall_time::ddog_php_prof_interrupt_function (execute_data=0x7fbf1fc14090) at profiling/src/wall_time.rs:36
#12 0x00007fbf1eddc0fc in dd_uhook_end (invocation=44259, execute_data=0x7fbf1fc14090, retval=0x7ffde6a66b10, auxiliary=0x7fbf1fdb5550, dynamic=0x7fbf16aab1c0)
at /home/circleci/datadog/tmp/build_extension/ext/hook/uhook.c:383
Basically, the tracer calls the profiler's interrupt function before calling a closure in an end hook. The profiler crashes when looking at the run time cache of the "risky" frame.
Havnen't yet identified why the run time cache is invalid.
PROF-9982
Description
Background and Motivation
For wall- and CPU- time, we walk the call stack in the VM interrupt handler. Unfortunately, if an internal function call is the leaf frame at the time the interrupt was triggered, it gets popped off the call stack before the VM interrupt handler is triggered. To work around this, we add the zend_execute_internal hook and manually call the routine to collect the call stack while the internal frame was still on the call stack.
Using the
zend_execute_internal
hook has performance costs today. These costs are made even worse on the upcoming PHP 8.4 release because it adds a new optimization for frameless calls, and settingzend_execute_internal
disables this optimization.The goal of this task is to avoid installing this
zend_execute_internal
hook, while not regressing on performance, and without losing functionality--we definitely need that leaf frame. A lot of functions show up there, like regular expression and database related functions.Implementation
This does some... uh... let's say "clever" things to avoid using the
zend_execute_internal
hook. In thread B which triggers the interrupt, it also saves the current value of thread A'sEG(current_execute_data)
. Then when thread A handles its interrupt, it compares that againstEG(vm_stack_top)
. If they are equal, then that's used as the top of the stack instead of theexecute_data
passed to the interrupt handler by the engine, and otherwise we just useexecute_data
as normal.Reviewer checklist