DataDog / dd-trace-py

Datadog Python APM Client
https://ddtrace.readthedocs.io/
Other
506 stars 397 forks source link

fix(futures): fix incorrect context propgation with ThreadPoolExecutor [backport 2.9] #9603

Closed github-actions[bot] closed 1 week ago

github-actions[bot] commented 1 week ago

Backport 109ba082a162b7dd7eac984197a2bf12c93136af from #9588 to 2.9.

There is a bug when scheduling work onto a ThreadPoolExecutor and not waiting for the response (e.g. pool.submit(work), and ignoring the future) we not properly associate the spans created in the task with the trace that was active when submitting the task.

The reason for this bug is because we propagate the currently active span (parent) to the child task, however, if the parent span finishes before the child task can create it's first span, we no longer consider the parent span active/available to inherit from. This is because our context management code does not work if passing spans between thread or process boundaries.

The solution is to instead pass the active span's Context to the child task. This is a similar process as passing context between two services/processes via HTTP headers (for example). This change will allow the child task's spans to be properly associated with the parent span regardless of the execution order.

This issue can be highlighted by the following example:


pool = ThreadPoolExecutor(max_workers=1)

def task():
    parent_span = tracer.current_span()
    assert parent_span is not None
    time.sleep(1)

with tracer.trace("parent"):
    for _ in range(10):
        pool.submit(task)

The first execution of task will (probably) succeed without any issues because the parent span is likely still active at that time. However, when each additional task executes the assertion will fail because the parent span is no longer an active span so tracer.current_span() will return None.

This example shows that only the first execution of task will be properly associated with the parent span/trace, the other calls to task will be disconnected traces.

This fix will resolve this inconsistent and unexpected behavior to ensure that the spans created in task will always be properly associated with the parent span/trace.

This change may impact people who were expecting to access the current span in the child task, but before creating any spans in the child task (the code sample above), as the span will no longer be available via tracer.current_span().

Checklist

Reviewer Checklist

datadog-dd-trace-py-rkomorn[bot] commented 1 week ago

Datadog Report

Branch report: backport-9588-to-2.9 Commit report: 0bbf7c5 Test service: dd-trace-py

:white_check_mark: 0 Failed, 906 Passed, 0 Skipped, 5m 24.25s Total Time

pr-commenter[bot] commented 1 week ago

Benchmarks

Benchmark execution time: 2024-06-20 21:53:23

Comparing candidate commit 0bbf7c502a5eb5df7f486e459c6573730c0e2d0a in PR branch backport-9588-to-2.9 with baseline commit 8c934bbdcf9cb999504f5a956aa68146ad75f7ac in branch 2.9.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 209 metrics, 9 unstable metrics.