DataDog / dd-trace-py

Datadog Python APM Client
https://ddtrace.readthedocs.io/

feat(llmobs): capture input args and resp in function decorators #9604

Open Yun-Kim opened 1 week ago

Yun-Kim commented 1 week ago

This PR introduces automatic input/output annotation for task/tool/workflow/agent/retrieval spans created using function decorators. Specifically, the args/kwargs passed to a traced function are captured as a dictionary of key-value pairs, and the function's return value is captured as a JSON-serialized object (or as a tuple of JSON-serialized objects if there are multiple return values).

Example

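from ddtrace.llmobs.decorators import workflow

@workflow
def traced_workflow(prompt, arg_2, kwarg_1=None):
    # prompt, arg_2 and kwarg_1 are captured as the span input
    formatted_output = ...
    # the return value is captured as the span output
    return formatted_output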
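
To make the capture behavior described above concrete, here is a minimal standalone sketch. It is not the code from this PR: `capture_io` and `last_record` are hypothetical names, and a plain dictionary stands in for the span. It assumes arguments are mapped to parameter names with `inspect.signature` and that return values are serialized with `json.dumps`.

```python
import functools
import inspect
import json


def capture_io(func):
    """Hypothetical stand-in for a span decorator: records inputs and output."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Map positional and keyword arguments onto the function's parameter names.
        bound = inspect.signature(func).bind_partial(*args, **kwargs)
        record = {"input": dict(bound.arguments)}

        result = func(*args, **kwargs)

        # Assumption: a tuple return means "multiple return values", and each
        # element is serialized separately. default=str is a fallback for
        # values that json cannot encode natively.
        if isinstance(result, tuple):
            record["output"] = tuple(json.dumps(r, default=str) for r in result)
        else:
            record["output"] = json.dumps(result, default=str)

        wrapper.last_record = record  # kept on the wrapper for inspection only
        return result

    wrapper.last_record = None
    return wrapper


@capture_io
def traced_workflow(prompt, arg_2, kwarg_1=None):
    return f"{prompt}:{arg_2}:{kwarg_1}"


traced_workflow("hi", 2, kwarg_1="k")
print(traced_workflow.last_record)
```

In this sketch, keyword arguments left at their defaults do not appear in the captured input, because `bind_partial` only records arguments that were actually passed.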

Limitations / Future Steps

There are 2 limitations/special cases introduced by this PR:

  1. If a user manually annotates the span I/O, auto-annotation will be overwritten. This avoids the complications of merging automatic and manual annotations, since manual annotation accepts several different types of I/O data. A future step is to merge automatic and manual annotations when possible.

  2. This PR does not include auto-annotation for LLM/embedding spans, or output annotation for retrieval spans. Those span kinds have specialized I/O, which raises a whole new can of worms about how to store and format I/O automatically based on the traced function's signature. A future step is to figure out how to automate annotation for these more specialized cases, such as LLM/embedding/retrieval spans.
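
The precedence rule in the first limitation can be sketched as follows. This is an illustration only: the span is modeled as a plain dictionary, `auto_annotate` is a hypothetical helper, and applying the rule per field (input and output independently) is an assumption rather than something this PR states.

```python
def auto_annotate(span, inputs, output):
    """Fill in I/O only where no manual annotation exists (hypothetical helper)."""
    # setdefault leaves an existing (manual) value untouched, so nothing is merged.
    span.setdefault("input", inputs)
    span.setdefault("output", output)
    return span


span = {}
span["output"] = "manual summary"  # manual annotation made inside the traced function
auto_annotate(span, {"prompt": "hi"}, "raw return value")
print(span)
```

Here the manual output survives and only the missing input is filled in automatically; no attempt is made to merge the two sources.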

Notes

Additionally, this PR adds a private (non-public) option to each decorator that disables automatic annotation, in case users want to manually annotate their own I/O on the function decorator span. Otherwise, automatic annotation will override any manual annotation made inside the function.
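
One way such an opt-out could look is sketched below. The decorator name `workflow_sketch`, the keyword `_auto_io`, and the `captured` attribute are all hypothetical; the PR does not name the private option, so this only illustrates the general shape of a decorator that works with or without arguments.

```python
import functools


def workflow_sketch(func=None, *, _auto_io=True):
    """Hypothetical decorator with a private switch for automatic I/O capture."""

    def decorator(f):
        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            result = f(*args, **kwargs)
            # Record the return value only when automatic capture is enabled.
            wrapper.captured = result if _auto_io else None
            return result

        wrapper.captured = None
        return wrapper

    # Support both @workflow_sketch and @workflow_sketch(_auto_io=False).
    return decorator(func) if func is not None else decorator


@workflow_sketch
def with_capture(x):
    return x + 1


@workflow_sketch(_auto_io=False)
def without_capture(x):
    return x + 1


with_capture(1)
without_capture(1)
print(with_capture.captured, without_capture.captured)
```

The leading underscore signals that the switch is not part of the public interface, which matches the intent described above.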

pr-commenter[bot] commented 1 week ago

Benchmarks

Benchmark execution time: 2024-07-01 15:34:14

Comparing candidate commit f03a8487b47f36b9e0cd8ac90a721469240e01fb in PR branch yunkim/llmobs-decorators-extract-input-output with baseline commit 9c9b5a7e9d5977438cb98a6d86cf00b23f7fa3ff in branch main.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 221 metrics, 9 unstable metrics.

datadog-dd-trace-py-rkomorn[bot] commented 1 week ago

Datadog Report

Branch report: yunkim/llmobs-decorators-extract-input-output
Commit report: 4ede64c
Test service: dd-trace-py

✅ 0 Failed, 9354 Passed, 30989 Skipped, 2h 42m 34.85s Total duration (6m 45.47s time saved)