feat(llmobs): capture input args and resp in function decorators

Yun-Kim commented 1 week ago

This PR introduces automatic input/output annotation for task/tool/workflow/agent/retrieval spans created using function decorators. Specifically, the args/kwargs passed to a traced function are captured as a dictionary of key-value pairs, and the function's return value is captured as a JSON-serialized object (or as a tuple of JSON-serialized objects if the function returns multiple values).

Example

from ddtrace.llmobs.decorators import workflow

@workflow
def traced_workflow(prompt, arg_2, kwarg_1=None):
    formatted_output = ...
    return formatted_output
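
To make the captured shape concrete, here is a minimal standalone sketch of the idea. It is not the code in this PR: `workflow_sketch` and `captured` are hypothetical stand-ins, and binding arguments with `inspect.signature` (including defaults) is an assumption about one reasonable way to build the input dictionary.

```python
import functools
import inspect
import json

captured = []  # hypothetical stand-in for where span annotations would be stored


def workflow_sketch(func):
    """Hypothetical stand-in for a tracing decorator (not ddtrace's implementation)."""
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Map positional and keyword arguments onto parameter names.
        bound = sig.bind(*args, **kwargs)
        bound.apply_defaults()  # assumption: unspecified defaults are recorded too
        record = {"name": func.__name__, "input": dict(bound.arguments)}

        result = func(*args, **kwargs)

        # One JSON string per returned value if the function returns a tuple,
        # otherwise a single JSON string.
        if isinstance(result, tuple):
            record["output"] = tuple(json.dumps(r, default=str) for r in result)
        else:
            record["output"] = json.dumps(result, default=str)
        captured.append(record)
        return result

    return wrapper


@workflow_sketch
def traced_workflow(prompt, arg_2, kwarg_1=None):
    return prompt.upper(), arg_2


traced_workflow("hello", 2)
```

In this sketch the recorded input is keyed by parameter name, so positional and keyword call styles produce the same dictionary.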

Limitations / Future Steps

There are 2 limitations/special cases introduced by this PR:

1. If a user manually annotates the span I/O, auto-annotation will be overwritten. This avoids the complications of merging automatic and manual annotations, given the different types of I/O data that can be annotated manually. Future steps include resolving this behavior by auto-merging automatic and manual annotations when possible.
2. This PR does not include auto-annotation for LLM/embedding spans, or output annotation for retrieval spans. Those span kinds have specialized I/O, which raises a whole new can of worms about how to automatically store and format I/O based on the traced function's signature. Future steps include figuring out how to automate annotation for these more specialized I/O cases.

Notes

Additionally, this PR adds a private (non-public) option to each decorator that disables automatic annotation, for cases where users want to manually annotate I/O on the span created by the function decorator. Otherwise, automatic annotation will override any manual annotation made inside the function.
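
As a rough illustration of this opt-out, the sketch below uses a decorator factory with a private flag. Everything in it is hypothetical: the names `task_sketch`, `_auto_annotate`, `annotate`, and `span_store` are invented for the example and are not the identifiers used in this PR. The enabled path simply mirrors the override behavior described in the note above.

```python
import functools
import json

span_store = {}  # hypothetical stand-in: function name -> annotations on its span


def annotate(name, output):
    """Hypothetical manual annotation helper."""
    span_store[name]["output"] = output


def task_sketch(_auto_annotate=True):
    """Hypothetical decorator factory; the flag name is an assumption."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            span = span_store.setdefault(func.__name__, {})
            span.clear()  # start each call with a fresh "span"
            result = func(*args, **kwargs)
            if _auto_annotate:
                # Written after the function returns, so it replaces any
                # manual output annotation made inside the function.
                span["output"] = json.dumps(result)
            return result
        return wrapper
    return decorator


@task_sketch(_auto_annotate=False)
def summarize(text):
    annotate("summarize", "manual summary")
    return text[:5]


@task_sketch()
def summarize_auto(text):
    annotate("summarize_auto", "manual summary")
    return text[:5]


summarize("hello world")
summarize_auto("hello world")
```

With the flag off, the manual annotation is the only one recorded; with it on, the serialized return value is what ends up on the span in this sketch.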

Checklist

[x] Change(s) are motivated and described in the PR description
[x] Testing strategy is described if automated tests are not included in the PR
[x] Risks are described (performance impact, potential for breakage, maintainability)
[x] Change is maintainable (easy to change, telemetry, documentation)
[x] Library release note guidelines are followed or label changelog/no-changelog is set
[x] Documentation is included (in-code, generated user docs, public corp docs)
[x] Backport labels are set (if applicable)
[x] If this PR changes the public interface, I've notified @DataDog/apm-tees.

Reviewer Checklist

[x] Title is accurate
[x] All changes are related to the pull request's stated goal
[x] Description motivates each change
[x] Avoids breaking API changes
[x] Testing strategy adequately addresses listed risks
[x] Change is maintainable (easy to change, telemetry, documentation)
[x] Release note makes sense to a user of the library
[x] Author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
[x] Backport labels are set in a manner that is consistent with the release branch maintenance policy

pr-commenter[bot] commented 1 week ago

Benchmarks

Benchmark execution time: 2024-07-01 15:34:14

Comparing candidate commit f03a8487b47f36b9e0cd8ac90a721469240e01fb in PR branch yunkim/llmobs-decorators-extract-input-output with baseline commit 9c9b5a7e9d5977438cb98a6d86cf00b23f7fa3ff in branch main.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 221 metrics, 9 unstable metrics.

datadog-dd-trace-py-rkomorn[bot] commented 1 week ago

Datadog Report

Branch report: yunkim/llmobs-decorators-extract-input-output
Commit report: 4ede64c
Test service: dd-trace-py

✅ 0 Failed, 9354 Passed, 30989 Skipped, 2h 42m 34.85s Total duration (6m 45.47s time saved)
