Open Yun-Kim opened 3 days ago
Branch report: yunkim/llmobs-openai-embeddings
Commit report: 5b100ea
Test service: dd-trace-py
:x: 24 Failed (3 Known Flaky), 2269 Passed, 38039 Skipped, 53m 12.49s Total duration (35m 43.15s time saved)
This report shows up to 5 failed tests.
test_embedding_array_of_token_arrays[ddtrace_global_config0]
- test_openai_llmobs.py
<details>
<summary>Expand for error</summary>
```
expected call not found.
Expected: enqueue({'span_id': '2508543017903380100', 'trace_id': '6682c26e00000000a5ef365d54aec50b', 'parent_id': 'undefined', 'session_id': '6682c26e00000000a5ef365d54aec50b', 'name': 'openai.request', 'tags': ['version:', 'env:', 'service:', 'source:integration', 'ml_app:<ml-app-name>', 'session_id:6682c26e00000000a5ef365d54aec50b', 'ddtrace.version:2.11.0.dev66+g5b100ea48', 'error:0'], 'start_ns': 1719845486809183393, 'duration': 64067652, 'status': 'ok', 'meta': {'span.kind': 'embedding', 'input': {'documents': [{'text': '[1111, 2222, 3333]'}, {'text': '[4444, 5555, 6666]'}, {'text': '[7777, 8888, 9999]'}]}, 'output': {'value': '[3 embedding(s) returned with size 1536]'}, 'metadata': {'encoding_format': 'float'}, 'model_name': 'text-embedding-ada-002-v2', 'model_provider': 'openai'}, 'metrics': {'prompt_tokens': 9, 'completion_tokens': 0, 'total_tokens': 9}})
Actual: enqueue({'trace_id': '6682c26e00000000a5ef365d54aec50b', 'span_id': '2508543017903380100', 'parent_id': 'undefined', 'session_id': '6682c26e00000000a5ef365d54aec50b', 'name': 'openai.request', 'tags': ['version:', 'env:', 'service:', 'source:integration', 'ml_app:<ml-app-name>', 'session_id:6682c26e00000000a5ef365d54aec50b', 'ddtrace.version:2.11.0.dev66+g5b100ea48', 'error:0'], 'start_ns': 1719845486809183393, 'duration': 64067652, 'status': 'ok', 'meta': {'span.kind': 'embedding', 'input': {'documents': [{'text': '[1111, 2222, 3333]'}, {'text': '[4444, 5555, 6666]'}, {'text': '[7777, 8888, 9999]'}]}, 'output': {'value': '[3 embedding(s) returned with size 1536]'}, 'model_name': 'text-embedding-ada-002-v2', 'model_provider': 'openai', 'metadata': {'encoding_format': 'float'}}, 'metrics': {'input_tokens': 9, 'output_tokens': 0, 'total_tokens': 9}})
```
</details>
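The expected/actual payloads above differ only in the `metrics` key names (`prompt_tokens`/`completion_tokens` vs. `input_tokens`/`output_tokens`) and in the ordering of keys inside `meta`. A minimal sketch of why the mock assertion fails (plain dicts standing in for the enqueued payloads):

```python
expected_metrics = {"prompt_tokens": 9, "completion_tokens": 0, "total_tokens": 9}
actual_metrics = {"input_tokens": 9, "output_tokens": 0, "total_tokens": 9}

# Key ordering never affects dict equality, so the reordered 'meta' keys
# are harmless; the renamed metric keys are what break the expected call.
assert expected_metrics != actual_metrics
assert set(expected_metrics) - set(actual_metrics) == {"prompt_tokens", "completion_tokens"}
```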
test_embedding_array_of_token_arrays[ddtrace_global_config0]
- test_openai_llmobs.py

test_embedding_array_of_token_arrays[ddtrace_global_config0]
- test_openai_llmobs.py

test_embedding_array_of_token_arrays[ddtrace_global_config0]
- test_openai_llmobs.py

test_embedding_array_of_token_arrays[ddtrace_global_config0]
- test_openai_llmobs.py
Benchmark execution time: 2024-06-28 21:22:33
Comparing candidate commit e3315c7361a479800ad3c3df53f1720604f457c2 in PR branch yunkim/llmobs-openai-embeddings with baseline commit 9c9b5a7e9d5977438cb98a6d86cf00b23f7fa3ff in branch main.
Found 0 performance improvements and 0 performance regressions! Performance is the same for 221 metrics, 9 unstable metrics.
This PR adds instrumentation to submit openai embedding spans to LLM Observability. The embedding spans sent to LLM Observability will contain the following I/O data:
- `encoding_format` and `dimensions` (when applicable/provided)
- `[X embeddings returned with size Y]` (if returned in `base64` format, we do not mention the size, as it is not trivial to determine from the output)

Note: we currently store embedding inputs as `input.documents` (storing as text-only Documents). For single-input cases this is fine, but the backend and UI currently default to concatenating multiple inputs into a single `input.value` string, which does not result in the greatest display (a non-JSON object). This issue can be fixed in the frontend.

Checklist
- `changelog/no-changelog` is set
- @DataDog/apm-tees

Reviewer Checklist
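The output value described in the PR body (`[X embeddings returned with size Y]`, with the size omitted for `base64` output) can be sketched roughly as follows. This is a hypothetical helper for illustration only, not the actual integration code; the function name and signature are assumptions.

```python
def format_embedding_output(embeddings, encoding_format="float"):
    """Hypothetical helper mirroring the output value string the PR describes."""
    if encoding_format == "base64":
        # For base64-encoded output the embedding size is not trivial to
        # determine, so the size is omitted from the summary string.
        return "[{} embedding(s) returned]".format(len(embeddings))
    return "[{} embedding(s) returned with size {}]".format(
        len(embeddings), len(embeddings[0])
    )


print(format_embedding_output([[0.1] * 1536] * 3))
# [3 embedding(s) returned with size 1536]
```

This matches the `output.value` seen in the failing test payloads above for three float-encoded embeddings of size 1536.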