DataDog / dd-trace-py

Datadog Python APM Client
https://ddtrace.readthedocs.io/

chore(langchain): disable flaky tests #11511

Open Kyle-Verhoog opened 12 hours ago

Kyle-Verhoog commented 12 hours ago

There appear to be stability issues with using snapshots and/or LangChain in general.

There are failures in the mocked tests that look like:

builtins.AssertionError: assert 0 == 1
 +  where 0 = <MagicMock name='LLMObsSpanWriter().enqueue' id='127482146130048'>.call_count
 +    where <MagicMock name='LLMObsSpanWriter().enqueue' id='127482146130048'> = <MagicMock name='LLMObsSpanWriter()' id='127482147073440'>.enqueue

as well as failures in the snapshot-based tests:

builtins.Failed: At request <Request GET /test/session/snapshot >:
   At snapshot (token='tests.contrib.langchain.test_langchain_community.test_lcel_chain_simple'):
    - Directory: /go/src/github.com/DataDog/apm-reliability/dd-trace-py/tests/snapshots
    - CI mode: 1
    - Trace File: /go/src/github.com/DataDog/apm-reliability/dd-trace-py/tests/snapshots/tests.contrib.langchain.test_langchain_community.test_lcel_chain_simple.json
    - Stats File: /go/src/github.com/DataDog/apm-reliability/dd-trace-py/tests/snapshots/tests.contrib.langchain.test_langchain_community.test_lcel_chain_simple_tracestats.json
    At compare of 1 expected trace(s) to 0 received trace(s):
Did not receive expected traces: 'langchain.request'
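For context on the first failure shape above (`assert 0 == 1` on `enqueue.call_count`), here is a minimal sketch of how such an assertion behaves when the mocked writer is never reached. The names `finish_span`, `submit`, and `SpanWriter` are hypothetical stand-ins, not dd-trace-py code, and the sketch does not claim to reproduce the actual cause of the flakiness, which is still being investigated.

```python
from unittest import mock


def finish_span(writer, event, submit=True):
    # Hypothetical code under test: it only enqueues when `submit` is true.
    if submit:
        writer.enqueue(event)


def enqueue_call_count(submit):
    # Stand-in for a patched writer class; calling it yields a MagicMock instance.
    writer_cls = mock.MagicMock(name="SpanWriter")
    writer = writer_cls()
    finish_span(writer, {"name": "langchain.request"}, submit=submit)
    # A test asserting `call_count == 1` fails whenever this returns 0.
    return writer.enqueue.call_count
```

If the span event never reaches the writer (for example because it was submitted on a different writer instance, or not before the assertion ran), `call_count` stays at 0 and the test fails intermittently.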

While we investigate a more stable method of testing, it makes sense to disable these tests to avoid creating noise for our neighbours in the library :).
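As a rough illustration of what disabling a test with a recorded reason looks like, here is a self-contained sketch using the standard library's `unittest.skip`. The class name and test body are hypothetical, and the mechanism actually used in this PR may differ.

```python
import unittest


class TestLangChainCommunity(unittest.TestCase):
    # Hypothetical body; the real test lives in
    # tests/contrib/langchain/test_langchain_community.py.
    @unittest.skip("flaky: snapshot intermittently receives 0 traces")
    def test_lcel_chain_simple(self):
        self.fail("not reached while the test is skipped")


def run_suite():
    # Load and run the test case, collecting outcomes in a TestResult.
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestLangChainCommunity)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

In a pytest-based suite, `@pytest.mark.skip(reason="...")` plays the same role. Either way, the reason string keeps the skip discoverable so the test can be re-enabled once a stable approach is found.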

DOWN WITH FLAKY TESTS

github-actions[bot] commented 11 hours ago

CODEOWNERS have been resolved as:

tests/contrib/langchain/test_langchain_community.py    @DataDog/ml-observability
tests/contrib/langchain/test_langchain_llmobs.py       @DataDog/ml-observability

pr-commenter[bot] commented 11 hours ago

Benchmarks

Benchmark execution time: 2024-11-22 19:41:32

Comparing candidate commit 971dad7b34d58e3d2d55ee3fb17e5891693a4630 in PR branch kylev/flaky-tests-should-be-shot-into-the-sun with baseline commit d792c3dc3c7452ed64524ea38d0b9c9116330a73 in branch main.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 388 metrics, 2 unstable metrics.