buildkite / test-collector-python

Python adapter for Buildkite Test Analytics
https://buildkite.com/test-analytics
MIT License
6 stars 5 forks source link

Duplicated test results when running tests in parallel with xdist #23

Closed adimoldovan closed 2 months ago

adimoldovan commented 1 year ago

Using buildkite-test-collector 0.1.7

When running tests in parallel using pytest xdist each test is reported twice in Test Analytics.

See screenshots below. Same test suite, first run sequentially, second run in parallel.

_bk_seq _bk_parallel
proteusvacuum commented 4 months ago

I'm also facing this issue.

gchan commented 4 months ago

Hi @adimoldovan and @proteusvacuum !

I had a look at this issue on our end and it seems the tests are being executed twice. The UI displays what looks to be duplicate test results but are actually different executions with different timings. You can see the duration column in your screenshot have different values.

May I ask you to confirm what level of parallelism you are using when running your tests? If you increase the parallelism, do you see more more duplicates?

May I ask what test distribution algorithm you're using when running xdist? I'm not too familiar with how xdist works but would like to rule out xdist being a contributor to what we are observing in the web UI.

Is there any logs you can share which show which the progress of a test run (and which tests are executed)?

cc @jimsynz

adimoldovan commented 4 months ago

@gchan It's been a while since I've had the issue and moved to another project with another stack since then. But I'm very sure the tests were not being executed twice, we would have caught that. The workaround we used was to publish the results using curl and the junit format report instead of using test-collector-python, and that worked fine for the same parallel runs.

I'll try to find a code sample and/or some logs.

joshprice commented 3 months ago

Looks like we need more information on which versions you were running @adimoldovan @proteusvacuum to see this issue. Any chance you could share any more details?

We tried to reproduce this with a simple Python project (https://gist.github.com/joshprice/864ca419c6e8f47000757977f3ff6db2) and couldn't get any duplicates to appear in the payload. This was with the following versions:

platform darwin -- Python 3.12.3, pytest-8.2.2, pluggy-1.5.0
plugins: buildkite-test-collector-0.1.7, xdist-3.6.1

This was run with various xdist configs (n=2, 3, 4) and all return payloads with exactly 4 results in them as expected. There is some debug output I've added into the collector to verify.

❯ BUILDKITE_ANALYTICS_DEBUG_ENABLED=TRUE BUILDKITE_ANALYTICS_TOKEN="xyz" CI=TRUE pytest -v --capture=no -n 4
************ env RuntimeEnvironment(ci='generic', key='6acc5b5c-5ced-42aa-a190-4fe2f0045286', number=None, job_id=None, branch=None, commit_sha=None, message=None, url=None)
========================================== test session starts ==========================================
platform darwin -- Python 3.12.3, pytest-8.2.2, pluggy-1.5.0 -- /Users/josh/dev/buildkite/test-collector-python/bkenv/bin/python3.12
cachedir: .pytest_cache
rootdir: /Users/josh/dev/buildkite/parallel-test-demo
configfile: pytest.ini
plugins: buildkite-test-collector-0.1.7, xdist-3.6.1
4 workers [4 items]
scheduling tests via LoadScheduling

test_second.py::test_subtract2
test_main.py::test_subtract
test_main.py::test_add
test_second.py::test_add2
TestData(id=UUID('57e4fc97-6c23-4db6-9d8e-e94ad2cc7fa1'), scope='test_second.py', name='test_subtract2', history=TestHistory(start_at=Instant(seconds=358753.305038791), end_at=None, duration=None, children=()), location='test_second.py:5', result=None)

[gw3] PASSED test_second.py::test_subtract2
TestData(id=UUID('9f6ed529-dbc9-4fae-99a7-519beb4a9f70'), scope='test_main.py', name='test_add', history=TestHistory(start_at=Instant(seconds=358753.305849833), end_at=None, duration=None, children=()), location='test_main.py:2', result=None)

[gw0] PASSED test_main.py::test_add
TestData(id=UUID('5ec2f76f-bde4-4625-a3f3-8990e241dab6'), scope='test_main.py', name='test_subtract', history=TestHistory(start_at=Instant(seconds=358753.305118875), end_at=None, duration=None, children=()), location='test_main.py:5', result=None)

[gw1] PASSED test_main.py::test_subtract
TestData(id=UUID('b9f630a8-3d98-4163-a73d-d1ffaa114dfc'), scope='test_second.py', name='test_add2', history=TestHistory(start_at=Instant(seconds=358753.305923916), end_at=None, duration=None, children=()), location='test_second.py:2', result=None)

[gw2] PASSED test_second.py::test_add2

=========================================== 4 passed in 0.56s ===========================================
DEBUG MODE ON TRUE
{ 'data': ( TestData(id=UUID('57e4fc97-6c23-4db6-9d8e-e94ad2cc7fa1'),
                     scope='test_second.py',
                     name='test_subtract2',
                     history=TestHistory(start_at=Instant(seconds=358753.305038791),
                                         end_at=Instant(seconds=358753.305987083),
                                         duration=datetime.timedelta(microseconds=948),
                                         children=()),
                     location='test_second.py:5',
                     result=TestResultPassed()),
            TestData(id=UUID('9f6ed529-dbc9-4fae-99a7-519beb4a9f70'),
                     scope='test_main.py',
                     name='test_add',
                     history=TestHistory(start_at=Instant(seconds=358753.305849833),
                                         end_at=Instant(seconds=358753.306073291),
                                         duration=datetime.timedelta(microseconds=223),
                                         children=()),
                     location='test_main.py:2',
                     result=TestResultPassed()),
            TestData(id=UUID('5ec2f76f-bde4-4625-a3f3-8990e241dab6'),
                     scope='test_main.py',
                     name='test_subtract',
                     history=TestHistory(start_at=Instant(seconds=358753.305118875),
                                         end_at=Instant(seconds=358753.306136958),
                                         duration=datetime.timedelta(microseconds=1018),
                                         children=()),
                     location='test_main.py:5',
                     result=TestResultPassed()),
            TestData(id=UUID('b9f630a8-3d98-4163-a73d-d1ffaa114dfc'),
                     scope='test_second.py',
                     name='test_add2',
                     history=TestHistory(start_at=Instant(seconds=358753.305923916),
                                         end_at=Instant(seconds=358753.306306083),
                                         duration=datetime.timedelta(microseconds=382),
                                         children=()),
                     location='test_second.py:2',
                     result=TestResultPassed())),
  'finished_at': None,
  'run_env': RuntimeEnvironment(ci='generic',
                                key='6acc5b5c-5ced-42aa-a190-4fe2f0045286',
                                number=None,
                                job_id=None,
                                branch=None,
                                commit_sha=None,
                                message=None,
                                url=None),
  'started_at': Instant(seconds=358753.3049935)}
adimoldovan commented 3 months ago

Hi @joshprice, thanks for checking this. I understand that BUILDKITE_ANALYTICS_DEBUG_ENABLED is not available in the released version?

With this project, every run with workers >0 results in duplicated test results in Test Suite. I could not find a way to check what the collector is actually sending to confirm it's a collector issue and not something on the Test Suites side.

Here's my output:

1 worker, build #31

Run python3 -m pipenv run pytest -n=1 -v --capture=no
  python3 -m pipenv run pytest -n=1 -v --capture=no
  shell: /usr/bin/bash -e {0}
  env:
    BUILDKITE_ANALYTICS_TOKEN: ***
    BUILDKITE_ANALYTICS_DEBUG_ENABLED: TRUE
    BUILDKITE_ANALYTICS_MESSAGE: Test run with 1 workers
    pythonLocation: /opt/hostedtoolcache/Python/3.10.14/x64
    LD_LIBRARY_PATH: /opt/hostedtoolcache/Python/3.10.14/x64/lib
    WORKERS: 1
============================= test session starts ==============================
platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.2.0 -- /home/runner/.local/share/virtualenvs/pytest-bk-test-analytics-IeC27NHc/bin/python
cachedir: .pytest_cache
rootdir: /home/runner/work/pytest-bk-test-analytics/pytest-bk-test-analytics
configfile: pytest.ini
plugins: xdist-3.3.1, buildkite-test-collector-0.1.7
created: 1/1 worker
1 worker [2 items]

scheduling tests via LoadScheduling

tests/test_tests.py::TestClassOne::test_one 
[gw0] PASSED tests/test_tests.py::TestClassOne::test_one 
tests/test_tests.py::TestClassOne::test_two 
[gw0] PASSED tests/test_tests.py::TestClassOne::test_two 

- generated xml file: /home/runner/work/pytest-bk-test-analytics/pytest-bk-test-analytics/junit-results.xml -
============================== 2 passed in 0.78s ===============================
build-31

2 workers, build #32

Run python3 -m pipenv run pytest -n=2 -v --capture=no
  python3 -m pipenv run pytest -n=2 -v --capture=no
  shell: /usr/bin/bash -e {0}
  env:
    BUILDKITE_ANALYTICS_TOKEN: ***
    BUILDKITE_ANALYTICS_DEBUG_ENABLED: TRUE
    BUILDKITE_ANALYTICS_MESSAGE: Test run with 2 workers
    pythonLocation: /opt/hostedtoolcache/Python/3.10.14/x64
    LD_LIBRARY_PATH: /opt/hostedtoolcache/Python/3.10.14/x64/lib
    WORKERS: 2
============================= test session starts ==============================
platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.2.0 -- /home/runner/.local/share/virtualenvs/pytest-bk-test-analytics-IeC27NHc/bin/python
cachedir: .pytest_cache
rootdir: /home/runner/work/pytest-bk-test-analytics/pytest-bk-test-analytics
configfile: pytest.ini
plugins: xdist-3.3.1, buildkite-test-collector-0.1.7
created: 2/2 workers
2 workers [2 items]

scheduling tests via LoadScheduling

tests/test_tests.py::TestClassOne::test_one 
tests/test_tests.py::TestClassOne::test_two 
[gw0] PASSED tests/test_tests.py::TestClassOne::test_one 
[gw1] PASSED tests/test_tests.py::TestClassOne::test_two 

- generated xml file: /home/runner/work/pytest-bk-test-analytics/pytest-bk-test-analytics/junit-results.xml -
============================== 2 passed in 1.00s ===============================
build-32

0 workers, build #33

Run python3 -m pipenv run pytest -n=0 -v --capture=no
  python3 -m pipenv run pytest -n=0 -v --capture=no
  shell: /usr/bin/bash -e {0}
  env:
    BUILDKITE_ANALYTICS_TOKEN: ***
    BUILDKITE_ANALYTICS_DEBUG_ENABLED: TRUE
    BUILDKITE_ANALYTICS_MESSAGE: Test run with 0 workers
    pythonLocation: /opt/hostedtoolcache/Python/3.10.14/x64
    LD_LIBRARY_PATH: /opt/hostedtoolcache/Python/3.10.14/x64/lib
    WORKERS: 0

============================= test session starts ==============================
platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.2.0 -- /home/runner/.local/share/virtualenvs/pytest-bk-test-analytics-IeC27NHc/bin/python
cachedir: .pytest_cache
rootdir: /home/runner/work/pytest-bk-test-analytics/pytest-bk-test-analytics
configfile: pytest.ini
plugins: xdist-3.3.1, buildkite-test-collector-0.1.7
collecting ... collected 2 items

tests/test_tests.py::TestClassOne::test_one PASSED
tests/test_tests.py::TestClassOne::test_two PASSED

- generated xml file: /home/runner/work/pytest-bk-test-analytics/pytest-bk-test-analytics/junit-results.xml -
============================== 2 passed in 0.02s ===============================
build-33
joshprice commented 3 months ago

Thanks for the detailed report and test repo @adimoldovan!

I initially thought this may be a version issue, but on updating all your older versions to the latest I still saw the duplicates.

The duplicates weren't appearing in my tests because xdist takes over stdout (see https://github.com/pytest-dev/pytest-xdist/issues/354) which was a bit of a surprise.

Also each xdist worker runs the entire pytest lifecycle as well as the main thread, which explains why we were seeing n+1 run submissions on BKTA when run locally. Something slightly different happened when running in github actions - only one TA run but duplicate test results.

I've made a draft PR (#29) and you can reference this branch in your test repo to test it out:

buildkite-test-collector = { git = 'https://github.com/buildkite/test-collector-python', ref = 'xdist-duplicates' }
adimoldovan commented 3 months ago

Thank you @joshprice. I checked the PR and I can confirm it fixed the issue for me. 🙇🏼

jimsynz commented 2 months ago

Closed by #29