coiled / benchmarks

TPC-H results are incorrect #1344

Closed hendrikmakait closed 7 months ago

hendrikmakait commented 7 months ago

It looks like we've hit a severe regression in dask-expr: a number of queries return wrong results or do not even finish anymore.

First occurrence: #1341

Note: There's a genuine issue with the benchmarks in there as well:

asyncio.exceptions.TimeoutError: Waited for 4 worker(s) to reconnect after restarting, but after 120s, only 0 have returned. Consider a longer timeout, or `wait_for_workers=False`.
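
For reference, the two options named in that error message could be combined roughly as follows. This is only a sketch, not what the benchmark fixtures actually do: `restart_with_fallback` is a hypothetical helper, `client` is assumed to be a `distributed.Client` (only its `restart(timeout=..., wait_for_workers=...)` method is used), and the 300 second default is an arbitrary value chosen to be longer than the 120s in the message.

```python
import asyncio


def restart_with_fallback(client, timeout=300):
    """Restart workers; if they do not reconnect in time, retry without waiting.

    `client` is assumed to be a `distributed.Client`. Returns True if the
    blocking restart succeeded, False if the non-blocking fallback was used.
    """
    try:
        # First attempt: wait for workers, with a timeout (in seconds)
        # longer than the 120s reported in the error message.
        client.restart(timeout=timeout)
        return True
    except (TimeoutError, asyncio.TimeoutError):
        # Workers did not come back in time: restart again without
        # waiting for them, as the error message suggests.
        client.restart(timeout=timeout, wait_for_workers=False)
        return False
```

Note that the fallback only hides the symptom: if no worker returns at all, as in the message above, later queries would still have nothing to run on.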

cc @phofl

Tracebacks

FAILED tests/tpch/test_correctness.py::test_dask_results[12] - AssertionError: DataFrame are different

DataFrame shape mismatch
[left]:  (7, 3)
[right]: (2, 3)
FAILED tests/tpch/test_optimization.py::test_optimization[18] - KeyError: False
FAILED tests/tpch/test_dask.py::test_query_18 - KeyError: False
FAILED tests/tpch/test_correctness.py::test_dask_results[13] - Failed: Timeout >3600.0s
FAILED tests/tpch/test_correctness.py::test_dask_results[16] - AssertionError: DataFrame are different

DataFrame shape mismatch
[left]:  (114161, 4)
[right]: (18314, 4)
FAILED tests/tpch/test_correctness.py::test_dask_results[17] - AssertionError: DataFrame.iloc[:, 0] (column name="avg_yearly") are different

DataFrame.iloc[:, 0] (column name="avg_yearly") values are different (100.0 %)
[index]: [0]
[left]:  [33330456.87]
[right]: [348406.0542857143]
At positional index 0, first diff: 33330456.87 != 348406.0542857143
FAILED tests/tpch/test_correctness.py::test_dask_results[18] - KeyError: False
FAILED tests/tpch/test_correctness.py::test_dask_results[21] - AssertionError: DataFrame are different

DataFrame shape mismatch
[left]:  (0, 2)
[right]: (100, 2)
FAILED tests/tpch/test_correctness.py::test_dask_results[22] - AssertionError: DataFrame.iloc[:, 1] (column name="numcust") are different

DataFrame.iloc[:, 1] (column name="numcust") values are different (100.0 %)
[index]: [0, 1, 2, 3, 4, 5, 6]
[left]:  [2680, 2642, 2779, 2697, 2835, 2653, 2739]
[right]: [888, 861, 964, 892, 948, 909, 922]
At positional index 0, first diff: 2680 != 888
ERROR tests/tpch/test_correctness.py::test_dask_results[6] - asyncio.exceptions.TimeoutError: Waited for 4 worker(s) to reconnect after restarting, but after 120s, only 0 have returned. Consider a longer timeout, or `wait_for_workers=False`.

https://github.com/coiled/benchmarks/actions/runs/7763077035/job/21174629659#step:12:1567

#1342:

FAILED tests/tpch/test_correctness.py::test_dask_results[12] - AssertionError: DataFrame are different

DataFrame shape mismatch
[left]:  (7, 3)
[right]: (2, 3)
FAILED tests/tpch/test_optimization.py::test_optimization[18] - KeyError: False
FAILED tests/tpch/test_correctness.py::test_dask_results[16] - AssertionError: DataFrame are different

DataFrame shape mismatch
[left]:  (114161, 4)
[right]: (18314, 4)
FAILED tests/tpch/test_correctness.py::test_dask_results[17] - AssertionError: DataFrame.iloc[:, 0] (column name="avg_yearly") are different

DataFrame.iloc[:, 0] (column name="avg_yearly") values are different (100.0 %)
[index]: [0]
[left]:  [33330456.87]
[right]: [348406.0542857143]
At positional index 0, first diff: 33330456.87 != 348406.0542857143
FAILED tests/tpch/test_correctness.py::test_dask_results[18] - KeyError: False
FAILED tests/tpch/test_dask.py::test_query_18 - KeyError: False
FAILED tests/tpch/test_correctness.py::test_dask_results[21] - AssertionError: DataFrame are different

DataFrame shape mismatch
[left]:  (0, 2)
[right]: (100, 2)
FAILED tests/tpch/test_correctness.py::test_dask_results[22] - AssertionError: DataFrame.iloc[:, 1] (column name="numcust") are different

DataFrame.iloc[:, 1] (column name="numcust") values are different (100.0 %)
[index]: [0, 1, 2, 3, 4, 5, 6]
[left]:  [2680, 2642, 2779, 2697, 2835, 2653, 2739]
[right]: [888, 861, 964, 892, 948, 909, 922]
At positional index 0, first diff: 2680 != 888
ERROR tests/tpch/test_correctness.py::test_dask_results[3] - asyncio.exceptions.TimeoutError: Waited for 4 worker(s) to reconnect after restarting, but after 120s, only 0 have returned. Consider a longer timeout, or `wait_for_workers=False`.
ERROR tests/tpch/test_correctness.py::test_dask_results[11] - asyncio.exceptions.TimeoutError: Waited for 4 worker(s) to reconnect after restarting, but after 120s, only 0 have returned. Consider a longer timeout, or `wait_for_workers=False`.

https://github.com/coiled/benchmarks/actions/runs/7770352324/job/21190438203#step:12:1536

#1343:

FAILED tests/tpch/test_correctness.py::test_dask_results[12] - AssertionError: DataFrame are different

DataFrame shape mismatch
[left]:  (7, 3)
[right]: (2, 3)
FAILED tests/tpch/test_optimization.py::test_optimization[18] - KeyError: False
FAILED tests/tpch/test_dask.py::test_query_18 - KeyError: False
FAILED tests/tpch/test_correctness.py::test_dask_results[13] - Failed: Timeout >3600.0s
FAILED tests/tpch/test_correctness.py::test_dask_results[16] - AssertionError: DataFrame are different

DataFrame shape mismatch
[left]:  (114161, 4)
[right]: (18314, 4)
FAILED tests/tpch/test_correctness.py::test_dask_results[17] - AssertionError: DataFrame.iloc[:, 0] (column name="avg_yearly") are different

DataFrame.iloc[:, 0] (column name="avg_yearly") values are different (100.0 %)
[index]: [0]
[left]:  [33330456.87]
[right]: [348406.0542857143]
At positional index 0, first diff: 33330456.87 != 348406.0542857143
FAILED tests/tpch/test_correctness.py::test_dask_results[18] - KeyError: False
FAILED tests/tpch/test_correctness.py::test_dask_results[21] - AssertionError: DataFrame are different

DataFrame shape mismatch
[left]:  (0, 2)
[right]: (100, 2)
FAILED tests/tpch/test_correctness.py::test_dask_results[22] - AssertionError: DataFrame.iloc[:, 1] (column name="numcust") are different

DataFrame.iloc[:, 1] (column name="numcust") values are different (100.0 %)
[index]: [0, 1, 2, 3, 4, 5, 6]
[left]:  [2680, 2642, 2779, 2697, 2835, 2653, 2739]
[right]: [888, 861, 964, 892, 948, 909, 922]
At positional index 0, first diff: 2680 != 888
ERROR tests/tpch/test_correctness.py::test_dask_results[6] - asyncio.exceptions.TimeoutError: Waited for 4 worker(s) to reconnect after restarting, but after 120s, only 0 have returned. Consider a longer timeout, or `wait_for_workers=False`.

https://github.com/coiled/benchmarks/actions/runs/7763077035/job/21174629659#step:12:1532
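
To compare runs like the ones above more easily, the summary lines can be grouped by exception type. The following is a rough sketch that assumes the pytest short-summary format shown in the tracebacks (`FAILED|ERROR <test id> - <exception>: ...`); `SUMMARY_RE` and `failures_by_exception` are hypothetical names, and the sample lines are copied from the first traceback.

```python
import re
from collections import defaultdict

# Matches pytest short-summary lines such as
# "FAILED tests/tpch/test_dask.py::test_query_18 - KeyError: False".
SUMMARY_RE = re.compile(
    r"^(?P<status>FAILED|ERROR) (?P<test>\S+?)"
    r"(?:\[(?P<param>[^\]]+)\])? - (?P<exc>[\w.]+)"
)


def failures_by_exception(log_text):
    """Group failing test names by the exception named in the summary line."""
    groups = defaultdict(list)
    for line in log_text.splitlines():
        m = SUMMARY_RE.match(line.strip())
        if not m:
            continue  # skip diff details, URLs and blank lines
        name = m["test"].split("::")[-1]
        label = f"{name}[{m['param']}]" if m["param"] else name
        groups[m["exc"]].append(label)
    return dict(groups)


sample = """\
FAILED tests/tpch/test_correctness.py::test_dask_results[12] - AssertionError: DataFrame are different
FAILED tests/tpch/test_optimization.py::test_optimization[18] - KeyError: False
FAILED tests/tpch/test_dask.py::test_query_18 - KeyError: False
FAILED tests/tpch/test_correctness.py::test_dask_results[13] - Failed: Timeout >3600.0s
"""

for exc, tests in failures_by_exception(sample).items():
    print(exc, tests)
```

Running the same function over each run's log and diffing the resulting dictionaries would show which failures are common to all runs and which only appear in some of them.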

hendrikmakait commented 7 months ago

XREF: https://github.com/coiled/benchmarks/issues/1366

phofl commented 7 months ago

Query 21 should be addressed now, so closing for now