coiled / benchmarks


[TPC-H] Switching to `scan_parquet` from `scan_pyarrow_dataset` caused Polars test failures #1396

Status: open. hendrikmakait opened this issue 7 months ago.

hendrikmakait commented 7 months ago

In #1394, we switched from `pl.scan_pyarrow_dataset` to `pl.scan_parquet` to enable streaming. This has caused some of the TPC-H queries to fail:


FAILED tests/tpch/test_polars.py::test_query_3 - polars.exceptions.ComputeError: cannot sort column of dtype `binary[offset]`
FAILED tests/tpch/test_polars.py::test_query_7 - pyo3_runtime.PanicException: called `Option::unwrap()` on a `None` value
FAILED tests/tpch/test_polars.py::test_query_8 - polars.exceptions.InvalidOperationError: `year` operation not supported for dtype `i64`
FAILED tests/tpch/test_polars.py::test_query_9 - polars.exceptions.InvalidOperationError: `year` operation not supported for dtype `i64`
FAILED tests/tpch/test_polars.py::test_query_18 - polars.exceptions.ComputeError: cannot sort column of dtype `binary[offset]`

hendrikmakait commented 7 months ago

Note that all of these queries work if I disable streaming.

ritchie46 commented 7 months ago

Some of these bugs are due to our string refactor. Which polars version was this?

hendrikmakait commented 7 months ago

This is 0.20.8.

ritchie46 commented 7 months ago

Alright. I will take a look.

hendrikmakait commented 7 months ago

Thanks! Let me know if I can help with anything, e.g., by filing more detailed issues.

ritchie46 commented 7 months ago

Thanks, I will fix this. I am also on vacation, so give me a few days. :)

mrocklin commented 7 months ago

> Thanks, I will fix this. I am also on vacation, so give me a few days. :)

Your idea of a vacation is different from most people I know 🙂

ritchie46 commented 7 months ago

> Your idea of a vacation is different from most people I know 🙂

Yes, took a while before my GF accepted that. 😹