Open mrocklin opened 10 months ago
Looking at query 3, this should run in streaming mode. Do you get an error?
In non-streaming mode it should definitely run, but it requires the data to fit in memory. It of course depends on the machine.
Looking at query 3, this should run in streaming mode. Do you get an error?
See below
def test_query_3(run, restart, dataset_path):
    def _():
        var_1 = var_2 = datetime(1995, 3, 15)
        var_3 = "BUILDING"
        customer_ds = read_data(dataset_path + "customer")
        line_item_ds = read_data(dataset_path + "lineitem")
        orders_ds = read_data(dataset_path + "orders")
        (
            customer_ds.filter(pl.col("c_mktsegment") == var_3)
            .join(orders_ds, left_on="c_custkey", right_on="o_custkey")
            .join(line_item_ds, left_on="o_orderkey", right_on="l_orderkey")
            .filter(pl.col("o_orderdate") < var_2)
            .filter(pl.col("l_shipdate") > var_1)
            .with_columns(
                (pl.col("l_extendedprice") * (1 - pl.col("l_discount"))).alias("revenue")
            )
            .group_by(["o_orderkey", "o_orderdate", "o_shippriority"])
            .agg([pl.sum("revenue")])
            .select(
                [
                    pl.col("o_orderkey").alias("l_orderkey"),
                    "revenue",
                    "o_orderdate",
                    "o_shippriority",
                ]
            )
            .sort(by=["revenue", "o_orderdate"], descending=[True, False])
            .limit(10)
        ).collect(streaming=True)
> run(_)
tests/tpch/test_polars.py:149:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tests/tpch/conftest.py:342: in _run
return function()
tests/tpch/test_polars.py:147: in _
).collect(streaming=True)
../../mambaforge/envs/test-env/lib/python3.11/site-packages/polars/utils/deprecation.py:100: in wrapper
return function(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <LazyFrame [4 cols, {"l_orderkey": Int32 … "o_shippriority": Int32}] at 0x126913710>
    @deprecate_renamed_parameter(
        "common_subplan_elimination", "comm_subplan_elim", version="0.18.9"
    )
    def collect(
        self,
        *,
        type_coercion: bool = True,
        predicate_pushdown: bool = True,
        projection_pushdown: bool = True,
        simplify_expression: bool = True,
        slice_pushdown: bool = True,
        comm_subplan_elim: bool = True,
        comm_subexpr_elim: bool = True,
        no_optimization: bool = False,
        streaming: bool = False,
        _eager: bool = False,
    ) -> DataFrame:
        """
        Materialize this LazyFrame into a DataFrame.

        By default, all query optimizations are enabled. Individual optimizations may
        be disabled by setting the corresponding parameter to ``False``.

        Parameters
        ----------
        type_coercion
            Do type coercion optimization.
        predicate_pushdown
            Do predicate pushdown optimization.
        projection_pushdown
            Do projection pushdown optimization.
        simplify_expression
            Run simplify expressions optimization.
        slice_pushdown
            Slice pushdown optimization.
        comm_subplan_elim
            Will try to cache branching subplans that occur on self-joins or unions.
        comm_subexpr_elim
            Common subexpressions will be cached and reused.
        no_optimization
            Turn off (certain) optimizations.
        streaming
            Process the query in batches to handle larger-than-memory data.
            If set to ``False`` (default), the entire query is processed in a single
            batch.

            .. warning::
                This functionality is currently in an alpha state.

            .. note::
                Use :func:`explain` to see if Polars can process the query in streaming
                mode.

        Returns
        -------
        DataFrame

        See Also
        --------
        fetch : Run the query on the first `n` rows only for debugging purposes.
        explain : Print the query plan that is evaluated with collect.
        profile : Collect the LazyFrame and time each node in the computation graph.
        polars.collect_all : Collect multiple LazyFrames at the same time.
        polars.Config.set_streaming_chunk_size : Set the size of streaming batches.

        Examples
        --------
        >>> lf = pl.LazyFrame(
        ...     {
        ...         "a": ["a", "b", "a", "b", "b", "c"],
        ...         "b": [1, 2, 3, 4, 5, 6],
        ...         "c": [6, 5, 4, 3, 2, 1],
        ...     }
        ... )
        >>> lf.group_by("a").agg(pl.all().sum()).collect()  # doctest: +SKIP
        shape: (3, 3)
        ┌─────┬─────┬─────┐
        │ a   ┆ b   ┆ c   │
        │ --- ┆ --- ┆ --- │
        │ str ┆ i64 ┆ i64 │
        ╞═════╪═════╪═════╡
        │ a   ┆ 4   ┆ 10  │
        │ b   ┆ 11  ┆ 10  │
        │ c   ┆ 6   ┆ 1   │
        └─────┴─────┴─────┘

        Collect in streaming mode

        >>> lf.group_by("a").agg(pl.all().sum()).collect(
        ...     streaming=True
        ... )  # doctest: +SKIP
        shape: (3, 3)
        ┌─────┬─────┬─────┐
        │ a   ┆ b   ┆ c   │
        │ --- ┆ --- ┆ --- │
        │ str ┆ i64 ┆ i64 │
        ╞═════╪═════╪═════╡
        │ a   ┆ 4   ┆ 10  │
        │ b   ┆ 11  ┆ 10  │
        │ c   ┆ 6   ┆ 1   │
        └─────┴─────┴─────┘
        """
        if no_optimization or _eager:
            predicate_pushdown = False
            projection_pushdown = False
            slice_pushdown = False
            comm_subplan_elim = False
            comm_subexpr_elim = False

        if streaming:
            comm_subplan_elim = False

        ldf = self._ldf.optimization_toggle(
            type_coercion,
            predicate_pushdown,
            projection_pushdown,
            simplify_expression,
            slice_pushdown,
            comm_subplan_elim,
            comm_subexpr_elim,
            streaming,
            _eager,
        )
> return wrap_df(ldf.collect())
E pyo3_runtime.PanicException: not yet supported
In non-streaming mode it should definitely run, but it requires the data to fit in memory.
Confirmed that this does run in non-streaming mode.
Hmm, interesting. I think this is a bug, hit by a dtype we don't support in streaming. (We should fall back to the non-streaming engine if that happens.) What backtrace do you get if you set RUST_BACKTRACE=1?
> return wrap_df(ldf.collect())
E pyo3_runtime.PanicException: not yet supported
../../mambaforge/envs/test-env/lib/python3.11/site-packages/polars/lazyframe/frame.py:1787: PanicException
-------------------------------------------------- Captured stderr call --------------------------------------------------
thread '<unnamed>' panicked at /Users/runner/work/polars/polars/crates/polars-row/src/decode.rs:44:5:
not yet supported
stack backtrace:
0: _rust_begin_unwind
1: core::panicking::panic_fmt
2: polars_row::decode::decode
3: polars_row::decode::decode_rows
4: polars_pipe::executors::sinks::sort::sink_multiple::finalize_dataframe
5: <polars_pipe::executors::sinks::sort::sink_multiple::SortSinkMultiple as polars_pipe::operators::sink::Sink>::finalize
6: polars_pipe::pipeline::dispatcher::PipeLine::run_pipeline_no_finalize
7: polars_pipe::pipeline::dispatcher::PipeLine::run_pipeline
8: <F as polars_plan::logical_plan::apply::DataFrameUdfMut>::call_udf
9: polars_plan::logical_plan::functions::FunctionNode::evaluate
10: <polars_lazy::physical_plan::executors::udf::UdfExec as polars_lazy::physical_plan::executors::executor::Executor>::execute
11: polars_lazy::frame::LazyFrame::collect
12: polars::lazyframe::_::<impl polars::lazyframe::PyLazyFrame>::__pymethod_collect__
13: pyo3::impl_::trampoline::trampoline
14: _method_vectorcall_NOARGS
15: _PyObject_Vectorcall
16: __PyEval_EvalFrameDefault
17: __PyEval_Vector
18: __PyVectorcall_Call
19: __PyEval_EvalFrameDefault
20: __PyEval_Vector
21: __PyVectorcall_Call
22: __PyEval_EvalFrameDefault
23: __PyEval_Vector
24: __PyEval_EvalFrameDefault
25: __PyEval_Vector
26: __PyObject_FastCallDictTstate
27: __PyObject_Call_Prepend
28: _slot_tp_call
29: __PyObject_MakeTpCall
30: __PyEval_EvalFrameDefault
31: __PyEval_Vector
32: __PyEval_EvalFrameDefault
33: __PyEval_Vector
34: __PyObject_FastCallDictTstate
35: __PyObject_Call_Prepend
36: _slot_tp_call
37: __PyObject_Call
38: __PyEval_EvalFrameDefault
39: __PyEval_Vector
40: __PyEval_EvalFrameDefault
41: __PyEval_Vector
42: __PyEval_EvalFrameDefault
43: __PyEval_Vector
44: __PyObject_FastCallDictTstate
45: __PyObject_Call_Prepend
46: _slot_tp_call
47: __PyObject_MakeTpCall
48: __PyEval_EvalFrameDefault
49: __PyEval_Vector
50: __PyEval_EvalFrameDefault
51: __PyEval_Vector
52: __PyObject_FastCallDictTstate
53: __PyObject_Call_Prepend
54: _slot_tp_call
55: __PyObject_MakeTpCall
56: __PyEval_EvalFrameDefault
57: __PyEval_Vector
58: __PyEval_EvalFrameDefault
59: __PyEval_Vector
60: __PyObject_FastCallDictTstate
61: __PyObject_Call_Prepend
62: _slot_tp_call
63: __PyObject_MakeTpCall
64: __PyEval_EvalFrameDefault
65: _PyEval_EvalCode
66: _run_mod
67: __PyRun_SimpleFileObject
68: __PyRun_AnyFileObject
69: _Py_RunMain
70: _pymain_main
71: _main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
Ok, this is definitely a bug. Could you send me the head of the source file in parquet? That should be the proper schema that creates this bug.
We sort on a column we don't expect in the multi-column sort.
Just got onto a plane. I'll be out of contact for five hours or so. I'll update here later tonight hopefully.
I'm not sure exactly what you mean by the head of the source file, but I've done the most obvious thing I can do: literally asked for the first bit with head and copied it below. If it's useful I can also point you to our generation scripts.
I was poking at this myself for a bit, and I've found the following.
If using polars.scan_pyarrow_dataset to read the dataframes instead, it works fine; scan_parquet is what gives the error.
I can read each individual folder with pl.scan_parquet('./tpch-data/scale-1/{customer, lineitem, orders}').collect(streaming=True) without error, so it seems it's happening somewhere in the query.
Going further, the error happens specifically when trying to sort on `o_orderdate`: https://github.com/coiled/benchmarks/blob/00d9104dd5ad5e785357034aeb76ebe704a91216/tests/tpch/test_polars.py#L146
Note this happens on our --relaxed-schema dataset (which converts Decimal -> double and Date -> timestamp_s). Using the strict dataset schema, without those conversions, gives a different error (which I suspect is related to the Decimal field):
polars.exceptions.ComputeError: not implemented: reading parquet type Int64 to Float64 still not implemented
And then again, if using pl.scan_pyarrow_dataset, it'll work for the strict schema as well.
Hope this is helpful.
Here's a file from the orders dataset using the relaxed schema which causes the original error in the query: orders_1de67c09-e8a4-4278-8a94-fac06d18a5ee.zip
and a query using just that file that reproduces the error:
pl.scan_parquet('orders_1de67c09-e8a4-4278-8a94-fac06d18a5ee.parquet').sort(by=['o_totalprice','o_orderdate'], descending=[True, False]).collect(streaming=True)
(note: sorting on only one of those columns does not produce the error)
@ritchie46 can you confirm this? I've just checked that we're using exactly the same query as in https://github.com/pola-rs/tpch/blob/main/polars_queries/q3.py
If so, what would you like us to do here? Use non-streaming mode? Skip the query? Write it in some other way?