dask-contrib / dask-sql

Distributed SQL Engine in Python using Dask
https://dask-sql.readthedocs.io/
MIT License
376 stars 71 forks source link

Pre-compute window function operands to simplify Dask graph #1331

Open charlesbluca opened 1 month ago

charlesbluca commented 1 month ago

Though the hope is that https://github.com/dask/dask-expr/pull/1059 should unblock (at least some of) the hanging tests, observation of the graphs getting produced from our window code shows that we should be able to simplify things pretty significantly by extracting all the operand columns at once from the base dataframe (which in practice should not be getting modified in any meaningful way by the following groupby-apply operations).

Haven't un-skipped any of the tests because things are still hanging, though now this seems to be getting caused by fix_dtype_to_row_type - will explore this function to see if there's any patterns we could simplify there.