dask / dask-expr

BSD 3-Clause "New" or "Revised" License
86 stars 27 forks source link

Dask Expressions

Dask DataFrames with query optimization.

This is a rewrite of Dask DataFrame that includes query optimization and generally improved organization.

More in our blog posts:

Example

import dask_expr as dx

df = dx.datasets.timeseries()
df.head()

df.groupby("name").x.mean().compute()

Query Representation

Dask-expr encodes user code in an expression tree:

>>> df.x.mean().pprint()

Mean:
  Projection: columns='x'
    Timeseries: seed=1896674884

This expression tree will be optimized and modified before execution:

>>> df.x.mean().optimize().pprint()

Div:
  Sum:
    Fused(375f9):
    | Projection: columns='x'
    |   Timeseries: dtypes={'x': <class 'float'>} seed=1896674884
  Count:
    Fused(375f9):
    | Projection: columns='x'
    |   Timeseries: dtypes={'x': <class 'float'>} seed=1896674884

Stability

This is the default backend for dask.DataFrame since version 2024.3.0.

API Coverage

Dask-Expr covers almost everything of the Dask DataFrame API. The only missing features are: