sidneymau opened 1 month ago
Note that the implementation proposed in the above PR ends up being fairly inefficient because it can't fully leverage Acero nodes for, e.g., projections and filtering. If there is interest, this functionality could be included (basically, a dataframe-like interface for constructing an Acero plan, as DataFusion provides), but that is somewhat larger in scope.
Describe the enhancement requested
Presently, `Dataset` has methods that perform several operations with Acero: `sort_by`, `join`, and `join_asof`. It would be especially helpful to provide a method that performs aggregations on datasets using Acero, for convenient out-of-core processing.

The implementation can be modeled on the existing `Dataset` Acero operations as well as the `aggregate` method of `TableGroupBy`.
Component(s)
Python