WillAyd / pandas_rust_algos

Implementation of some Cythonized pandas routines in Rust

PERF: Fuse multiple aggregations #20

Open jbrockmendel opened 1 year ago

jbrockmendel commented 1 year ago

Groupby reductions are often called several at a time via .agg, e.g. gb.agg(['mean', 'var', 'max']). ATM we compute these separately, which entails iterating over the data 3 times. Fusing them into a single operation would avoid the extra passes, and I suspect those passes add up.
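A minimal sketch of what a fused version of the mean/var/max example could look like in Rust. The function name, the `FusedAggs` struct, and the integer-label input layout are hypothetical, not taken from this repo, and the variance uses Welford's update with ddof = 1 as an assumption about the desired semantics. The point is only that all three accumulators can be updated inside one loop over the data:

```rust
/// Per-group results of a fused mean/var/max pass (hypothetical layout).
#[derive(Debug, Clone, PartialEq)]
pub struct FusedAggs {
    pub mean: Vec<f64>,
    pub var: Vec<f64>,
    pub max: Vec<f64>,
}

/// Compute mean, sample variance (ddof = 1) and max per group in a single pass.
/// `labels[i]` is the group index of `values[i]`; negative labels and NaN
/// values are skipped. Labels >= `ngroups` panic with an out-of-bounds index.
pub fn fused_mean_var_max(values: &[f64], labels: &[i64], ngroups: usize) -> FusedAggs {
    assert_eq!(values.len(), labels.len());

    let mut count = vec![0u64; ngroups];
    let mut mean = vec![0.0f64; ngroups];
    let mut m2 = vec![0.0f64; ngroups]; // running sum of squared deviations
    let mut max = vec![f64::NAN; ngroups];

    // The only loop over the data: every accumulator is updated per element.
    for (&val, &lab) in values.iter().zip(labels.iter()) {
        if lab < 0 || val.is_nan() {
            continue;
        }
        let g = lab as usize;

        // Welford's online update for mean and variance.
        count[g] += 1;
        let delta = val - mean[g];
        mean[g] += delta / count[g] as f64;
        m2[g] += delta * (val - mean[g]);

        if max[g].is_nan() || val > max[g] {
            max[g] = val;
        }
    }

    // Finalize: empty groups get NaN; variance needs at least two observations.
    let mut var = vec![f64::NAN; ngroups];
    for g in 0..ngroups {
        if count[g] == 0 {
            mean[g] = f64::NAN;
        } else if count[g] > 1 {
            var[g] = m2[g] / (count[g] - 1) as f64;
        }
    }

    FusedAggs { mean, var, max }
}
```

The finalization loop runs over groups, not rows, so the data itself is still read only once.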

WillAyd commented 1 year ago

Oh nice idea. Is the limitation with this in Cython that you'd have to return a list of results, and that doesn't play nicely with cdef?

In theory, I think we could use the same zip pattern here that we have for group_sum in quite a few places:

https://github.com/WillAyd/pandas_rust_algos/blob/b57460ce49b731c978745a966b331669610baa02/src/groupby.rs#L871

Then return a Vec of results, one for each aggregation passed in.
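Roughly what I have in mind, as a sketch only. This assumes "the zip pattern" means zipping values with their group labels; the `Agg` enum and `group_agg_fused` are hypothetical names that do not exist in the repo, and only sum/min/max are covered to keep it short:

```rust
/// Aggregations this sketch knows how to fuse (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Agg {
    Sum,
    Min,
    Max,
}

/// Run every requested aggregation in one pass over `values`.
/// Returns one Vec per entry in `aggs`, each of length `ngroups`.
/// Negative labels are skipped; empty groups keep the identity value
/// (0.0 for Sum, +inf for Min, -inf for Max).
pub fn group_agg_fused(
    values: &[f64],
    labels: &[i64],
    ngroups: usize,
    aggs: &[Agg],
) -> Vec<Vec<f64>> {
    assert_eq!(values.len(), labels.len());

    // One output buffer per requested aggregation, initialised to its identity.
    let mut out: Vec<Vec<f64>> = aggs
        .iter()
        .map(|agg| {
            let init = match agg {
                Agg::Sum => 0.0,
                Agg::Min => f64::INFINITY,
                Agg::Max => f64::NEG_INFINITY,
            };
            vec![init; ngroups]
        })
        .collect();

    // Zip values with labels, then zip aggregations with their buffers.
    for (&val, &lab) in values.iter().zip(labels.iter()) {
        if lab < 0 {
            continue;
        }
        let g = lab as usize;
        for (agg, buf) in aggs.iter().zip(out.iter_mut()) {
            match agg {
                Agg::Sum => buf[g] += val,
                Agg::Min => buf[g] = buf[g].min(val),
                Agg::Max => buf[g] = buf[g].max(val),
            }
        }
    }

    out
}
```

The inner `match` runs per element, so a real version would probably want to hoist the dispatch out of the hot loop. The sketch is only meant to show the shape of the return value.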

jbrockmendel commented 1 year ago

> Is the limitation with this in Cython that you'd have to return a list of results, and that doesn't play nicely with cdef?

I expect we could return an ndarray (or a custom cdef object) instead of a list, so that wouldn't be an issue.