Open jbrockmendel opened 1 year ago
Oh nice idea. Is the limitation with this in Cython that you'd have to return a list of results, and that doesn't play nicely with cdef?
In theory I think we could use the same zip pattern here that we have for `group_sum` in quite a few places:
Then return a Vec of results, one for each aggregation passed in.
> Is the limitation with this in Cython that you'd have to return a list of results, and that doesn't play nicely with cdef?
I expect we could return an ndarray (or a custom cdef object) instead of a list, so that wouldn't be an issue.
Oftentimes groupby reductions are called several at a time via `.agg`, e.g. `gb.agg(['mean', 'var', 'max'])`. ATM we compute these separately, which entails iterating over the data 3 times. Fusing these into a single operation could save the extra passes over the data. I suspect those passes add up.
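To make the idea concrete, here is a rough pure-Python sketch of what a fused pass might look like. It is not how pandas implements anything today: `fused_group_agg` is a hypothetical name, the inputs are assumed to be a float array plus integer group codes (negative meaning "no group"), and the variance uses Welford's update as one possible single-pass choice. The results come back as one 2D ndarray rather than a list, which is the kind of return value that should be cdef-friendly.

```python
import numpy as np


def fused_group_agg(values, labels, ngroups):
    """Per-group mean, sample variance (ddof=1) and max in ONE pass.

    Hypothetical helper for illustration, not a pandas function.
    labels[i] is the integer group code of values[i]; negative codes
    and NaN values are skipped.
    Returns an ndarray of shape (3, ngroups): rows are mean, var, max.
    """
    count = np.zeros(ngroups, dtype=np.int64)
    mean = np.zeros(ngroups, dtype=np.float64)
    m2 = np.zeros(ngroups, dtype=np.float64)  # running sum of squared deviations
    maxval = np.full(ngroups, -np.inf)

    # Single loop over the data: every aggregation is updated per element.
    for val, lab in zip(values, labels):
        if lab < 0 or np.isnan(val):
            continue
        count[lab] += 1
        delta = val - mean[lab]
        mean[lab] += delta / count[lab]
        m2[lab] += delta * (val - mean[lab])  # Welford's update
        if val > maxval[lab]:
            maxval[lab] = val

    # Pack everything into one array; groups without enough data stay NaN.
    out = np.full((3, ngroups), np.nan)
    seen = count > 0
    multi = count > 1
    out[0, seen] = mean[seen]
    out[1, multi] = m2[multi] / (count[multi] - 1)
    out[2, seen] = maxval[seen]
    return out


values = np.array([1.0, 2.0, 3.0, 10.0, 20.0])
labels = np.array([0, 0, 0, 1, 1])
result = fused_group_agg(values, labels, ngroups=2)
```

Each row of `result` corresponds to one requested aggregation, so a caller could split it back into separate columns afterwards. A real version would presumably live in Cython and dispatch on which aggregations were actually requested; this sketch hard-codes the three from the example above.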