apache / arrow

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
https://arrow.apache.org/
Apache License 2.0

`table.group_by([]).aggregate` raises an error on some aggregate functions. #36149

Open · coady opened 1 year ago

coady commented 1 year ago

Describe the bug, including details regarding any error messages, version, and platform.

Grouping on an empty list of keys aggregates over the whole table (related to #14896). However, three hash aggregate functions have no corresponding scalar aggregate function: hash_distinct, hash_list, and hash_one. Grouping on empty keys with any of these raises an ArrowKeyError.

```python
In []: import pyarrow as pa

In []: table = pa.table({'key': list('aba'), 'value': [0, 1, 2]})

In []: table.group_by(['key']).aggregate([('value', 'list')])
Out[]: 
pyarrow.Table
key: string
value_list: list<item: int64>
  child 0, item: int64
----
key: [["a","b"]]
value_list: [[[0,2],[1]]]

In []: table.group_by([]).aggregate([('value', 'list')])
...
ArrowKeyError: No function registered with name: list

In []: table.group_by([]).aggregate([('value', 'min')])
Out[]: 
pyarrow.Table
value_min: int64
----
value_min: [[0]]
```

Component(s)

C++, Python

vibhatha commented 1 year ago

take