samster25 opened 8 months ago
@nsalerni Can you let me know if I missed anything?
@samster25 This covers a good chunk of the use case. Two others I can think of:
The first one I see missing from the above list is apply() (i.e. being able to apply some form of custom logic to a list column). It seems like that's covered by https://github.com/Eventual-Inc/Daft/issues/1976?
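To make the apply() request concrete without relying on any particular Daft API, here is a minimal pure-Python sketch of the intended per-row semantics. `apply_to_list_column` is a hypothetical helper, not part of Daft. It models columns as a plain dict of lists (the shape `daft.from_pydict` accepts), runs a user function once per list cell, and passes nulls through unchanged.

```python
from __future__ import annotations

from typing import Any, Callable


# Hypothetical helper for illustration only; this is NOT Daft's API.
# Columns are modeled as a plain dict of lists, the same shape that
# daft.from_pydict accepts.
def apply_to_list_column(
    data: dict[str, list[Any]],
    column: str,
    func: Callable[[list[Any]], Any],
    out_column: str,
) -> dict[str, list[Any]]:
    """Return a copy of `data` with `func` applied to every cell of `column`."""
    result = dict(data)  # shallow copy, so the input dict is left untouched
    # Each cell of a list column is itself a Python list (or None for a null).
    result[out_column] = [None if cell is None else func(cell) for cell in data[column]]
    return result


data = {
    "strings": ["a", "b", "c", "d"],
    "lists": [[1, 1, 1, 1], [1, 1, 1, 1], [2, 2, 2], [2, 2, 2]],
}
out = apply_to_list_column(data, "lists", sum, "list_sum")
print(out["list_sum"])  # [4, 4, 6, 6]
```

Here `sum` stands in for arbitrary custom logic; any callable that takes a single list works.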
I'm not sure whether the above would implicitly allow us to support the following, but here is a simplified example of the second use case I'd like to support:
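```python
import daft
from daft import col

df = daft.from_pydict({
    "strings": ["a", "b", "c", "d"],
    "lists": [[1, 1, 1, 1], [1, 1, 1, 1], [2, 2, 2], [2, 2, 2]],
})

# Group by the list column itself and count the rows for each distinct list
df.groupby("lists").agg([
    (col("lists").alias("list_count"), "count"),
]).collect()
```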
I'd imagine the output looking something like:
```
lists (List[Int64]) | list_count (UInt64)
------------------- | -------------------
[2, 2, 2]           | 2
[1, 1, 1, 1]        | 2
```
Today this yields:
```
PanicException: List(Int64) not implemented
```
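For reference, the expected table above corresponds to the following plain-Python computation. This is a sketch of the desired semantics only, assuming no null list cells; it is not how Daft implements grouping. Because Python lists are unhashable, each list is converted to a tuple to act as a group key, so two rows land in the same group only when their lists have the same elements in the same order.

```python
from collections import Counter

data = {
    "strings": ["a", "b", "c", "d"],
    "lists": [[1, 1, 1, 1], [1, 1, 1, 1], [2, 2, 2], [2, 2, 2]],
}

# Lists are unhashable, so each list cell becomes a tuple group key.
# Assumes no null cells (tuple(None) would raise a TypeError).
counts = Counter(tuple(cell) for cell in data["lists"])

# Rebuild the two output columns from the expected table.
grouped = {
    "lists": [list(key) for key in counts],
    "list_count": list(counts.values()),
}
print(grouped)
# {'lists': [[1, 1, 1, 1], [2, 2, 2]], 'list_count': [2, 2]}
```

Group order here follows first appearance, whereas the expected table lists `[2, 2, 2]` first; a groupby result generally should not be assumed to come back in any particular row order.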
Hi @nsalerni! I just opened a new issue to track the work on grouping by list columns: https://github.com/Eventual-Inc/Daft/issues/1983
We should support the following aggregations on the list type namespace: