daphne-eu / daphne

DAPHNE: An Open and Extensible System Infrastructure for Integrated Data Analysis Pipelines
Apache License 2.0

`group()` built-in function in DaphneDSL #903

Open pdamme opened 1 week ago

pdamme commented 1 week ago

DaphneIR has a GroupOp for relational-style grouping and aggregation, plus a corresponding group-kernel. However, at the moment, this operation can only be created through DAPHNE's SQL parser, not through DaphneDSL.

This task is to add a `group()` built-in function to DaphneDSL that creates a GroupOp in DaphneIR. The interface should be (following the notation used in the docs): `group(arg:frame, groupCols:str, ..., sumCol:str)`. That is, the built-in function takes a frame, an arbitrary number of columns to group on, and a single column to calculate the sum on. While this interface does not expose all features of the GroupOp/group-kernel (e.g., multiple aggregates, aggregate functions other than sum), it would be a good first step and is sufficient for implementing the Star Schema Benchmark in DaphneDSL. We can reflect the full functionality of the GroupOp in DaphneDSL later.
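To make the intended semantics concrete, here is a minimal sketch in plain Python (not DAPHNE code) of what a call like `group(arg, groupCols..., sumCol)` is expected to compute. Everything in it is an assumption for illustration: the helper name `group_sum`, the frame modeled as a dict of equally long column lists, the result column label `sum(<sumCol>)`, and the output row order (first appearance of each group). The actual GroupOp/group-kernel may differ in all of these.

```python
def group_sum(frame, *cols):
    """Hypothetical reference semantics: group by all but the last column
    name in `cols` and sum the last one."""
    if not cols:
        raise ValueError("expected at least the column to sum")
    *group_cols, sum_col = cols

    sums = {}  # group key -> running sum; dicts keep insertion order
    for i in range(len(frame[sum_col])):
        key = tuple(frame[c][i] for c in group_cols)
        sums[key] = sums.get(key, 0) + frame[sum_col][i]

    # Build the result frame: one row per distinct group key.
    result = {c: [] for c in group_cols}
    label = "sum(" + sum_col + ")"  # placeholder label, not DAPHNE's naming
    result[label] = []
    for key, total in sums.items():
        for c, value in zip(group_cols, key):
            result[c].append(value)
        result[label].append(total)
    return result


# Made-up example data, loosely in the spirit of a star schema query.
lineorder = {
    "region": ["EU", "US", "EU", "US"],
    "year": [1997, 1997, 1998, 1997],
    "revenue": [10, 20, 30, 5],
}
grouped = group_sum(lineorder, "region", "year", "revenue")
```

With the example data, the two `("US", 1997)` rows collapse into one group, so `grouped` has three rows.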

Hints:

saminbassiri commented 1 week ago

Hi, I will work on this issue.

pdamme commented 1 week ago

Great, please go ahead!