BlazingDB / blazingsql

BlazingSQL is a lightweight, GPU accelerated, SQL engine for Python. Built on RAPIDS cuDF.
https://blazingsql.com
Apache License 2.0
1.93k stars, 183 forks

collection_set feature request #1516

Open MikeChenfu opened 3 years ago

MikeChenfu commented 3 years ago

Is your feature request related to a problem? Please describe.

Hello guys, I have a SQL sample that I would like to run on BlazingSQL. I see that collect_set is not supported in 0.20, although cuDF already supports this.

Describe the solution you'd like

Here is my sample code.

q = '''
    SELECT 
        id, collect_set(date)
    FROM
        table
    GROUP BY
        id
    '''
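To make the requested result concrete, here is a minimal pure-Python sketch of the collect_set semantics the query above expects: one row per id, holding the distinct date values for that id. The rows and the helper name `collect_set` are hypothetical and only illustrate the behavior; this is not a BlazingSQL or cuDF API.

```python
from collections import defaultdict

# Hypothetical toy rows of (id, date); not taken from the original issue.
rows = [
    (1, "2021-01-01"),
    (1, "2021-01-01"),  # duplicate date for id 1, dropped by the set
    (1, "2021-01-02"),
    (2, "2021-01-03"),
]

def collect_set(rows):
    """Group rows by key and keep the distinct values for each key."""
    groups = defaultdict(set)
    for key, value in rows:
        groups[key].add(value)
    # Sorting is only for a deterministic result here; Hive and Spark
    # do not guarantee any element order for collect_set.
    return {key: sorted(values) for key, values in groups.items()}

result = collect_set(rows)
print(result)
```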

Describe alternatives you've considered

Here is a cudf solution.

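# 'collect' is passed as a string; a bare name would raise a NameError
q = table.groupby(['id'], as_index=False).agg({'date': 'collect'})
# drop duplicate dates within each collected list
q['date'] = q['date'].list.unique()

wmalpica commented 3 years ago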

Hello @MikeChenfu, there are several functions available in cudf that we would love to implement in BlazingSQL, such as collect_set and lateral view explode. The problem is that those functions are not part of standard SQL, which means they are not understood by Apache Calcite as we currently use it. We use Apache Calcite to parse the SQL queries and provide us with an optimized logical relational algebra plan. We are looking into how we can leverage or modify Apache Calcite so that we can implement functions that are found in Hive or Spark but not in standard SQL. I don't expect this to be a quick project, but it is something we want to do, and we are exploring alternatives that would allow us to support these sorts of functions.

MikeChenfu commented 3 years ago

Thanks @williamBlazing for the detailed explanation. Glad to hear you have a plan to do that.

beckernick commented 3 years ago

Perhaps it might make sense to leave this issue open as a feature request?