confluentinc / ksql

The database purpose-built for stream processing applications.
https://ksqldb.io
Other
41 stars 1.04k forks source link

Support table function in aggregation #4030

Open rmoff opened 4 years ago

rmoff commented 4 years ago
ksql> select explode(hashtagentities)->text AS HASHTAG, COUNT(*) from tweets_03 where retweet=false and hashtagentities is not null GROUP BY explode(hashtagentities)->text emit changes limit 20;
Table functions cannot be used with aggregations.

Workaround is to materialise the table function's results first, and then aggregate:

ksql> CREATE STREAM HASHTAGS AS select explode(hashtagentities)->text AS HASHTAG FROM TWEETS_03;

 Message
---------------------------------------------------------------------------------------
 Stream HASHTAGS created and running. Created by query with query ID: CSAS_HASHTAGS_18
---------------------------------------------------------------------------------------
ksql> SELECT HASHTAG, COUNT(*) FROM HASHTAGS GROUP BY HASHTAGS EMIT CHANGES LIMIT 10;
Column 'HASHTAGS.HASHTAGS' cannot be resolved.
ksql> SELECT HASHTAG, COUNT(*) FROM HASHTAGS GROUP BY HASHTAG EMIT CHANGES LIMIT 10;
+-------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------+
|HASHTAG                                                                                                                                    |KSQL_COL_1                                                                                                                                 |
+-------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------+
|engagement                                                                                                                                 |1                                                                                                                                          |
|love                                                                                                                                       |1                                                                                                                                          |
|observationwheel                                                                                                                           |1                                                                                                                                          |
|insitro                                                                                                                                    |1                                                                                                                                          |
|healthcare                                                                                                                                 |1                                                                                                                                          |
|dev                                                                                                                                        |1                                                                                                                                          |
|hackathonforgood                                                                                                                           |1                                                                                                                                          |
|DataGuard                                                                                                                                  |2                                                                                                                                          |
|GCDigital                                                                                                                                  |1                                                                                                                                          |
|AirPodsPro                                                                                                                                 |1                                                                                                                                          |
Limit Reached
Query terminated
ksql>
hedayat commented 2 years ago

I'm also interested in this feature; and I think the workaround is not good since a new stream from EXPLODE() can have a considerable overhead AFAIK. If I want to create an stream for each EXPLODE(), it'll mean many streams for different explodes which is not good (I want to EXPLODE() output over an array which is itself member of a map with many members). I want to materialize the aggregation result, not all data from EXPLODE() functions.

It's so important that I'll probably need to look for an alternative to ksqldb.