confluentinc / ksql

The database purpose-built for stream processing applications.
https://ksqldb.io
Other
44 stars 1.04k forks source link

Introduce support for sliding window aggregations #6873

Open derekjn opened 3 years ago

derekjn commented 3 years ago

Support for sliding window aggregations has been added to Kafka Streams (KIP-450), so we should introduce support for them in the ksqlDB syntax:

SELECT x, count(*) FROM s WINDOW SLIDING (SIZE 1 SECOND) GROUP BY x EMIT CHANGES;
anupsdtiwari commented 3 years ago

@vcrfxia - By when can we expect this feature in ksqldb ? Actually this is very common requirements and since ksqlDB doesn't support this, we have to look for some other query engine.

vcrfxia commented 3 years ago

Hi @anupsdtiwari , this feature is among the items being considered for our roadmap but we don't have a specific release date to share yet. BTW, are you sure that your use case requires sliding windows rather than hopping windows? (There is some inconsistency in terminology within the stream processing field here.) Hopping windows are indeed a very common use case, but we have not seen as much demand for sliding windows. If you're willing to share your use case for sliding windows that'd be great as another data point. Thanks.

anupsdtiwari commented 3 years ago

Hi @vcrfxia - Actually sliding window seems more common at our end.

My use case is as follows :-

At any point of time last N hours/days/minutes/months of aggregated data is needed. Eg :- At any point of time, what was the number of clicks done by the user on a button in last 24 hours ? Similarly number of transactions / total amount added to user's wallet in last 2 days etc.

Also you can refer below example :- Lets say at current_time 2021-07-09 10:00:01 AM for a user id 201, the number of clicks in the last 48 hours(2021-07-07 10:00:02 AM) is 50. But if i hit pull query again for this user at current_time 2021-07-09 13:00:01 PM then count should be from 2021-07-07 13:00:01 PM to current_time which might be different from 50.

kaydraka commented 2 years ago

Hi, I have the same requirements to @anupsdtiwari . Alternative way to use Hopping window makes too many messages for me.

anupsdtiwari commented 2 years ago

@vcrfxia @mjsax @derekjn Do we have any updates on this ticket ?

johnnycashash commented 2 years ago

Hi, There is one requirement where it is needed to trigger different resultant events as soon as a certain type of input event count crosses multiple thresholds.

As soon as it crosses first threshold, we trigger event 1 and so on for upcoming thresholds...

We can utilize hopping window in this use case by doing a workaround of removing duplicates, but it requires some more streams to be created and processed.

By sliding window, this will be a much simpler query to emit those event. ex: SELECT x, type, count() FROM s WINDOW SLIDING (SIZE 1 HOUR) where type="A" GROUP BY x having count()=10 EMIT CHANGES;

SELECT x, type, count() FROM s WINDOW SLIDING (SIZE 1 HOUR) where type="A" GROUP BY x having count()=20 EMIT CHANGES;

anupsdtiwari commented 2 years ago

@vcrfxia @mjsax @derekjn

Guys - Just checking, do we have any updates on this ticket ?

cefothe commented 1 year ago

Hey folks.

one more use case in the sliding window will be handy: I want to calculate the crypto volume for the last 24 hours.

My data looks like this: { "pair": "BTC/USDT", "price": 16603.04, "amount": 0.01014, "side": "sell" }

Currently, I am trying something like: select pair ,sum(amount) as volume, WINDOWSTART AS window_start, WINDOWEND AS window_end from TRADES_STREAM WINDOW HOPPING (SIZE 24 hours, ADVANCE BY 1 seconds, GRACE PERIOD 0 MINUTES) GROUP BY pair EMIT CHANGES;

But as you can see, this hopping window will create many small windows, but I want only one moving window that always gives me the result for the last 24 hours by moving the window every 1 second.