confluentinc / ksql

The database purpose-built for stream processing applications.
https://ksqldb.io

Allow using untilTimeLimit in EMIT FINAL statement for windowed aggregations #6540

Open vpalmisano opened 4 years ago

vpalmisano commented 4 years ago

It would be useful to add an option to the EMIT FINAL statement (or some other option) to use untilTimeLimit instead of untilWindowCloses (the current default, see #1030). This would allow rate limiting of aggregation queries (e.g. calculating average values over a time period).

yinheli commented 3 years ago

Any updates on this feature?

vitalikaz commented 3 years ago

As discussed on Slack, this feature would be awesome indeed.

The case is simple. Kafka Streams handles thousands of messages per second for some aggregation group, and there’s a Kafka Connect sink that writes data from the result topic to e.g. MySQL (upserts). But we don’t really need those updates that frequently: one update every 10 minutes, say, or even one every 3 hours is good enough (suppressed, carrying the last state), rather than thousands of updates per second. The aggregation runs over e.g. a full-day time window (using the WINDOW operator, or just manually with GROUP BY date), but I need updates inside the group every X minutes (or hours). In this case stream time (not wall-clock time) is perfectly OK, since lots of messages are always being produced into the input topic.

Having a bigger time window (e.g. a full day) causes problems if we EMIT FINAL only when the window closes, since we’ll get message bursts each day at 00:00:01 for all the groups we have: millions of messages at the same time.

To my mind this is a super-simple case where suppress-until-time-limit shines, and it would be awesome to have this functionality in KSQL.
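To make the requested behaviour concrete, here is a rough plain-Java sketch of time-limit suppression driven by stream time. It is only an illustration and not how Kafka Streams or ksqlDB implements it (in Kafka Streams the corresponding operator is `Suppressed.untilTimeLimit`, passed to `KTable#suppress`). The class `TimeLimitSuppressor`, its `process` method and the `limitMs` parameter are names made up for this example. It assumes that a key's timer starts when the key first enters the buffer and that later updates only replace the buffered value.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: per-key rate limiting of aggregate updates, driven by stream time. */
class TimeLimitSuppressor {

    /** One buffered key: when it entered the buffer and the latest value seen for it. */
    private static final class Pending {
        final long bufferedAtMs;
        double latestValue;

        Pending(long bufferedAtMs, double latestValue) {
            this.bufferedAtMs = bufferedAtMs;
            this.latestValue = latestValue;
        }
    }

    private final long limitMs;
    // Stream time is the highest record timestamp observed so far (not wall-clock time).
    private long streamTimeMs = Long.MIN_VALUE;
    private final Map<String, Pending> buffer = new LinkedHashMap<>();

    TimeLimitSuppressor(long limitMs) {
        this.limitMs = limitMs;
    }

    /**
     * Buffers one aggregate update and returns the latest value of every key
     * whose time limit has elapsed according to stream time.
     */
    Map<String, Double> process(String key, double value, long timestampMs) {
        streamTimeMs = Math.max(streamTimeMs, timestampMs);

        Pending pending = buffer.get(key);
        if (pending == null) {
            // First update for this key since its last emit: start its timer.
            buffer.put(key, new Pending(timestampMs, value));
        } else {
            // Key already buffered: keep the original timer, remember only the last state.
            pending.latestValue = value;
        }

        Map<String, Double> emitted = new LinkedHashMap<>();
        Iterator<Map.Entry<String, Pending>> it = buffer.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Pending> entry = it.next();
            if (entry.getValue().bufferedAtMs + limitMs <= streamTimeMs) {
                emitted.put(entry.getKey(), entry.getValue().latestValue);
                it.remove();
            }
        }
        return emitted;
    }
}
```

With a 10-minute limit, updates for the same group that arrive within 10 minutes of stream time collapse into a single downstream record carrying the last state. Because each group's timer starts at its own first buffered update, emits are spread out over time rather than all firing at window close.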

Attaching a simple problem visualisation.

[Image: ksql-suppress-image]