Open vpalmisano opened 4 years ago
Any updates about this feature ?
As discussed on Slack, this feature would be awesome indeed.
The case is simple. There are thousands of messages per second for some aggregation group in Kafka Streams, and there’s Kafka Connect sink that sinks data from result topic to e.g. MySQL (upserts). But we don’t really need those updates to be done so frequently - 1 update in let’s say 10 minutes, or 1 update in 3 hours is good enough (suppressed, with last state), and not thousands of updates per second. The aggregation is done in e.g. 1 full day time window (using WINDOW
operator, or just manually - GROUP BY date
), but I need updates inside the group every X minutes (or hours). And in this case, the stream time (not the wall-clock time) is perfectly OK, since there are always lots of messages produced into the input topic.
Having a bigger time window (e.g. full day) brings some problems if we EMIT FINAL
only when window closes, since we’ll have messages bursts each day at 00:00:01 for all the groups we have - millions of messages at the same time.
And to my mind it’s a super-simple case where suppress until time limit shines, and would be awesome to have this functionality in KSQL.
Attaching a simple problem visualisation.
It will be useful to add an option to the EMIT FINAL statement (or using other options as well) in order to use the
untilTimeLimit
instead ofuntilWindowCloses
(the default value now, see #1030). This will allow the rate limiting of aggregation queries (es. calculating the average values over a time period).