confluentinc / ksql

The database purpose-built for stream processing applications.
https://ksqldb.io

Pull Queries: Tunable retention for windowed aggregation retention #4157

Closed. vinothchandar closed this issue 4 years ago.

vinothchandar commented 4 years ago

Is your feature request related to a problem? Please describe.
Currently, the underlying Kafka Streams state store for a windowed aggregation expires after 24 hours (TBD). This means ksqlDB cannot serve data computed for older time windows, which limits pull queries to recent data only.

Describe the solution you'd like
Provide a tunable knob that lets the user control this retention per table.

Describe alternatives you've considered

Additional context

big-andy-coates commented 4 years ago

This is actually useful outside of just pull queries. There is already a bug where things go wonky if the window size exceeds 24 hours.

KLIP-10 proposes to control this retention time. However, it couples it to the amount of time the window is deemed open, i.e., the time during which it can still be updated with out-of-order data. We may actually want to decouple how long a window remains open from when it is evicted, so that we can keep closed windows around for pull queries. However, this is likely an enhancement to KS (Kafka Streams). Initially, I'd fix this using KLIP-10 and potentially raise a secondary issue to decouple the two durations.
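To make the coupling concrete, here is a minimal Java sketch of the two ways retention could be derived. The class `WindowRetention` and its factory methods are hypothetical and not part of ksqlDB or Kafka Streams. The one rule it enforces does come from Kafka Streams: a window store's retention must be at least the window size plus the grace period, since otherwise a window could be evicted while it is still open.

```java
import java.time.Duration;

/** Hypothetical model of per-table window settings; names are illustrative only. */
public final class WindowRetention {
    final Duration size;       // window size
    final Duration grace;      // how long the window stays open for out-of-order records
    final Duration retention;  // how long the window's result stays in the store

    private WindowRetention(Duration size, Duration grace, Duration retention) {
        // Retention must cover the whole period during which a window can still be
        // updated; otherwise results could be evicted while the window is open.
        if (retention.compareTo(size.plus(grace)) < 0) {
            throw new IllegalArgumentException(
                "retention must be >= size + grace, got " + retention);
        }
        this.size = size;
        this.grace = grace;
        this.retention = retention;
    }

    /** Coupled: retention equals the time the window is open, so it is evicted as soon as it closes. */
    static WindowRetention coupled(Duration size, Duration grace) {
        return new WindowRetention(size, grace, size.plus(grace));
    }

    /** Decoupled: retention is chosen independently, e.g. to keep closed windows for pull queries. */
    static WindowRetention decoupled(Duration size, Duration grace, Duration retention) {
        return new WindowRetention(size, grace, retention);
    }

    /** How long a window's result remains in the store after the window has closed. */
    Duration queryableAfterClose() {
        return retention.minus(size.plus(grace));
    }

    public static void main(String[] args) {
        Duration hour = Duration.ofHours(1);
        Duration grace = Duration.ofMinutes(10);
        System.out.println(coupled(hour, grace).queryableAfterClose());
        System.out.println(decoupled(hour, grace, Duration.ofDays(7)).queryableAfterClose());
    }
}
```

With a coupled retention, a window's result disappears the moment the window closes, which is exactly what rules out pull queries on older windows. The same check also shows that a fixed 24-hour retention cannot accommodate a window longer than 24 hours; whether that is the cause of the bug mentioned above is an assumption, not something this thread confirms.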

It's also worth noting that it is currently possible to tune the retention time by SETting the appropriate streams config before executing a query. KLIP-10 moves this out of secret configs and into the query definition.

big-andy-coates commented 4 years ago

Removed milestone 0.7.0, as it's highly unlikely KLIP-10 will be done by then.

apurvam commented 4 years ago

I actually don’t think this should be coupled with KLIP-10. From a user's point of view, how long to retain windowed stores, and hence keep them available for querying, is independent of how tolerant we are of late-arriving data.

For instance, dashboarding applications just want to be able to query old windows, and we should make this tunable independently of other concerns, IMO. Multiple people have already hit this limitation since launch.

Whether this should be in 0.7.0 depends on whether it can make the cut in light of the other pull query P0s. @vinothchandar can make that call.

apurvam commented 4 years ago

I think the real trade-offs for 0.7.0 here are between this and the current work on HA and good baseline performance for pull queries.

I still think the latter are more important to nail early, and querying older stores can be an incremental improvement. Cc @MichaelDrogalis @derekjn

In general, we should be sequencing work with a view toward delivering needle-moving incremental value to users, rather than sequencing it purely from the point of view of implementation elegance and efficiency.

Also, I think the weekly team meeting where we go over the release items is the right place to discuss trade-offs about what should and should not go into a release.

apurvam commented 4 years ago

related: https://github.com/confluentinc/ksql/issues/899

vinothchandar commented 4 years ago

This is not currently a P0 for 0.7.0; the HA work is prioritized higher. Whether we can target this depends on the actual fix. Let's keep this as a good-to-have for 0.7.0 and see how it goes.

vinothchandar commented 4 years ago

@big-andy-coates FYI, I am currently working on this, targeting 0.8.0. I like the RETAIN FOR syntax you mentioned on #899. Streams has since deprecated Windows.until in favor of Materialized.withRetention.
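A minimal sketch of the newer Kafka Streams style, assuming Kafka Streams 3.0 or later (where `TimeWindows.ofSizeAndGrace` replaced `TimeWindows.of(...).grace(...)`). The topic `pageviews` and the store `pageview-counts` are hypothetical names. Retention is configured on the store through `Materialized.withRetention`, while the grace period stays on the window definition.

```java
import java.time.Duration;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.TimeWindows;
import org.apache.kafka.streams.kstream.Windowed;
import org.apache.kafka.streams.state.WindowStore;

public class RetentionExample {
    // Hypothetical topic and store names, for illustration only.
    static final String INPUT_TOPIC = "pageviews";
    static final String STORE_NAME = "pageview-counts";

    static Topology buildTopology() {
        StreamsBuilder builder = new StreamsBuilder();
        KTable<Windowed<String>, Long> counts = builder
            .stream(INPUT_TOPIC, Consumed.with(Serdes.String(), Serdes.String()))
            .groupByKey()
            // 1-hour tumbling windows that accept out-of-order records for 10 minutes.
            .windowedBy(TimeWindows.ofSizeAndGrace(Duration.ofHours(1), Duration.ofMinutes(10)))
            // Keep window results in the store for 7 days so closed windows stay
            // queryable; this replaces the deprecated Windows.until(...).
            .count(Materialized.<String, Long, WindowStore<Bytes, byte[]>>as(STORE_NAME)
                .withRetention(Duration.ofDays(7)));
        return builder.build();
    }

    public static void main(String[] args) {
        System.out.println(buildTopology().describe());
    }
}
```

Printing the topology description should show the aggregation processor together with its `pageview-counts` store. Keeping the grace period on the window and the retention on the store is what lets the two durations be tuned separately.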

Can we close #899 in favor of this one? (Or I can use that one; I don't mind either way.)

mjsax commented 4 years ago

> So that we can keep closed windows around for pull queries. However, this is likely an enhancement to KS.

Just to clarify: in KS, there is a window grace period that defines how long a window is open and accepts out-of-order data. All records arriving after the grace period are considered late and are dropped (this is recorded by a metric and, I think, also logged at WARN level).

In addition to the grace period, there is a store retention time that allows data to be kept longer for querying.
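The distinction can be summarized as three phases a window passes through. The following Java sketch is a simplified, hypothetical model (the class and enum names are illustrative). The grace-period check mirrors the rule described above; the eviction check only approximates Kafka Streams, which expires data per store segment rather than per individual window.

```java
import java.time.Duration;
import java.time.Instant;

/** Simplified, hypothetical model of a window's lifecycle; not Kafka Streams' actual code. */
public final class WindowLifecycle {
    enum Phase { OPEN, CLOSED_BUT_QUERYABLE, EVICTED }

    /**
     * Classifies a window relative to the current stream time.
     * Assumes retention >= size + grace, which Kafka Streams requires.
     */
    static Phase phaseOf(Instant windowStart, Duration size, Duration grace,
                         Duration retention, Instant streamTime) {
        Instant windowEnd = windowStart.plus(size);
        // Out-of-order records are accepted until stream time reaches end + grace;
        // after that they are considered late and dropped.
        if (streamTime.isBefore(windowEnd.plus(grace))) {
            return Phase.OPEN;
        }
        // Approximation: the store keeps a window while its start is within the
        // retention period of stream time (real eviction happens per segment).
        if (streamTime.isBefore(windowStart.plus(retention))) {
            return Phase.CLOSED_BUT_QUERYABLE;
        }
        return Phase.EVICTED;
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2020-01-01T00:00:00Z");
        Duration size = Duration.ofHours(1);
        Duration grace = Duration.ofMinutes(10);
        Duration retention = Duration.ofDays(7);
        Duration[] elapsedTimes = {Duration.ofMinutes(65), Duration.ofHours(2), Duration.ofDays(8)};
        for (Duration elapsed : elapsedTimes) {
            Phase phase = phaseOf(start, size, grace, retention, start.plus(elapsed));
            System.out.println(elapsed + " after window start -> " + phase);
        }
    }
}
```

With a 1-hour window, a 10-minute grace period, and 7 days of retention, a pull query could still read the window for most of a week after it stops accepting late records.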

vinothchandar commented 4 years ago

I am actively working on this in #4733 and have added support for both retention and grace period. The PR is mostly there; take a skim if you have cycles. :)