confluentinc / ksql

The database purpose-built for stream processing applications.
https://ksqldb.io

Allow user-specified retention criteria for tables #3519

Open derekjn opened 5 years ago

derekjn commented 5 years ago

When creating a table, users should be able to specify the retention criteria for the underlying topic.

apurvam commented 5 years ago

Possibly related: #2346

big-andy-coates commented 5 years ago

Hmm... the topics behind tables are compacted and should have infinite retention IMHO.

On the flip side, this is a super useful feature for created streams...

derekjn commented 5 years ago

@big-andy-coates this would essentially allow users to set a TTL on their tables' rows, which I feel is a fairly common pattern. If a user doesn't care about any table rows that haven't been updated within the retention period, their only option currently is to just accumulate garbage indefinitely.
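To make the TTL semantics concrete, here is a minimal sketch in plain Python. It is not ksqlDB code and does not reflect any existing ksqlDB or Kafka API: `TtlTable`, `retention_ms`, `upsert`, `get`, and `expire` are hypothetical names chosen for illustration. It assumes a row's clock resets on every update and that a row counts as expired once it has gone longer than the retention period without one.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple


@dataclass
class TtlTable:
    """Toy in-memory table (hypothetical, not a ksqlDB API).

    A row expires once it has not been updated for more than `retention_ms`.
    """

    retention_ms: int
    # key -> (value, timestamp of last update in milliseconds)
    _rows: Dict[str, Tuple[Any, int]] = field(default_factory=dict, init=False)

    def upsert(self, key: str, value: Any, now_ms: int) -> None:
        """Insert or update a row; this resets the row's TTL clock."""
        self._rows[key] = (value, now_ms)

    def get(self, key: str, now_ms: int) -> Optional[Any]:
        """Return the row's value, or None if it is missing or expired."""
        row = self._rows.get(key)
        if row is None:
            return None
        value, updated_ms = row
        if now_ms - updated_ms > self.retention_ms:
            return None
        return value

    def expire(self, now_ms: int) -> int:
        """Physically drop expired rows and return how many were removed."""
        stale = [
            key
            for key, (_, updated_ms) in self._rows.items()
            if now_ms - updated_ms > self.retention_ms
        ]
        for key in stale:
            del self._rows[key]
        return len(stale)


if __name__ == "__main__":
    table = TtlTable(retention_ms=1_000)
    table.upsert("a", "first", now_ms=0)
    table.upsert("b", "second", now_ms=900)
    # At t=1500 ms, "a" is past the retention period; "b" is not.
    print(table.get("a", now_ms=1_500), table.get("b", now_ms=1_500))
    print("rows dropped:", table.expire(now_ms=1_500))
```

Without the `expire` step, stale rows like `"a"` would stay in storage forever, which is the "accumulate garbage indefinitely" case described above.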

Would exposing retention criteria create any gotchas, or otherwise have any negative effect on tables that continue using the default infinite retention? What I'm trying to understand is if there is any downside to allowing this...

mjsax commented 3 years ago

Personally, I have my doubts about this feature request. Having a TTL is quite a "weird" feature for a database...

colebaileygit commented 3 years ago

Just my 2 cents, but I am encountering this because as the "DB" grows over time, node failover becomes increasingly expensive: it has to restore all the old keys, even though we are no longer interested in them. This is especially relevant for time-windowed data, where the assumption is that we can safely expire old time windows and replace them with the latest ones. I can see it also applying to transactional data, which grows indefinitely and eventually becomes too costly to maintain as temporary storage, since the Kafka changelogs are replayed every time a pod restarts or scales up / down.

It does depend on the use-case, but it would be handy to have the option to create tables with TTL if the user is comfortable with the trade-offs involved and knows their query patterns can support it.