apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0

Row level TTLs in Pinot #13061

Open rohityadav1993 opened 2 weeks ago

rohityadav1993 commented 2 weeks ago

The requirement arises from use cases that need to filter out, at query time, rows that are older than a provided timestamp representing the TTL. This can currently be achieved with a caller-side implementation:

  1. Create a row_ttl column (a time column) whose value is provided by upstream.
  2. Decorate every query with an additional filter clause, e.g.: select count(*) where row_ttl > current_timestamp
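
The caller-side workaround above can be sketched roughly as follows. This is a minimal illustration, not Pinot code: the helper name `build_query`, the table name `bids`, and the assumption that `row_ttl` stores epoch milliseconds are all hypothetical.

```python
import time


def build_query(select_clause, table, filters=(), ttl_column="row_ttl", now_ms=None):
    """Build a SQL string that always carries the TTL filter.

    Hypothetical caller-side helper: it assumes ttl_column holds an
    epoch-millisecond timestamp supplied by upstream.
    """
    if now_ms is None:
        # Use the caller's clock as "current time" (an assumption of this sketch).
        now_ms = int(time.time() * 1000)
    # Keep the caller's own conditions and append the TTL condition last.
    conditions = list(filters) + [f"{ttl_column} > {now_ms}"]
    return f"SELECT {select_clause} FROM {table} WHERE " + " AND ".join(conditions)


# Every query goes through the helper, so no caller can forget the filter.
plain = build_query("count(*)", "bids", now_ms=1000)
filtered = build_query("count(*)", "bids", filters=["status = 'OPEN'"], now_ms=1000)
```

Every client has to route its queries through a helper like this, which is the adoption cost a native feature would remove.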

Pinot could provide this as a native feature to make adoption easier.

The SLA for filtering out TTLed rows is strict, so a minion-based approach cannot be applied.

Jackie-Jiang commented 2 weeks ago

This requirement seems a little bit strange. Say the timestamp is within the TTL at ingestion time. Before the segment expires, the timestamp could fall outside the TTL while the row is still queryable. I don't think we can enforce TTL strictly during ingestion.

rohityadav1993 commented 2 weeks ago

@Jackie-Jiang, I might not have phrased this clearly; I am updating the description. The filtering is needed on the query side, but the TTL applies to the timestamp stored in one of the row's columns (provided by upstream).

Jackie-Jiang commented 1 week ago

I see. Would it work if we built a feature that automatically adds a filter, where timeCol > currentTime - TTL, with a configurable TTL? The time column is already configured for most tables, and it is used to manage retention.

rohityadav1993 commented 1 week ago

This is more of an expiry time; no TTL duration is provided. Example use case: a bid expires on 1 Jan 2024 00:00:00. This time column is independent of the event time that is used for retention management. So a filter like this would be useful: where rowExpiryTime > currentTime
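
To make the difference between the two filters concrete, here is a small sketch that evaluates both predicates against the bid example. The function names, the `eventTime` value, and the one-day TTL are assumptions chosen for illustration; only the two comparisons come from the discussion.

```python
from datetime import datetime, timedelta, timezone


def visible_by_ttl_duration(event_time, now, ttl):
    # Duration-based filter: timeCol > currentTime - TTL
    return event_time > now - ttl


def visible_by_expiry(row_expiry_time, now):
    # Expiry-based filter: rowExpiryTime > currentTime
    return row_expiry_time > now


# Hypothetical bid: the event time is assumed, the expiry is the one from the example.
bid = {
    "eventTime": datetime(2023, 12, 31, 23, 0, tzinfo=timezone.utc),
    "rowExpiryTime": datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc),
}

before_expiry = datetime(2023, 12, 31, 23, 30, tzinfo=timezone.utc)
at_expiry = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
one_day = timedelta(days=1)  # assumed table-level TTL

# At the expiry instant the expiry filter hides the bid, while a one-day
# duration filter on the event time would still return it.
shown_by_expiry = visible_by_expiry(bid["rowExpiryTime"], at_expiry)
shown_by_duration = visible_by_ttl_duration(bid["eventTime"], at_expiry, one_day)
```

Because each row carries its own expiry instant, no single table-level TTL duration reproduces the expiry filter unless every row happens to have the same lifetime.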

Jackie-Jiang commented 1 week ago

I see. So basically you want to apply a TTL on any date-time field, with Pinot applying the filter automatically?

rohityadav1993 commented 5 days ago

Yes, precisely. I think any other solution that tries to delete the row at its expiry time would be too complex to implement. A simpler solution is to apply the filter at query time and then remove the expired rows later, either through minion compaction or by externally invoking a delete.
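
A rough in-memory sketch of that split, assuming rows are plain dictionaries with an epoch-millisecond `rowExpiryTime`. The function names are hypothetical and nothing here reflects Pinot internals: the query-time filter hides expired rows immediately, and a separate cleanup step reclaims the space whenever it runs.

```python
def query_visible(rows, now_ms, expiry_col="rowExpiryTime"):
    # Query path: expired rows are filtered out but stay in storage.
    return [r for r in rows if r[expiry_col] > now_ms]


def purge_expired(rows, now_ms, expiry_col="rowExpiryTime"):
    # Background path (standing in for compaction or an external delete):
    # physically drop expired rows and report how many were removed.
    before = len(rows)
    rows[:] = [r for r in rows if r[expiry_col] > now_ms]
    return before - len(rows)


rows = [
    {"id": 1, "rowExpiryTime": 100},
    {"id": 2, "rowExpiryTime": 200},
]

visible_before_purge = query_visible(rows, now_ms=150)  # row 1 is hidden, not deleted
stored_before_purge = len(rows)
removed = purge_expired(rows, now_ms=150)
visible_after_purge = query_visible(rows, now_ms=150)
```

Query results are the same before and after the purge, so the strict SLA depends only on the filter, and the cleanup can run on a relaxed schedule.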