apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0

Time Pruner should support queries where we're operating on the time column itself #14362

Open jadami10 opened 1 week ago

jadami10 commented 1 week ago

Today, we only apply time pruning when the left-hand side of a predicate is strictly the time column: https://github.com/apache/pinot/blob/master/pinot-broker/src/main/java/org/apache/pinot/broker/routing/segmentpruner/TimeSegmentPruner.java#L370-L373

But we should still be able to apply time pruning when a function is applied to the time column. The main case we've seen is folks running a query with where date_trunc('day', <ts_column>) < <static_time>. This causes the time pruner to not be used.

The most naive way here would be to rebuild the interval tree in the broker with each segment's min/max time values also truncated, though this would be problematic for tables with tens of thousands of segments.

I'm opening this as an issue for now to brainstorm whether there's a universal solution or whether there are easy functions that can be supported.

Jackie-Jiang commented 1 week ago

We do have TimePredicateFilterOptimizer, introduced in #6957. We should be able to enhance that to also support DATE_TRUNC.
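
For illustration, here is a minimal sketch of the arithmetic such a rewrite would rely on. It is not Pinot code: the class and method names are hypothetical, and it assumes the time column holds epoch milliseconds, that truncation is done in UTC, and that the unit has a fixed width (seconds, minutes, hours, days; not months or years). Under those assumptions, date_trunc(unit, ts) < T holds exactly when ts < ceil(T), and date_trunc(unit, ts) <= T holds exactly when ts < floor(T) + unit, so the predicate can be turned into a plain range on the raw time column that the time pruner already understands.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper for illustration only; not part of Pinot. */
final class DateTruncRewriteSketch {
  static final long DAY_MS = TimeUnit.DAYS.toMillis(1);

  private DateTruncRewriteSketch() {
  }

  /** Largest multiple of unitMs that is <= t (floorDiv keeps this correct for negative t). */
  static long floorTo(long t, long unitMs) {
    return Math.floorDiv(t, unitMs) * unitMs;
  }

  /** Smallest multiple of unitMs that is >= t. */
  static long ceilTo(long t, long unitMs) {
    long floor = floorTo(t, unitMs);
    return floor == t ? t : floor + unitMs;
  }

  /** Stand-in for date_trunc on epoch millis with a fixed-width unit (assumption: UTC). */
  static long dateTrunc(long ts, long unitMs) {
    return floorTo(ts, unitMs);
  }

  /** date_trunc(unit, ts) < t  is equivalent to  ts < (returned bound). */
  static long exclusiveUpperBoundForLessThan(long t, long unitMs) {
    return ceilTo(t, unitMs);
  }

  /** date_trunc(unit, ts) <= t  is equivalent to  ts < (returned bound). */
  static long exclusiveUpperBoundForLessThanOrEqual(long t, long unitMs) {
    return floorTo(t, unitMs) + unitMs;
  }
}
```

With this, a filter such as date_trunc('day', ts) < T would become ts < exclusiveUpperBoundForLessThan(T, DAY_MS). The > and >= cases follow the same pattern with inclusive lower bounds (floor(T) + unit and ceil(T) respectively). Calendar units and non-UTC time zones would need separate handling.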

jadami10 commented 1 week ago

great link, ty jackie. I won't have time to work on this immediately (our user also just modified their query instead), but I'll try to come back to it in free time

ashishjayamohan commented 1 week ago

@jadami10 Mind if I take a stab at this?

jadami10 commented 1 week ago

absolutely! feel free to tag me in to review