apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0

[multistage] Handle Comparisons Against Extremal Values #10523

Open ankitsultana opened 1 year ago

ankitsultana commented 1 year ago

In the V1 engine, we do not allow comparisons such as longCol < Long.MIN_VALUE. Example

This works fine for V1 queries, but in the V2 engine Calcite can rewrite filters. For example, a query with filters like colName != Long.MIN_VALUE or colName < 10 gets translated to (colName < Long.MIN_VALUE) OR (colName > Long.MIN_VALUE AND colName < 10).

This can lead to failures with errors such as the following:

Caused by: java.lang.IllegalArgumentException: Invalid range: colName < '-9223372036854775808'
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:210)
    at org.apache.pinot.core.operator.filter.predicate.RangePredicateEvaluatorFactory$LongRawValueBasedRangePredicateEvaluator.<init>(RangePredicateEvaluatorFactory.java:386)
    at org.apache.pinot.core.operator.filter.predicate.RangePredicateEvaluatorFactory.newRawValueBasedEvaluator(RangePredicateEvaluatorFactory.java:85)
    at org.apache.pinot.core.operator.filter.predicate.RangePredicateEvaluatorFactory$UnsortedDictionaryBasedRangePredicateEvaluator.<init>(RangePredicateEvaluatorFactory.java:273)
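
The trace does not show the check that Pinot performs, so the following is only an illustration of why a strict upper bound of Long.MIN_VALUE is awkward to represent, not a copy of the evaluator's logic. No long value is strictly smaller than Long.MIN_VALUE, and naively turning the exclusive bound into an inclusive one wraps around. The class and method names here are hypothetical.

```java
// Hypothetical illustration; not taken from the Pinot code base.
final class ExclusiveBoundOverflow {

    /** Naively turns "col < upper" into "col <= upper - 1"; wraps for Long.MIN_VALUE. */
    static long naiveInclusiveUpper(long exclusiveUpper) {
        return exclusiveUpper - 1;
    }

    /** Overflow-aware variant: throws ArithmeticException instead of wrapping. */
    static long checkedInclusiveUpper(long exclusiveUpper) {
        return Math.subtractExact(exclusiveUpper, 1L);
    }

    public static void main(String[] args) {
        // The wrapped bound would turn an empty range into one that matches everything.
        System.out.println(naiveInclusiveUpper(Long.MIN_VALUE) == Long.MAX_VALUE);
        try {
            checkedInclusiveUpper(Long.MIN_VALUE);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected: " + e.getMessage());
        }
    }
}
```

Rejecting the range up front avoids silently evaluating a wrapped bound, which is one plausible reason for a precondition like the one in the trace.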

Some possible solutions:

  1. Add support for these filters in the V1 engine itself.
  2. Add a rule to the V2 optimizers to detect tautologies. The rule could be extended in the future to cover more cases.
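
To make the second option concrete, here is a minimal sketch of the kind of check such a rule might perform for long columns. It is not a Calcite rule and does not use any Pinot or Calcite API; the class, enums, and method are hypothetical. It also assumes the column is non-null, since an "always true" result would not hold under SQL null semantics.

```java
// Hypothetical sketch; names are illustrative and not part of Pinot or Calcite.
final class ExtremalLongComparison {

    /** Outcome of simplifying "col <op> literal" for a non-null long column. */
    enum Result { ALWAYS_TRUE, ALWAYS_FALSE, KEEP }

    /** Comparison operators covered by this sketch. */
    enum Op { LT, LTE, GT, GTE }

    static Result simplify(Op op, long literal) {
        switch (op) {
            case LT:
                // col < Long.MIN_VALUE can never hold.
                return literal == Long.MIN_VALUE ? Result.ALWAYS_FALSE : Result.KEEP;
            case LTE:
                // col <= Long.MAX_VALUE holds for every long.
                return literal == Long.MAX_VALUE ? Result.ALWAYS_TRUE : Result.KEEP;
            case GT:
                // col > Long.MAX_VALUE can never hold.
                return literal == Long.MAX_VALUE ? Result.ALWAYS_FALSE : Result.KEEP;
            case GTE:
                // col >= Long.MIN_VALUE holds for every long.
                return literal == Long.MIN_VALUE ? Result.ALWAYS_TRUE : Result.KEEP;
            default:
                throw new IllegalArgumentException("Unsupported operator: " + op);
        }
    }

    public static void main(String[] args) {
        System.out.println(simplify(Op.LT, Long.MIN_VALUE));
        System.out.println(simplify(Op.LT, 10L));
    }
}
```

With a check like this, the branch colName < Long.MIN_VALUE from the rewritten filter could be dropped as always false before it reaches the range predicate evaluator.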

cc: @walterddr

abhioncbr commented 1 year ago

I want to give it a try if it's not urgent. Thanks

abhioncbr commented 1 year ago

@ankitsultana / @walterddr I am planning to start work on this. Just checking: do you think it's fine to start now, or does this depend on any other work? Thanks

ankitsultana commented 1 year ago

Hi @abhioncbr, thanks for showing interest.

At this point we are not sure what the right approach is. Both approaches have pros and cons. It would be best to wait for now.