optimizer & runner should consider `NULL` special rules

aceforeverd commented 6 months ago

For a index optimizer, the most common pattern is, for example, SQL SELECT * FROM t1 WHERE gp = {const_val}, is optimized based on FILTER expression gp = {const_val}:

So if index key=gp exists for table t1, that index is utilized, as SEEK BY KEY {const_val} >> OUTPUT ALL ROWS`.

However, if {const_val} is NULL, this optimize rule should not go that approach, since <ANY_VAL> = NULL is NULL.

This also happens for SQL like LAST JOIN t1 LAST JOIN t2 on t1.key = t2.key, where there is row exists in t1 that t1.key = NULL. It never joins.

Optimizer should consider const NULL values, and Runner should consider NULL values as index_key

aceforeverd commented 6 months ago

relate issue #3685

aceforeverd commented 6 months ago

some of the yaml testing cases are wrong:

https://github.com/4paradigm/OpenMLDB/blob/d9bb344d11354bb53d2f8ba7f77daf85e9c78917/cases/function/join/test_lastjoin_complex.yaml#L560-L655

4paradigm / OpenMLDB

optimizer & runner should consider `NULL` special rules #3925