4paradigm / OpenMLDB

OpenMLDB is an open-source machine learning database that provides a feature platform computing consistent features for training and inference.
https://openmldb.ai
Apache License 2.0
1.58k stars 317 forks source link

optimizer & runner should consider `NULL` special rules #3925

Open aceforeverd opened 4 months ago

aceforeverd commented 4 months ago

For a index optimizer, the most common pattern is, for example, SQL SELECT * FROM t1 WHERE gp = {const_val}, is optimized based on FILTER expression gp = {const_val}:

So if index key=gp exists for table t1, that index is utilized, as SEEK BY KEY {const_val} >> OUTPUT ALL ROWS`.

However, if {const_val} is NULL, this optimize rule should not go that approach, since <ANY_VAL> = NULL is NULL.

This also happens for SQL like LAST JOIN t1 LAST JOIN t2 on t1.key = t2.key, where there is row exists in t1 that t1.key = NULL. It never joins.

Optimizer should consider const NULL values, and Runner should consider NULL values as index_key

aceforeverd commented 4 months ago

relate issue #3685

aceforeverd commented 4 months ago

some of the yaml testing cases are wrong: