facebookincubator / velox

A composable and fully extensible C++ execution engine library for data management systems.
https://velox-lib.io/
Apache License 2.0

Extend expression fuzzer to compare Velox result against Presto #10308

Open kagamiori opened 4 months ago

kagamiori commented 4 months ago

Description

Today, the expression fuzzer compares the results of an expression from the regular expression evaluator to the results from a simplified expression evaluator. However, both evaluators are implemented in Velox and use Velox functions, so their results can both be wrong when a Velox function behaves differently from the same function in Presto. Ideally, the expression fuzzer would compare Velox results against Presto.

kagamiori commented 3 weeks ago

https://github.com/facebookincubator/velox/pull/11134 adds basic support for running the expression fuzzer with PrestoQueryRunner, with the following limitations.

  1. Not testing arithmetic and comparison operators, e.g., +, -, <, >. An extension in PrestoQueryRunner is needed to convert a CallTypedExpr with a function name such as "plus" into a SQL expression that uses the "+" operator.
  2. Not tested with dereference yet. Need to enable dereference and ensure the fuzzer works correctly.
  3. No support for literals of JSON, Timestamp, Interval, and complex types. JSON literals need to be converted to json_parse(json '...') in Velox because Presto implicitly invokes json_parse. Interval and complex-typed literals need to be constructed using datetime functions and array, map, or row constructors in the SQL text. Timestamp literals require more investigation to ensure Presto gets the same timestamp as Velox.
  4. No support for AND/OR special forms. This requires either using the query optimizer through the C++ sidecar in the Presto coordinator or bypassing the Presto expression optimizations. More details can be found in https://github.com/prestodb/presto/issues/23402.
  5. The fuzzer currently ignores exceptions from the reference DB when default-null behavior happens in Velox. We need to extend the expression fuzzer to retry with only the non-null rows in this situation (similar to the idea of retry-with-try).
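
To illustrate limitation 1, here is a minimal sketch of how a call such as "plus" could be rendered as an infix SQL expression. It is not the PrestoQueryRunner implementation. `CallExpr`, `infixOperator`, and `toSql` are hypothetical names made up for this example. Apart from "plus", the function-name-to-operator mapping is an assumption.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified stand-in for a call expression. This is NOT
// Velox's core::CallTypedExpr; it only carries what the sketch needs.
struct CallExpr {
  std::string name;              // Function name, e.g. "plus".
  std::vector<std::string> args; // Arguments already rendered as SQL text.
};

// Assumed mapping from function names to SQL infix operators. Only "plus"
// is named in the issue; the other entries are illustrative assumptions.
std::optional<std::string> infixOperator(const std::string& name) {
  static const std::unordered_map<std::string, std::string> kOperators = {
      {"plus", "+"},
      {"minus", "-"},
      {"multiply", "*"},
      {"divide", "/"},
      {"lt", "<"},
      {"gt", ">"},
      {"lte", "<="},
      {"gte", ">="},
      {"eq", "="},
      {"neq", "<>"},
  };
  auto it = kOperators.find(name);
  if (it == kOperators.end()) {
    return std::nullopt;
  }
  return it->second;
}

// Renders a call as SQL text. Binary calls with a known operator become
// parenthesized infix expressions; everything else keeps call syntax.
std::string toSql(const CallExpr& call) {
  const auto op = infixOperator(call.name);
  if (op.has_value() && call.args.size() == 2) {
    return "(" + call.args[0] + " " + *op + " " + call.args[1] + ")";
  }
  std::string sql = call.name + "(";
  for (std::size_t i = 0; i < call.args.size(); ++i) {
    if (i > 0) {
      sql += ", ";
    }
    sql += call.args[i];
  }
  sql += ")";
  return sql;
}
```

With this sketch, `toSql({"plus", {"c0", "c1"}})` produces `(c0 + c1)`, while a name with no operator mapping, such as `toSql({"abs", {"c0"}})`, falls back to `abs(c0)`. Each infix expression is wrapped in parentheses so that nested calls keep their evaluation order regardless of SQL operator precedence.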