Netflix / iceberg

Iceberg is a table format for large, slow-moving tabular data
Apache License 2.0
472 stars 59 forks source link

Predicate Compiler for Evaluators #99

Open omervk opened 5 years ago

omervk commented 5 years ago

Predicates are currently created as ASTs, but are not compiled or optimized. Instead, evaluating them means traversing the tree in its entirety, something that, in a tight loop (like going over partitioning functions for all files in a table), is suboptimal.

Compiling the expressions into a minimal syntax tree that embeds the right-hand side values would let us optimize it ahead of time in several ways, including:

  1. Compressing equals-or-equals trees to in.
  2. Dropping always-true and always-false expressions entirely.

This issue therefore includes three things:

  1. Representing expressions as a value-embedded AST.
  2. Creating an optimizer framework.
  3. Slowly building more and more optimizations.

This could probably be done using existing tools, without reinventing the wheel :)