apache / arrow

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
https://arrow.apache.org/
Apache License 2.0

[Gandiva] Add a string based expression parser #19778

Open asfimport opened 6 years ago

asfimport commented 6 years ago

Gandiva currently supports a tree-based expression builder. This requires writing a lot of code for even simple expressions.

For example, to build an expression for "a + b < 10", the code is:


  // Schema for the input fields
  auto field0 = field("a", int32());
  auto field1 = field("b", int32());
  auto schema = arrow::schema({field0, field1});

  // Output field
  auto field_result = field("res", boolean());

  // Build the expression tree: less_than(add(a, b), 10)
  auto node_f0 = TreeExprBuilder::MakeField(field0);
  auto node_f1 = TreeExprBuilder::MakeField(field1);
  auto literal_10 = TreeExprBuilder::MakeLiteral(10);
  auto sum_expr =
      TreeExprBuilder::MakeFunction("add", {node_f0, node_f1}, int32());
  // The comparison is a function node; MakeExpression then binds it to the output field
  auto lt_node =
      TreeExprBuilder::MakeFunction("less_than", {sum_expr, literal_10}, boolean());
  auto lt_expr = TreeExprBuilder::MakeExpression(lt_node, field_result);

An alternative way to do this would be:

  // Build expression
  auto expr = StringExprBuilder::MakeExpression(schema, "a + b < 10", field_result);

The expression syntax should be close to that of SQL.

To begin with, this will simplify writing tests. It will also provide an easier API for working with Gandiva.
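
To make the proposal concrete, here is a minimal sketch of how a string such as "a + b < 10" could be turned into the same tree shape that the tree-based builder produces. It is a hand-written recursive-descent parser rather than a generated one, and it is not Gandiva code: `Node`, `Parser`, and `ToString` are hypothetical names invented for this illustration, only `+`, `-`, `<`, and `>` are handled, and no type checking against a schema is done. In a real implementation the leaves would presumably map to `TreeExprBuilder::MakeField` / `MakeLiteral` and the inner nodes to `TreeExprBuilder::MakeFunction`.

```cpp
#include <cctype>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical mini-AST: a leaf (field name or integer literal) or a function call.
struct Node {
  std::string name;                         // field name, literal text, or function name
  std::vector<std::shared_ptr<Node>> args;  // empty for leaves
};
using NodePtr = std::shared_ptr<Node>;

// Render a node in function-call form, e.g. less_than(add(a, b), 10).
std::string ToString(const NodePtr& n) {
  if (n->args.empty()) return n->name;
  std::string out = n->name + "(";
  for (size_t i = 0; i < n->args.size(); ++i) {
    if (i > 0) out += ", ";
    out += ToString(n->args[i]);
  }
  return out + ")";
}

// Hypothetical recursive-descent parser for a tiny SQL-like expression subset.
class Parser {
 public:
  explicit Parser(std::string text) : text_(std::move(text)) {}

  NodePtr Parse() {
    NodePtr root = ParseComparison();
    SkipSpaces();
    if (pos_ != text_.size()) throw std::runtime_error("unexpected trailing input");
    return root;
  }

 private:
  // comparison := sum (('<' | '>') sum)?
  NodePtr ParseComparison() {
    NodePtr lhs = ParseSum();
    SkipSpaces();
    if (pos_ < text_.size() && (text_[pos_] == '<' || text_[pos_] == '>')) {
      std::string fn = text_[pos_++] == '<' ? "less_than" : "greater_than";
      NodePtr rhs = ParseSum();
      return std::make_shared<Node>(Node{fn, {lhs, rhs}});
    }
    return lhs;
  }

  // sum := operand (('+' | '-') operand)*   (left-associative)
  NodePtr ParseSum() {
    NodePtr lhs = ParseOperand();
    for (;;) {
      SkipSpaces();
      if (pos_ >= text_.size() || (text_[pos_] != '+' && text_[pos_] != '-')) return lhs;
      std::string fn = text_[pos_++] == '+' ? "add" : "subtract";
      NodePtr rhs = ParseOperand();
      lhs = std::make_shared<Node>(Node{fn, {lhs, rhs}});
    }
  }

  // operand := identifier | integer literal
  NodePtr ParseOperand() {
    SkipSpaces();
    size_t start = pos_;
    while (pos_ < text_.size() &&
           (std::isalnum(static_cast<unsigned char>(text_[pos_])) || text_[pos_] == '_')) {
      ++pos_;
    }
    if (start == pos_) throw std::runtime_error("expected a field name or literal");
    return std::make_shared<Node>(Node{text_.substr(start, pos_ - start), {}});
  }

  void SkipSpaces() {
    while (pos_ < text_.size() && std::isspace(static_cast<unsigned char>(text_[pos_]))) ++pos_;
  }

  std::string text_;
  size_t pos_ = 0;
};
```

With this sketch, `ToString(Parser("a + b < 10").Parse())` yields `less_than(add(a, b), 10)`, which mirrors the nesting of the `add` and `less_than` nodes in the tree-based example. Giving `+` a tighter binding than `<` falls out of the grammar layering: the comparison rule calls the sum rule for each of its operands.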

Reporter: Pindikura Ravindra / @pravindra

Note: This issue was originally created as ARROW-3458. Please see the migration documentation for further details.

asfimport commented 6 years ago

Wes McKinney / @wesm: cc @cpcloud @pitrou in case of interest

asfimport commented 6 years ago

Praveen Krishna / @Praveen2112: For creating a string-based expression parser, can we use ANTLR for parser generation?

Should we have two separate implementations, one in C and one in Java, or a single implementation in C with a Java binding?

@pravindra @wesm, what are your insights on this?

asfimport commented 6 years ago

Pindikura Ravindra / @pravindra:

So for creating a string-based expression parser, can we use ANTLR for parser generation?

Yes.

Can we have two separate implementations, both in C and Java, or have an implementation in C and a Java binding for the same?

IMO, if we do it in C++ first, it'll be usable from C and Python too.

asfimport commented 2 years ago

Todd Farmer / @toddfarmer: This issue was last updated over 90 days ago, which may be an indication it is no longer being actively worked. To better reflect the current state, the issue is being unassigned. Please feel free to re-take assignment of the issue if it is being actively worked, or if you plan to start that work soon.