BlazingDB / blazingsql

BlazingSQL is a lightweight, GPU accelerated, SQL engine for Python. Built on RAPIDS cuDF.
https://blazingsql.com
Apache License 2.0

Unit tests for transform_json_tree #1092

Closed wmalpica closed 4 years ago

wmalpica commented 4 years ago

We want to create unit tests for the transform_json_tree function in PhysicalPlanGenerator.h.

The unit tests should be of the form:

std::string json = ....;  // This is the input to the test.
std::istringstream input(json);
boost::property_tree::ptree p_tree;
boost::property_tree::read_json(input, p_tree);
transform_json_tree(p_tree);
std::ostringstream output;
boost::property_tree::write_json(output, p_tree);
// Validate output.str()

The tests should be:

  1. A regular relational algebra for a query with two joins
  2. Same as 1, but with no indentation in the input string
  3. Same as 2, but with an extra empty line at the end

Test 1 should work, and test 2 will hopefully work. I believe there may be an issue with test 3. If so, we should fix the code so that a trailing empty line is not a problem.

  4. Run a test on a relational algebra plan that has a window function (not yet supported). The test should validate that an error is thrown indicating which relational algebra step is not supported. This likely does not work currently and needs to be fixed.

A relational algebra plan for a query with a window function would look like this:

LogicalProject(o_custkey=[$0], o_orderpriority=[$3], EXPR$2=[CASE(>($4, 0), $5, null:DOUBLE)])
  LogicalWindow(window#0=[window(partition {0} order by [2] rows between $4 PRECEDING and CURRENT ROW aggs [COUNT($1), $SUM0($1)])])
    BindableTableScan(table=[[main, orders]], projects=[[1, 3, 4, 5]], aliases=[[o_custkey, o_totalprice, o_orderdate, o_orderpriority]])
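The check described in item 4 could be sketched as follows. This is only an illustration in Python, not the code in PhysicalPlanGenerator.h: the names `SUPPORTED_STEPS`, `UnsupportedStepError`, and `check_supported` are hypothetical, and the supported set is an assumption limited to the step names that appear in this thread.

```python
import re

# Assumption: only the step names that appear in this thread are listed here.
SUPPORTED_STEPS = {
    "LogicalProject",
    "LogicalJoin",
    "LogicalTableScan",
    "BindableTableScan",
}


class UnsupportedStepError(ValueError):
    """Hypothetical error type naming the step that is not supported."""


def check_supported(plan: str) -> None:
    """Raise UnsupportedStepError for the first step outside SUPPORTED_STEPS."""
    for line in plan.splitlines():
        text = line.strip()
        if not text:
            continue  # ignore empty lines
        # The step name is the identifier before the opening parenthesis.
        match = re.match(r"[A-Za-z_]\w*", text)
        step = match.group(0) if match else text
        if step not in SUPPORTED_STEPS:
            raise UnsupportedStepError(
                f"unsupported relational algebra step: {step}"
            )


WINDOW_PLAN = (
    "LogicalProject(o_custkey=[$0], o_orderpriority=[$3], "
    "EXPR$2=[CASE(>($4, 0), $5, null:DOUBLE)])\n"
    "  LogicalWindow(window#0=[window(partition {0} order by [2] rows between "
    "$4 PRECEDING and CURRENT ROW aggs [COUNT($1), $SUM0($1)])])\n"
    "    BindableTableScan(table=[[main, orders]], projects=[[1, 3, 4, 5]], "
    "aliases=[[o_custkey, o_totalprice, o_orderdate, o_orderpriority]])\n"
)
```

A test would then call `check_supported(WINDOW_PLAN)` and assert that the raised error mentions `LogicalWindow`.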
diegodfrf commented 4 years ago

Add tests for items 1 and 4 in json_transform_tree in ral.
Add tests for items 2 and 3 in pyblazing with pytest.
Run pytest from test.sh.
Modularize the code that converts the optimized logical plan to JSON, and add it as a module in pyblazing algebra.

Add support for:

For example, the following inputs are now handled:

// Base case
LogicalJoin(condition=[=($6, $0)], joinType=[left])
  LogicalJoin(condition=[=($3, $1)], joinType=[left])
    LogicalTableScan(table=[[main, product]])
    LogicalTableScan(table=[[main, client]])
  LogicalTableScan(table=[[main, preference]])\n
// Without \n at the end of the logical plan
LogicalJoin(condition=[=($6, $0)], joinType=[left])
  LogicalJoin(condition=[=($3, $1)], joinType=[left])
    LogicalTableScan(table=[[main, product]])
    LogicalTableScan(table=[[main, client]])
  LogicalTableScan(table=[[main, preference]])
// Multiple \n
LogicalJoin(condition=[=($6, $0)], joinType=[left])
  LogicalJoin(condition=[=($3, $1)], joinType=[left]) \n\n\n\n
    LogicalTableScan(table=[[main, product]])
    LogicalTableScan(table=[[main, client]]) \n\n\n\n
  LogicalTableScan(table=[[main, preference]])\n
// Empty lines
LogicalJoin(condition=[=($6, $0)], joinType=[left])

  LogicalJoin(condition=[=($3, $1)], joinType=[left])

    LogicalTableScan(table=[[main, product]])
    LogicalTableScan(table=[[main, client]])

  LogicalTableScan(table=[[main, preference]])\n
// Lines ending with one or more spaces or tabs
LogicalJoin(condition=[=($6, $0)], joinType=[left])                // end of line spaces
  LogicalJoin(condition=[=($3, $1)], joinType=[left])\t\t\t\t\t    // end of line tabs
    LogicalTableScan(table=[[main, product]])     \t     \t        // end of line mix of spaces and tabs
    LogicalTableScan(table=[[main, client]])
  LogicalTableScan(table=[[main, preference]])\n
// Different types of indentation
LogicalJoin(condition=[=($6, $0)], joinType=[left])                
\tLogicalJoin(condition=[=($3, $1)], joinType=[left])               // one tab is detected as the indentation unit; level 1
\t\tLogicalTableScan(table=[[main, product]])                       // level 2
\t\tLogicalTableScan(table=[[main, client]])                        // level 2
\tLogicalTableScan(table=[[main, preference]])\n                    // level 1

LogicalJoin(condition=[=($6, $0)], joinType=[left])                
\t \tLogicalJoin(condition=[=($3, $1)], joinType=[left])            // '\t \t' (tab, space, tab) is detected as the indentation unit; level 1
\t \t\t \tLogicalTableScan(table=[[main, product]])                 // level 2
\t \t\t \tLogicalTableScan(table=[[main, client]])                  // level 2
\t \tLogicalTableScan(table=[[main, preference]])\n                 // level 1

LogicalJoin(condition=[=($6, $0)], joinType=[left])                
   LogicalJoin(condition=[=($3, $1)], joinType=[left])              // '   ' (three spaces) is detected as the indentation unit
   \tLogicalTableScan(table=[[main, product]])                      // invalid indentation: throws an exception
      LogicalTableScan(table=[[main, client]])
   LogicalTableScan(table=[[main, preference]])\n
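
The rules illustrated by these cases can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the pyblazing implementation: `plan_to_tree` and the `expr`/`children` keys are hypothetical names, the indentation unit is assumed to be whatever whitespace precedes the first indented line, and trailing spaces, trailing tabs, and empty lines are simply dropped.

```python
import json


def plan_to_tree(plan: str) -> dict:
    """Convert an indented logical plan into a nested dict (hypothetical helper)."""
    # Drop trailing spaces and tabs, then skip lines that are empty.
    lines = [ln.rstrip(" \t") for ln in plan.splitlines()]
    lines = [ln for ln in lines if ln]
    if not lines:
        raise ValueError("empty logical plan")

    unit = None  # whitespace for one level, taken from the first indented line
    root = None
    stack = []   # stack[i] is the most recent node seen at level i

    for ln in lines:
        expr = ln.lstrip(" \t")
        indent = ln[: len(ln) - len(expr)]
        if indent and unit is None:
            unit = indent
        if not indent:
            level = 0
        else:
            level, rest = divmod(len(indent), len(unit))
            # The indentation must be an exact repetition of the unit.
            if rest or indent != unit * level:
                raise ValueError(f"invalid indentation: {ln!r}")
        node = {"expr": expr, "children": []}
        if level == 0:
            if root is not None:
                raise ValueError("more than one root step")
            root = node
        elif level > len(stack):
            raise ValueError(f"indentation skips a level: {ln!r}")
        else:
            stack[level - 1]["children"].append(node)
        del stack[level:]  # forget deeper nodes from earlier branches
        stack.append(node)
    return root


BASE_PLAN = (
    "LogicalJoin(condition=[=($6, $0)], joinType=[left])\n"
    "  LogicalJoin(condition=[=($3, $1)], joinType=[left])\n"
    "    LogicalTableScan(table=[[main, product]])\n"
    "    LogicalTableScan(table=[[main, client]])\n"
    "  LogicalTableScan(table=[[main, preference]])\n"
)

# Serialize the tree so it can be handed to a JSON consumer.
plan_json = json.dumps(plan_to_tree(BASE_PLAN), indent=2)
```

Because the unit is compared as an exact string, mixing a tab into a three-space indentation raises `ValueError`, which mirrors the last case above; the other variants all yield the same tree as the base case.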