cmu-db / peloton

The Self-Driving Database Management System
http://pelotondb.io
Apache License 2.0
2.03k stars, 623 forks

No informative GetInfo() method for operator nodes and some plan nodes #1369

Closed: chenboy closed this issue 6 years ago

chenboy commented 6 years ago

This is a little different from what we discussed in the meeting; I'll explain in the description.

There's no GetInfo() method for operator nodes in the optimizer, which makes debugging the optimizer extremely painful.

https://github.com/cmu-db/peloton/blob/master/src/include/optimizer/operators.h#L52

I can think of a few difficulties in implementing it. First, operators are stored in the memo, so we need to construct an operator tree before we can print it. I think printing the best operator tree in the memo should be good enough for most cases. We could use the GetBestPlan method as a reference.

https://github.com/cmu-db/peloton/blob/master/src/optimizer/optimizer.cpp#L288
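
As a rough illustration of the idea above, here is a minimal sketch of walking a memo and printing only the best expression of each group. The `Memo`, `Group`, and `GroupExpr` types and the `BestTreeInfo` function are hypothetical simplifications made up for this example; they are not the actual Peloton classes, and the real memo tracks the best expression per required property set rather than a single index.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the optimizer's memo structures.
// None of these names are taken from the Peloton source.
struct GroupExpr {
  std::string op_name;                // e.g. "InnerHashJoin"
  std::vector<std::size_t> children;  // ids of the child groups in the memo
};

struct Group {
  std::vector<GroupExpr> exprs;  // alternative expressions in this group
  std::size_t best = 0;          // index of the cheapest alternative (assumed known)
};

struct Memo {
  std::vector<Group> groups;
};

// Walk the memo from `group_id`, following only the best expression of each
// group, and append one indented line per operator to `os`.
void PrintBestTree(const Memo &memo, std::size_t group_id, std::size_t depth,
                   std::ostringstream &os) {
  const Group &group = memo.groups.at(group_id);
  const GroupExpr &expr = group.exprs.at(group.best);
  os << std::string(depth * 2, ' ') << expr.op_name << " [group " << group_id
     << "]\n";
  for (std::size_t child : expr.children) {
    PrintBestTree(memo, child, depth + 1, os);
  }
}

// Convenience wrapper that returns the whole tree as a string.
std::string BestTreeInfo(const Memo &memo, std::size_t root_group) {
  std::ostringstream os;
  PrintBestTree(memo, root_group, 0, os);
  return os.str();
}
```

Calling `BestTreeInfo(memo, root_group)` yields one line per operator, indented by depth, so the tree shape is visible in a log without materializing a separate operator tree.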

Printing the memo at an arbitrary point during optimization would be helpful when we implement more complex optimizations.

Another nice-to-have feature is printing the output columns for each operator, e.g. t1.a + AVG(t2.b), since that is also something we could easily mess up. The challenge is that these output columns are not stored in the operators. They are constructed on the fly and stored in a map when constructing the plan.

https://github.com/cmu-db/peloton/blob/master/src/include/common/internal_types.h#L1392

I previously thought we could print these in the plan node, but the plan node only stores the output column offset as an oid, which is not the most intuitive debugging info.
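
One possible way to make the offsets readable is to invert the expression-to-offset map at print time, so each offset is shown next to the expression it came from. The sketch below assumes a simplified map keyed by the printed form of the expression; `OutputColumnMap` and `OutputColumnsInfo` are hypothetical names for this example and do not correspond to the map type linked above.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for the expression-to-offset map built during plan
// construction. Expressions are represented by their printed form here.
using OutputColumnMap = std::unordered_map<std::string, std::size_t>;

// Format the output columns ordered by offset, e.g. "[#0: t1.c, #1: t1.a]".
std::string OutputColumnsInfo(const OutputColumnMap &columns) {
  // Invert the map into (offset, expression) pairs so we can sort by offset.
  std::vector<std::pair<std::size_t, std::string>> by_offset;
  by_offset.reserve(columns.size());
  for (const auto &entry : columns) {
    by_offset.emplace_back(entry.second, entry.first);
  }
  std::sort(by_offset.begin(), by_offset.end());

  std::string out = "[";
  for (std::size_t i = 0; i < by_offset.size(); ++i) {
    if (i > 0) out += ", ";
    out += "#" + std::to_string(by_offset[i].first) + ": " + by_offset[i].second;
  }
  out += "]";
  return out;
}
```

Sorting by offset keeps the output stable across runs even though the underlying map is unordered, which makes diffs between two debug dumps easier to read.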

I think we should print out predicates, e.g. scan predicates, join predicates, and having clauses, in the plan node. I've added some already. Please check whether anything is missing.

By the way, the GetInfo() method for expressions is also kind of crappy. It's not succinct enough: an expression like t1.a + AVG(t2.b) spans multiple lines with a lot of redundant information. We should fix that too.
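
To make the goal concrete, here is a minimal sketch of a single-line printer for a small expression tree. The `Expr` type, the `Column`/`Aggregate`/`Binary` helpers, and `ShortInfo` are all hypothetical and only illustrate the intended output format; they are not Peloton's expression classes.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical expression node, not Peloton's expression hierarchy.
struct Expr {
  enum class Kind { Column, Binary, Aggregate };
  Kind kind;
  std::string text;  // column name, operator symbol, or aggregate name
  std::vector<std::unique_ptr<Expr>> children;
};
using ExprPtr = std::unique_ptr<Expr>;

ExprPtr MakeExpr(Expr::Kind kind, std::string text) {
  ExprPtr e(new Expr());
  e->kind = kind;
  e->text = std::move(text);
  return e;
}

ExprPtr Column(std::string name) {
  return MakeExpr(Expr::Kind::Column, std::move(name));
}

ExprPtr Aggregate(std::string fn, ExprPtr arg) {
  ExprPtr e = MakeExpr(Expr::Kind::Aggregate, std::move(fn));
  e->children.push_back(std::move(arg));
  return e;
}

ExprPtr Binary(std::string op, ExprPtr lhs, ExprPtr rhs) {
  ExprPtr e = MakeExpr(Expr::Kind::Binary, std::move(op));
  e->children.push_back(std::move(lhs));
  e->children.push_back(std::move(rhs));
  return e;
}

std::string ShortInfo(const Expr &e);

// Parenthesize nested binary operands so the one-line form stays unambiguous.
std::string Operand(const Expr &e) {
  if (e.kind == Expr::Kind::Binary) return "(" + ShortInfo(e) + ")";
  return ShortInfo(e);
}

// Render the whole expression on one line, with no type or id metadata.
std::string ShortInfo(const Expr &e) {
  switch (e.kind) {
    case Expr::Kind::Column:
      return e.text;
    case Expr::Kind::Aggregate:
      return e.text + "(" + ShortInfo(*e.children.at(0)) + ")";
    case Expr::Kind::Binary:
      return Operand(*e.children.at(0)) + " " + e.text + " " +
             Operand(*e.children.at(1));
  }
  return "?";
}
```

With these helpers, `ShortInfo(*Binary("+", Column("t1.a"), Aggregate("AVG", Column("t2.b"))))` gives the one-line form `t1.a + AVG(t2.b)` used as the example in this issue.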

To summarize, the features that need to be implemented are:

  1. GetInfo() for the best operator tree in the optimizer
  2. Informative GetInfo() for plan nodes
  3. Succinct GetInfo() method for expressions

The features that are nice to have:

  1. GetInfo() for memo
  2. Printing output columns with operator tree nodes.

apavlo commented 6 years ago

I will assign this to the incoming PKU intern.