chaisql / chai

Modern embedded SQL database
MIT License
1.56k stars 95 forks source link

Building a query planner #88

Closed asdine closed 4 years ago

asdine commented 4 years ago

Currently, there is one separate type for each kind of statement, containing the entire logic of that statement. The parser only needs to determine which type to instantiate and give all the necessary information to that object so that it can be run.

There are several problems with this approach:

Having a better designed base would allow us to:

Proposal

To solve that, we can build an intermediary representation of a query in the form of a tree. A tree represents a stream of documents, from an input (table, index, etc) to an output (result stream, deletion from the table, update, etc.).

Each node of the tree represents an operation that will transform the stream (filter documents, project fields, renaming fields, etc.). These operations will follow some principles of relational algebra [1] [2].

With this representation, the difference between a SELECT and an UPDATE is only about changing one node of the tree.

The tree would be built by the parser before being passed to the execution phase.

Once we have this tree, we can optimize it using Rule-based optimization: Each rule is a function that takes one tree and returns a new tree. Rule-based optimization can be summarized like this:

t is a tree
r1, r2, ... rN are functions with this pseudo-signature r(t tree) tree

t -> r1() -> r2() -> ... -> rN() -> finalTree

Each rule does only one optimization:

Once the tree is optimized, we can transform it into an actual document.Stream that can be executed.

Some reads about relational algebra: [1] http://infolab.stanford.edu/~ullman/fcdb/aut07/slides/ra.pdf [2] https://www.geeksforgeeks.org/introduction-of-relational-algebra-in-dbms/

yaziine commented 4 years ago

[1] http://infolab.stanford.edu/~ullman/fcdb/aut07/slides/ra.pdf

Slide 4 says that for union, intersection and difference both operands should have the same schema. What is your thought on this? How could we handle this since we could have schema-less relations.

asdine commented 4 years ago

This is a good point, PR #87 won't add union, intersection, and difference, so we don't need to ask ourselves this question for the moment. This will be taken care of during the next iterations. I will probably write a proposal when we will reach that point