SeaQL / otter-sql

🦦 An Embeddable SQL Executor in Rust
Apache License 2.0
22 stars 1 forks source link

Expr evaluator #19

Closed Samyak2 closed 2 years ago

Samyak2 commented 2 years ago

PR Info

Adds

Fixes

Breaking Changes

Changes

tyt2y3 commented 2 years ago

For the scope of this project I think we only need a basic evaluator, where the most important operators might be those that produce boolean values in the end, i.e. A > B, A = B etc such that we can effectively test the executor.

Samyak2 commented 2 years ago

I know the evaluation of boolean expression is still on the way.

A questions comes into my mind. TRUE and FALSE are essentially 1 and 0, or I should say any non-zero and zero.

Hence, the following queries are equivalent. Will we even cast integer / float as boolean value implicitly? Or, we only accept boolean value / expression only, else throwing error / panic?

select * from tbl where TRUE and FALSE;
select * from tbl where 1 and 0;
select * from tbl where -1 and 0;
select * from tbl where 1.01 and 0.0;

I was thinking of supporting booleans (or expressions that evaluate to booleans) only. Having "truthy" and "falsy" values has been a common footgun for me in JavaScript and Python, so I wouldn't want to emulate that. But of course, we aren't making a new language here, we should emulate what other DBs do.

The MySQL documentation doesn't specify whether it only supports booleans - I will have to test it out. The PostgreSQL docs says any expression that evaluates to a result of type boolean - so it is consistent with my idea.

Samyak2 commented 2 years ago

The comparison and equality operators are currently all derived, but I think we should impl it ourselves and return explicit errors in case of non-matching types.

Note: the logical AND and OR will be implement in the Expr executor as short circuiting needs to be implemented for it.

Samyak2 commented 2 years ago

Actually, I think the derived implementations are good enough for now.

tyt2y3 commented 2 years ago

Cool

tyt2y3 commented 2 years ago

Do you think it's good to merge. Or do you still have something to add?

Samyak2 commented 2 years ago

Not yet. I'm working on the tests, but two things are yet to be implemented - ColumnRef and Function.

The problem with ColumnRef is that it needs to be in the context of a Row to return a Value. I think it would be best to handle both of these in the overall VM executor instead of this - since it needs a Row as well the table schema to know which index in the Row it needs to return. What do you think?

billy1624 commented 2 years ago

Not yet. I'm working on the tests, but two things are yet to be implemented - ColumnRef and Function.

The problem with ColumnRef is that it needs to be in the context of a Row to return a Value. I think it would be best to handle both of these in the overall VM executor instead of this - since it needs a Row as well the table schema to know which index in the Row it needs to return. What do you think?

I'll put Function aside and focus on ColumnRef first.

I think Expr::execute() method should be responsible to evaluate a expression and return a single Value. (this is the current design) The "context" (db table & row) comes into play when we're evaluating ColumnRef.

  1. There are some missing piece during the generating of intermediate code. ColumnRef should contains the RegisterIndex where the data of a table / projection located.
  2. Expr::execute() method should also takes a row_id indicating the current row. This allows ColumnRef expression to be evaluated as a single Value (consider a table with rows and columns; the column is given in the ColumnRef and we got the row_id as well).

This way, VM iterate and evaluate expressions row by row then append result to the destination register.

Does this make sense? @Samyak2 @tyt2y3

tyt2y3 commented 2 years ago

Yeah unless I missed some details, we should pass the row and all the row's data to the evaluator, along with the column metadata.

Such that when we say we want 'id = 1' it will just get the value from the row, and evaluate the boolean expression.

Yes we can forget about function for now.

billy1624 commented 2 years ago

I think the Function is not part of Expr. It should be a seperate struct / enum where it contains two pieces of data: the function operator (e.g. MAX, MIN, COUNT, IS_NULL) and the expression (where it can be evaluated as Value). However, I'm not sure to represent this on the intermediate code level. A special kind of projection?

tyt2y3 commented 2 years ago

I don't think function belongs to the instruction set. All the IC care is execute a lambda and get some value.

The expression evaluation doesn't care about function either, it's just calling a lambda and get some Value back.

tyt2y3 commented 2 years ago

It could be a rabbit hole if we dig down too far. Let's just do something passable for now and revisit expression evaluation later . The IC and VM is what matters more for now because I basically need to see a POC of the execution pipeline, from SQL parsing to query results. And iterate from there filling in the details.

Samyak2 commented 2 years ago

I have implemented a basic execution of ColumnRef for now. It needs more tests which I will wrap up tomorrow.

Samyak2 commented 2 years ago

I'm working on the overall executor in a different branch. @tyt2y3 can we merge this?

tyt2y3 commented 2 years ago

Nice