SeaQL / otter-sql

🦦 An Embeddable SQL Executor in Rust
Apache License 2.0

How should aggregate work in the VM #16

Open tyt2y3 opened 2 years ago

tyt2y3 commented 2 years ago

I have been reading through some of the existing code and would still like to start a discussion.

https://github.com/SeaQL/sql-assembly/blob/e38f564785a6ba8a3542d512690ef90a059274d4/src/ic.rs#L373-L427

  1. I think aggregate should be like project: it has a source table, a destination table, and some expressions to evaluate. Only the evaluation model differs.

  2. Conceptually, 'group by' works like this: for each row, construct a tuple from the values of the group-by columns (there may be several). Use that tuple as the key in a 'hash map' (which should be our own table + index implementation), and run the 'reduce' function on each collision.

  3. We might be missing an Aggregate instruction here, along with the expressions to evaluate.

  4. The HAVING construct in SQL is very powerful. For example, we can write HAVING MAX(col3) + 1 > 10, where we have to recognize that MAX(col3) is an expression that has already been evaluated, and we still have to evaluate the + 1 part. It might be that our eager execution model does not align well with SQL. Anyway, we can leave this problem for later. Maybe for now we only allow a binary operator as the HAVING clause, and its left operand must match one of the aggregate expressions.
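
To make points 2 and 4 concrete, here is a rough sketch. It is not the existing VM code: it uses `std::collections::HashMap` as a stand-in for our own table + index, treats every value as an `i64`, and all names (`AggFn`, `Agg`, `aggregate`, `CmpOp`, `Having`, `having`) are hypothetical. It also assumes the right operand of the restricted HAVING is a constant, which is not decided above.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Hypothetical aggregate functions; each one is a 'reduce' over a group.
#[derive(Debug, Clone, Copy, PartialEq)]
enum AggFn { Max, Sum, Count }

/// One aggregate expression, e.g. MAX(col3) is `Agg { func: AggFn::Max, col: 2 }`.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Agg { func: AggFn, col: usize }

impl Agg {
    /// Accumulator value for the first row of a group.
    fn init(&self, row: &[i64]) -> i64 {
        match self.func {
            AggFn::Count => 1,
            AggFn::Max | AggFn::Sum => row[self.col],
        }
    }

    /// The 'reduce' step, run whenever a row lands on an existing key.
    fn reduce(&self, acc: i64, row: &[i64]) -> i64 {
        match self.func {
            AggFn::Max => acc.max(row[self.col]),
            AggFn::Sum => acc + row[self.col],
            AggFn::Count => acc + 1,
        }
    }
}

/// Groups `rows` by the columns in `keys` and evaluates `aggs` per group.
/// Each output row is the key values followed by the aggregate results.
fn aggregate(rows: &[Vec<i64>], keys: &[usize], aggs: &[Agg]) -> Vec<Vec<i64>> {
    let mut groups: HashMap<Vec<i64>, Vec<i64>> = HashMap::new();
    for row in rows {
        // Construct the key tuple from the group-by columns of this row.
        let key: Vec<i64> = keys.iter().map(|&k| row[k]).collect();
        match groups.entry(key) {
            // Collision: fold this row into the existing accumulators.
            Entry::Occupied(mut e) => {
                for (acc, agg) in e.get_mut().iter_mut().zip(aggs) {
                    *acc = agg.reduce(*acc, row);
                }
            }
            // First row for this key: initialise the accumulators.
            Entry::Vacant(e) => {
                e.insert(aggs.iter().map(|a| a.init(row)).collect());
            }
        }
    }
    let mut out: Vec<Vec<i64>> = groups
        .into_iter()
        .map(|(mut key, accs)| {
            key.extend(accs);
            key
        })
        .collect();
    out.sort(); // HashMap order is arbitrary; sort for deterministic output.
    out
}

/// Comparison operators allowed in the restricted HAVING.
#[derive(Debug, Clone, Copy)]
enum CmpOp { Gt, Lt, Eq }

/// Restricted HAVING: `<aggregate> <op> <constant>` (constant is an assumption).
struct Having { left: Agg, op: CmpOp, right: i64 }

/// Filters aggregated rows. Returns `None` when the left operand does not
/// match any already-evaluated aggregate expression, i.e. the clause is rejected.
fn having(out: Vec<Vec<i64>>, n_keys: usize, aggs: &[Agg], h: &Having) -> Option<Vec<Vec<i64>>> {
    let idx = n_keys + aggs.iter().position(|a| *a == h.left)?;
    let keep = |r: &Vec<i64>| match h.op {
        CmpOp::Gt => r[idx] > h.right,
        CmpOp::Lt => r[idx] < h.right,
        CmpOp::Eq => r[idx] == h.right,
    };
    Some(out.into_iter().filter(keep).collect())
}
```

With this shape, the restricted HAVING never re-evaluates an aggregate: it only looks up the column that `aggregate` already produced. Something like HAVING MAX(col3) + 1 > 10 cannot be expressed here, which is the limitation described in point 4.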

Or is there some simpler way to implement this?