yywe closed this issue 2 weeks ago
Yep, you're right, nice find!
toydb> create table a (id int primary key, value string not null);
Created table a
toydb> create table b (id int primary key, value string not null);
Created table b
toydb> insert into a values (1, 'a'), (2, 'b');
Created 2 rows
toydb> insert into b values (1, 'b'), (2, 'b');
Created 2 rows
toydb> select a.id, b.id from a join b on a.value = b.value;
2|2
toydb> explain select a.id, b.id from a join b on a.value = b.value;
Projection: a.id, b.id
└─ HashJoin: inner on a.value = b.value
   ├─ Scan: a
   └─ Scan: b
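For comparison, a naive nested-loop join over the same rows shows what this query should return. This is a minimal sketch that assumes rows are simplified to `(id, value)` tuples instead of toydb's real row and value types. It compares every row of `a` with every row of `b` and emits `(a.id, b.id)` whenever `a.value = b.value`:

```rust
// Hypothetical simplified row: (id, value). Not toydb's real Row/Value types.
fn nested_loop_join(left: &[(i64, &str)], right: &[(i64, &str)]) -> Vec<(i64, i64)> {
    let mut out = Vec::new();
    // Compare every left row against every right row (O(n * m)).
    for (lid, lval) in left {
        for (rid, rval) in right {
            // Join condition: a.value = b.value; emit (a.id, b.id).
            if lval == rval {
                out.push((*lid, *rid));
            }
        }
    }
    out
}

fn main() {
    let a: [(i64, &str); 2] = [(1, "a"), (2, "b")];
    let b: [(i64, &str); 2] = [(1, "b"), (2, "b")];
    for (aid, bid) in nested_loop_join(&a, &b) {
        println!("{}|{}", aid, bid);
    }
}
```

Running it prints both `2|1` and `2|2`, whereas the HashJoin result above only contains `2|2`.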
I encountered this while playing around a little and wanted to share it in case it's helpful. I believe this test should actually return duplicate results, e.g.:
Result: ["id", "title", "genre", "studio", "rating"]
[Integer(10), String("Inception"), String("Science Fiction"), String("Warner Bros"), Float(8.8)]
+[Integer(10), String("Inception"), String("Science Fiction"), String("Warner Bros"), Float(8.8)]
+[Integer(1), String("Stalker"), String("Science Fiction"), String("Mosfilm"), Float(8.2)]
[Integer(1), String("Stalker"), String("Science Fiction"), String("Mosfilm"), Float(8.2)]
[Integer(4), String("Heat"), String("Action"), String("Warner Bros"), Float(8.2)]
+[Integer(4), String("Heat"), String("Action"), String("Warner Bros"), Float(8.2)]
+[Integer(6), String("Solaris"), String("Science Fiction"), String("Mosfilm"), Float(8.1)]
[Integer(6), String("Solaris"), String("Science Fiction"), String("Mosfilm"), Float(8.1)]
[Integer(7), String("Gravity"), String("Science Fiction"), String("Warner Bros"), Float(7.7)]
+[Integer(7), String("Gravity"), String("Science Fiction"), String("Warner Bros"), Float(7.7)]
+[Integer(9), String("Birdman"), String("Comedy"), String("Warner Bros"), Float(7.7)]
[Integer(9), String("Birdman"), String("Comedy"), String("Warner Bros"), Float(7.7)]
[Integer(5), String("The Fountain"), String("Science Fiction"), String("Warner Bros"), Float(7.2)]
+[Integer(5), String("The Fountain"), String("Science Fiction"), String("Warner Bros"), Float(7.2)]
(Also, just to reiterate: thanks for open-sourcing this! It's great fun to play with.)
This should be addressed now, but I'm going to have a closer look at this logic and add more test cases in a while.
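One common way to keep duplicate join keys in a hash join is to store a list of rows per key on the build side and emit one output row per match on the probe side. The sketch below is only an illustration under simplified assumptions (rows are `(id, value)` tuples, keys are `String`s, and `build`/`hash_join` are hypothetical names); it is not necessarily how the fix in toydb was implemented:

```rust
use std::collections::HashMap;

// Hypothetical simplified row: (id, value). Not toydb's real Row/Value types.
type Row = (i64, String);

// Build side: group every right row under its join key instead of
// keeping a single row per key.
fn build(right: Vec<Row>) -> HashMap<String, Vec<Row>> {
    let mut table: HashMap<String, Vec<Row>> = HashMap::new();
    for row in right {
        table.entry(row.1.clone()).or_default().push(row);
    }
    table
}

// Probe side: for each left row, emit one (left id, right id) pair per
// matching right row.
fn hash_join(left: Vec<Row>, right: Vec<Row>) -> Vec<(i64, i64)> {
    let table = build(right);
    let mut out = Vec::new();
    for (lid, lval) in left {
        if let Some(matches) = table.get(&lval) {
            for (rid, _) in matches {
                out.push((lid, *rid));
            }
        }
    }
    out
}

fn main() {
    let a = vec![(1, "a".to_string()), (2, "b".to_string())];
    let b = vec![(1, "b".to_string()), (2, "b".to_string())];
    for (aid, bid) in hash_join(a, b) {
        println!("{}|{}", aid, bid); // prints 2|1, then 2|2
    }
}
```

With the `a`/`b` tables from the example, this returns both matching pairs rather than only the last one.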
Hi @erikgrinaker,
Thank you for open-sourcing this great project. I'd say it's the best resource for learning database principles. The code quality is also awesome: neat and elegant. Every piece is excellent.
I have a question about one of the optimizer rules:
When the join is an equijoin, the optimizer switches to a hash join keyed on Field(a) and Field(b). I'm not sure if I'm missing anything, but what if the corresponding fields (columns) are not unique, i.e. if there are duplicate values in them? I suspect the hash join will miss some rows.
The hash map is built by collecting the key-value pairs:
let right: HashMap<Value, Row> = rrows
    .map(|res| match res {
        Ok(row) if row.len() <= r => {
            Err(Error::Internal(format!("Right index {} out of bounds", r)))
        }
        Ok(row) => Ok((row[r].clone(), row)),
        Err(err) => Err(err),
    })
    .collect::<Result<_>>()?;
If there are duplicate values, I think only the last pair for each key will be kept, since inserting an existing key into a HashMap overwrites the previous value.
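A minimal standalone sketch of this overwrite behavior, assuming rows are simplified to `(id, value)` tuples and keyed on `value` (the `build_right` helper is hypothetical and only mirrors the shape of the `collect` call in the snippet):

```rust
use std::collections::HashMap;

// Hypothetical simplified row: (id, value). Keyed on `value`, mirroring the
// `(row[r].clone(), row)` pairs collected into `HashMap<Value, Row>`.
fn build_right(rows: Vec<(i64, String)>) -> HashMap<String, (i64, String)> {
    rows.into_iter()
        .map(|row| (row.1.clone(), row))
        // `collect` inserts the pairs one by one; a repeated key overwrites
        // the earlier row, so the last row for each key wins.
        .collect()
}

fn main() {
    // The rows of table b from the example: both have value 'b'.
    let b = vec![(1, "b".to_string()), (2, "b".to_string())];
    let right = build_right(b);
    println!("{} entry: {:?}", right.len(), right.get("b")); // 1 entry: Some((2, "b"))
}
```

Only the row with id 2 survives, which matches the `2|2` result in the REPL session.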
Am I missing anything? I'd appreciate any feedback. Thanks!