Questions about storage and indices

can-keklik commented 1 year ago

Hello @erikgrinaker!

Thanks for this amazing project. It actually inspired me to create my own toy database as well. I'm also following Designing Data-Intensive Applications and Database Internals. I'm not sure if you have time to answer them, yet here are the questions:

Thanks in advance,

I recently read about LSM from both of the books, now I am confused about your storage engine implementations. They are named as log and KV in ToyDB. To me, "log" is probably an LSM but, what is KV then?
As far as I understood, all the entries are stored in a KV pairs such as "{table_id}_{primary_id}_{col}": row[col]. Is that correct?
How did you implement SQL-level indexes? Separate B-trees? I couldn't locate index creation logic inside the codebase.

erikgrinaker commented 1 year ago

Hi @can-keklik, glad you like the project!

I recently read about LSM from both of the books, now I am confused about your storage engine implementations. They are named as log and KV in ToyDB. To me, "log" is probably an LSM but, what is KV then?

Log is just an append-only file with an in-memory buffer for uncommitted data, specialized for the Raft log. KV is an in-memory B+ tree that's reconstructed from the Raft log. These are both rather trivial toy implementations. I've wanted to implement real on-disk state storage for some time, but am too busy with other stuff -- I'd probably go with something like Bitcask, which is kind of a very simplified version of an LSM.

As far as I understood, all the entries are stored in a KV pairs such as "{tableid}{primaryid}{col}": row[col]. Is that correct?

No, the key is ${table_name}${primary_key} using a custom, order-preserving encoding for the primary key as specified here:

https://github.com/erikgrinaker/toydb/blob/5b72c04fd6e58c31bbac6e9c7adfd1780398ed34/src/storage/kv/encoding.rs#L1-L10

The value in the KV store is a vector of column values encoded using Bincode.

How did you implement SQL-level indexes? Separate B-trees? I couldn't locate index creation logic inside the codebase.

SQL indexes are stored in the same B-tree as the other data, using the key format ${table_name}${column_name}${index_value}, where the value in the KV store is a HashSet of primary key references encoded using Bincode.

Because ToyDB doesn't support schema changes, indexes must be defined when the table is created. There is therefore no index building code as such, since all new tables are created empty, but the indexes must be updated when data is written to the primary table. For example, when inserting a row into a table, the corresponding index entries are written here:

https://github.com/erikgrinaker/toydb/blob/5b72c04fd6e58c31bbac6e9c7adfd1780398ed34/src/sql/engine/kv.rs#L134-L139

can-keklik commented 1 year ago

Thanks for the detailed answer.

OK. So you’re just using a log for uncommitted data, and a B+ tree as the primary storage. Probably “Log” confused me here.
So columns of a single row are stored together, rather than separate key-value pairs. I'd just thought that it might be easier to update cells when each column is separate pair, but I guess that’s not trivial or even feasible to manage, in terms of lookups/reads.
Oh, makes sense. Thanks for the clarification as well.

erikgrinaker / toydb

Questions about storage and indices #55