adamfowleruk / groundupdb

Creating a database from the ground up in C++ for fun!
Apache License 2.0
123 stars 25 forks

Add support for Buckets #7

Closed adamfowleruk closed 4 years ago

adamfowleruk commented 4 years ago

[Who] As a database user [What] I want to be able to logically segment my data [Value] To make storage, querying, retrieval, backups and restores easier and quicker

[Who] As a database administrator [What] I want to be able to make use of the CPU and I/O parallelism features of my DB servers by partitioning data [Value] In order to provide the highest performance for write, read, and query workloads to my database users

To achieve the above we need to implement Buckets, Forests, and Stands. Forests are a physical unit of data management (threading) within the database, transparent to database users. Stands are individual record sets within each Forest. In an append-only file or MVCC layout, stands are self-contained, and a forest has many stands that may be merged over time.

Forests provide parallelisation. Buckets provide a logical separation of records to be queried. Ideally, a record could be part of more than one bucket. There could also be a default (blank) bucket.

For now, in our single-threaded database, we will have one Forest with one stand in it. One stand file will hold the key hashes and their management data, each with a hash pointer to a key-value entry. A separate key-value file will hold the values.
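As a rough illustration of the two-file stand layout, here is a minimal in-memory sketch. Everything in it is an assumption made for illustration, not the project's actual code: the names `StandSketch`, `IndexEntry` and `KvEntry`, the `deleted` flag standing in for "management data", the use of vectors in place of real append-only files, and `std::hash` as a placeholder hash.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical record in the stand's index file: a key hash, some
// management data, and a pointer into the separate key-value file.
struct IndexEntry {
    std::uint64_t keyHash;   // hash of the key
    std::uint64_t kvOffset;  // position of the entry in the key-value file
    bool deleted;            // illustrative "management data": a tombstone
};

// Hypothetical entry in the key-value file. The key is stored next to the
// value so that hash collisions can be detected on read.
struct KvEntry {
    std::string key;
    std::string value;
};

// One stand, modelled with two append-only vectors instead of real files.
class StandSketch {
public:
    void set(const std::string& key, const std::string& value) {
        append(key, value, false);
    }

    // A delete is just another append: a tombstone entry for the key.
    void remove(const std::string& key) {
        append(key, "", true);
    }

    std::optional<std::string> get(const std::string& key) const {
        const std::uint64_t h = hashKey(key);
        // Scan newest-first so that the latest append for a key wins.
        for (auto it = indexFile_.rbegin(); it != indexFile_.rend(); ++it) {
            if (it->keyHash != h) continue;
            assert(it->kvOffset < kvFile_.size());
            const KvEntry& entry = kvFile_[static_cast<std::size_t>(it->kvOffset)];
            if (entry.key != key) continue;  // same hash, different key
            if (it->deleted) return std::nullopt;
            return entry.value;
        }
        return std::nullopt;
    }

private:
    void append(const std::string& key, const std::string& value, bool deleted) {
        kvFile_.push_back(KvEntry{key, value});
        const auto offset = static_cast<std::uint64_t>(kvFile_.size() - 1);
        indexFile_.push_back(IndexEntry{hashKey(key), offset, deleted});
    }

    // Placeholder hash; a real stand would need a hash that is stable on disk.
    static std::uint64_t hashKey(const std::string& key) {
        return static_cast<std::uint64_t>(std::hash<std::string>{}(key));
    }

    std::vector<IndexEntry> indexFile_;  // stands in for the key-hash file
    std::vector<KvEntry> kvFile_;        // stands in for the key-value file
};
```

Because both structures only ever grow, an update or delete never rewrites earlier entries. The linear newest-first scan is only there to keep the sketch short.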

The concept of a bucket will be implemented as a term list index value. This index will be an append-only file model too. A bucket may be specified when setting a key, but not when getting it. A scan for keys will be added that allows a bucket to be specified.
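The bucket behaviour described here could look roughly like the sketch below. It assumes an in-memory term list in place of the append-only index file, and the names `BucketedStoreSketch`, `TermEntry` and `keysInBucket` are hypothetical, not taken from the groundupdb API.

```cpp
#include <optional>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical term list entry: one (bucket, key) pair per append.
struct TermEntry {
    std::string bucket;
    std::string key;
};

class BucketedStoreSketch {
public:
    // A bucket may be given when setting; the blank bucket is the default.
    void set(const std::string& key, const std::string& value,
             const std::string& bucket = "") {
        values_[key] = value;
        termList_.push_back(TermEntry{bucket, key});  // append only, never rewritten
    }

    // Getting a key takes no bucket argument.
    std::optional<std::string> get(const std::string& key) const {
        const auto it = values_.find(key);
        if (it == values_.end()) return std::nullopt;
        return it->second;
    }

    // Scan the term list for keys in one bucket, in first-seen order,
    // skipping duplicates from repeated sets of the same key.
    std::vector<std::string> keysInBucket(const std::string& bucket) const {
        std::vector<std::string> keys;
        std::unordered_set<std::string> seen;
        for (const TermEntry& entry : termList_) {
            if (entry.bucket == bucket && seen.insert(entry.key).second) {
                keys.push_back(entry.key);
            }
        }
        return keys;
    }

private:
    std::unordered_map<std::string, std::string> values_;
    std::vector<TermEntry> termList_;  // stands in for the term list index file
};
```

Because the term list is only appended to, setting the same key again with a different bucket leaves it in both buckets, which matches the idea that a record could belong to more than one bucket.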

We may implement a custom hashing algorithm for unordered_map that matches our physical storage hashing.
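One way to plug a custom hash into `std::unordered_map` is to pass a hash functor as the third template argument. The sketch below uses 64-bit FNV-1a purely as a stand-in, since the issue does not say which algorithm the physical storage uses. The names `fnv1a64`, `StorageHash` and `KeyValueMap` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <unordered_map>

// 64-bit FNV-1a, used here only as an example of a hash whose output is
// stable across runs and platforms (unlike std::hash, whose values are
// unspecified). The real storage hash may well be something else.
inline std::uint64_t fnv1a64(std::string_view data) noexcept {
    std::uint64_t hash = 14695981039346656037ULL;  // FNV offset basis
    for (const unsigned char byte : data) {
        hash ^= byte;
        hash *= 1099511628211ULL;                  // FNV prime
    }
    return hash;
}

// Hash functor so the in-memory map hashes keys the same way as storage.
struct StorageHash {
    std::size_t operator()(const std::string& key) const noexcept {
        return static_cast<std::size_t>(fnv1a64(key));
    }
};

// Hypothetical alias for an in-memory key-value map using that hash.
using KeyValueMap = std::unordered_map<std::string, std::string, StorageHash>;
```

With this in place, `KeyValueMap` behaves like any other `unordered_map`. On platforms where `std::size_t` is narrower than 64 bits the hash is truncated, so the on-disk hash and the in-memory bucket hash would only share their low bits there.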

adamfowleruk commented 4 years ago

For now, only implementing bucket storage, custom hashing, and list storage (including term lists). Sharding comes later.