diennea / herddb

A JVM-embeddable Distributed Database
https://herddb.org
Apache License 2.0

DataStorageManager BookKeeper/DistributedLog #319

Closed eolivelli closed 4 years ago

eolivelli commented 5 years ago

We could have an implementation of DataStorageManager which saves data and index pages on BookKeeper, so it would not use the local disk.

In this mode each server node won't have data stored locally (except for configuration and temporary files), so it would become a "stateless" container.

Reading and writing data pages on BookKeeper involves RPCs, so it is better if most of the tablespace can be kept in memory.

We are seeing HerdDB used as a simple container for configuration and small amounts of data at sites which already have a BookKeeper cluster. In these cases the ability to avoid relying on local disk for persistent data would be much appreciated.

A naive implementation may use BookKeeper (or better, DistributedLog, which is a higher-level abstraction over BookKeeper) to store data pages and index pages. The mapping between page ids and ledgers should be kept in memory and dumped to other BookKeeper ledgers at checkpoint. The pointer to the ledger which contains this mapping will be saved to ZooKeeper.

So:

1) Data pages are written to BK and pointed to by PAGE-POINTERs
2) The List of Active PAGE-POINTERs (LAPP) is held in memory
3) At checkpoint you write the LAPP to a BK ledger and keep track of this new ID in a z-node
4) On recovery you read from ZK the id of the ledger which contains the LAPP
5) Then you read the LAPP into memory
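
To make the five steps concrete, here is a minimal in-memory sketch in Java. It makes no real BookKeeper or ZooKeeper calls: `FakeLedgerStore` and `FakeZNode` are hypothetical stand-ins for a ledger store and a z-node, and `PagePointerSketch` plus the LAPP byte layout are assumptions made only for illustration, not HerdDB's actual classes or format.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative only: page pointers, checkpoint and recovery with in-memory stand-ins. */
final class PagePointerSketch {

    /** Hypothetical stand-in for BookKeeper: ledger id -> payload written once. */
    static final class FakeLedgerStore {
        private final Map<Long, byte[]> ledgers = new HashMap<>();
        private final AtomicLong nextLedgerId = new AtomicLong(1);

        long writeLedger(byte[] payload) {
            long id = nextLedgerId.getAndIncrement();
            ledgers.put(id, payload.clone());
            return id;
        }

        byte[] readLedger(long ledgerId) {
            byte[] data = ledgers.get(ledgerId);
            if (data == null) {
                throw new IllegalStateException("no such ledger " + ledgerId);
            }
            return data.clone();
        }
    }

    /** Hypothetical stand-in for the z-node that stores the id of the latest LAPP ledger. */
    static final class FakeZNode {
        volatile long lappLedgerId = -1; // -1 means "no checkpoint yet"
    }

    private final FakeLedgerStore store;
    private final FakeZNode znode;
    // Step 2: the LAPP (page id -> ledger id) lives in memory.
    private final Map<Long, Long> lapp = new HashMap<>();

    PagePointerSketch(FakeLedgerStore store, FakeZNode znode) {
        this.store = store;
        this.znode = znode;
    }

    /** Step 1: write the page to its own ledger and remember the pointer. */
    void writePage(long pageId, byte[] data) {
        lapp.put(pageId, store.writeLedger(data));
    }

    boolean hasPage(long pageId) {
        return lapp.containsKey(pageId);
    }

    byte[] readPage(long pageId) {
        Long ledgerId = lapp.get(pageId);
        if (ledgerId == null) {
            throw new IllegalArgumentException("unknown page " + pageId);
        }
        return store.readLedger(ledgerId);
    }

    /** Step 3: dump the LAPP to a new ledger, then publish that ledger id in the z-node. */
    void checkpoint() {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + lapp.size() * 2 * Long.BYTES);
        buf.putInt(lapp.size());
        for (Map.Entry<Long, Long> e : lapp.entrySet()) {
            buf.putLong(e.getKey());
            buf.putLong(e.getValue());
        }
        znode.lappLedgerId = store.writeLedger(buf.array());
    }

    /** Steps 4 and 5: read the LAPP ledger id from the z-node, then load the LAPP. */
    static PagePointerSketch recover(FakeLedgerStore store, FakeZNode znode) {
        PagePointerSketch recovered = new PagePointerSketch(store, znode);
        if (znode.lappLedgerId < 0) {
            return recovered; // nothing was ever checkpointed
        }
        ByteBuffer buf = ByteBuffer.wrap(store.readLedger(znode.lappLedgerId));
        int count = buf.getInt();
        for (int i = 0; i < count; i++) {
            long pageId = buf.getLong();
            long ledgerId = buf.getLong();
            recovered.lapp.put(pageId, ledgerId);
        }
        return recovered;
    }
}
```

In this sketch a page written after the last checkpoint is not visible after `recover`, which is the same boundary a real implementation would have to close by replaying the transaction log.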

eolivelli commented 5 years ago

@diegosalvi @aluccaroni what do you think?

diegosalvi commented 5 years ago

It could be very useful.

Are you thinking about a 1-to-1 mapping ledger <-> page?

eolivelli commented 5 years ago

@diegosalvi yes, it will make things simpler. The only big change is that currently TableManager generates the page id; in order to support this new feature we need the DataStorageManager to generate the page id.

Temporary pages can be swapped to local disk since they are not part of the checkpoint; we only have to store on DL the data needed for recovery.
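
One way to picture that change, as a sketch only: the storage layer returns the id it assigned instead of receiving one from the caller. The interface and class names below (`IdAssigningPageStore`, `InMemoryIdAssigningPageStore`) are hypothetical and are not the real DataStorageManager API; the counter merely plays the role of an id handed out by the store, such as a ledger id.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical contract: the storage layer, not the caller, chooses the page id. */
interface IdAssigningPageStore {
    /** Stores a new page and returns the id assigned to it. */
    long writeNewPage(byte[] data);

    byte[] readPage(long pageId);
}

/** In-memory stand-in used only to illustrate the contract. */
final class InMemoryIdAssigningPageStore implements IdAssigningPageStore {
    private final Map<Long, byte[]> pages = new HashMap<>();
    private final AtomicLong nextId = new AtomicLong(1);

    @Override
    public long writeNewPage(byte[] data) {
        long pageId = nextId.getAndIncrement(); // id decided here, by the store
        pages.put(pageId, data.clone());
        return pageId;
    }

    @Override
    public byte[] readPage(long pageId) {
        byte[] data = pages.get(pageId);
        if (data == null) {
            throw new IllegalArgumentException("unknown page " + pageId);
        }
        return data.clone();
    }
}
```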

diegosalvi commented 5 years ago

Just a small note to remember while implementing this issue: tables offload pages when memory is needed, but such offloads should/could go to local disk and not to DL due to their inherently "temporary" nature (but we must have some way to track them so they can be copied back to DL at checkpoint).
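
A rough sketch of that tracking, under assumptions: two in-memory maps stand in for local temporary files and for DL, and `OffloadTrackingSketch` with its method names is hypothetical, not existing HerdDB code. The point is only that a set of "not yet durable" page ids is enough to know what must be copied at checkpoint.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Illustrative only: remember which offloaded pages still need to reach durable storage. */
final class OffloadTrackingSketch {
    private final Map<Long, byte[]> localDisk = new HashMap<>(); // stand-in for local temp files
    private final Map<Long, byte[]> durable = new HashMap<>();   // stand-in for DL
    private final Set<Long> notYetDurable = new HashSet<>();

    /** Offload under memory pressure: write locally and mark the page as pending. */
    void offload(long pageId, byte[] data) {
        localDisk.put(pageId, data.clone());
        notYetDurable.add(pageId);
    }

    /** At checkpoint, copy every pending page to durable storage; returns how many were copied. */
    int checkpoint() {
        int copied = 0;
        for (long pageId : notYetDurable) {
            durable.put(pageId, localDisk.get(pageId));
            copied++;
        }
        notYetDurable.clear();
        return copied;
    }

    boolean isDurable(long pageId) {
        return durable.containsKey(pageId);
    }

    int pendingCount() {
        return notYetDurable.size();
    }
}
```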

eolivelli commented 4 years ago

Implemented with diskless-cluster mode