filodb / FiloDB

Distributed Prometheus time series database
Apache License 2.0

MemTable WAL implementation #42

Closed velvia closed 7 years ago

velvia commented 8 years ago

This could be as simple as a one-liner to enable it in the current MapDBMemTable, plus recovery logic. However, we need to benchmark it first, as the MapDBMemTable is already not fast.

velvia commented 8 years ago

Now that we've switched to FiloMemTable, we need fast in-memory, off-heap storage that can also be persisted. Some options:

Chronicle-java

https://github.com/xerial/larray

velvia commented 8 years ago

@parekuti here are some guidelines for the write-ahead log implementation for the memtable.

Requirements

If a crash happens, the on-disk file must be enough to restore all the state of the FiloAppendStore as well as the partSegKeyMap in the FiloMemTable. However, the thought is that the partSegKeyMap does not need to be persisted on disk, because the partition and segment keys for each row can be recovered from the chunks themselves.

At a higher level, we must be able to restore the state of all the active NodeCoordinatorActors. That means the active and flushing memtables and, for each NodeCoordinatorActor, the dataset, version, and ingestion schema / columns. This needs to be persisted somewhere.
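
To make the per-actor state concrete, here is a minimal sketch of a record holding the dataset, version, and ingestion columns, with a round trip through a single text line. The `IngestionState` name, its fields, and the tab-separated encoding are all assumptions made for illustration; nothing in this thread specifies where or how this state is actually persisted.

```scala
import scala.util.Try

// Hypothetical record of what one NodeCoordinatorActor would need in order to
// resume ingestion. The name and fields are illustrative, not FiloDB's API.
final case class IngestionState(dataset: String, version: Int, columns: Seq[String])

object IngestionState {
  // Assumed separator; this sketch assumes names never contain a tab.
  private val Sep = "\t"

  /** Encodes the state as one line: dataset, version, then the column names. */
  def encode(s: IngestionState): String =
    (Seq(s.dataset, s.version.toString) ++ s.columns).mkString(Sep)

  /** Decodes a line written by `encode`; returns None if it is malformed. */
  def decode(line: String): Option[IngestionState] =
    line.split(Sep, -1).toList match {
      case dataset :: version :: columns =>
        Try(version.toInt).toOption.map(v => IngestionState(dataset, v, columns))
      case _ => None
    }
}
```

One such line per active NodeCoordinatorActor would be enough to know which datasets and versions to look for during recovery, under the assumptions above.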

Write-Ahead Log File Format

While the FiloMemTable already uses binary Filo chunks, we still need some file format for containing the chunks. So this is a proposal for the format.

File Header

The file header consists of the following bytes. The + signifies an offset in hex. Everything is written little endian.
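
As a sketch of what "offset in hex" and "little endian" mean in practice, the snippet below writes one 32-bit value at a given byte offset using `java.nio.ByteBuffer`. The value `0x01020304` and the offset `+0x04` are placeholders chosen for illustration only; they are not fields of the proposed header, whose layout is not listed in this thread.

```scala
import java.nio.{ByteBuffer, ByteOrder}

object LittleEndianSketch {
  // Hypothetical placeholder value, not an actual WAL header field.
  val PlaceholderWord: Int = 0x01020304

  /** Returns a zero-filled array of `size` bytes with `value` written
    * little endian at byte offset `offset`. */
  def writeIntAt(size: Int, offset: Int, value: Int): Array[Byte] = {
    val buf = ByteBuffer.allocate(size).order(ByteOrder.LITTLE_ENDIAN)
    buf.putInt(offset, value) // absolute put: does not move the buffer position
    buf.array()
  }

  /** Formats bytes as space-separated two-digit hex. */
  def hex(bytes: Array[Byte]): String =
    bytes.map(b => f"${b & 0xff}%02x").mkString(" ")
}
```

Writing the placeholder at offset `+0x04` of an 8-byte buffer, `LittleEndianSketch.hex(LittleEndianSketch.writeIntAt(8, 0x04, LittleEndianSketch.PlaceholderWord))` gives `00 00 00 00 04 03 02 01`: the least significant byte lands at the lowest address.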

velvia commented 8 years ago

directory structure:

${memtable-wal-dir} / $dataset_$version / $timestamp.wal
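
The layout above could be built roughly as follows. This is only a sketch: the `WalPaths` object, the `walFile` helper, and the parameter types (an `Int` version, a `Long` timestamp) are assumptions for illustration, not existing FiloDB code.

```scala
import java.nio.file.{Path, Paths}

object WalPaths {
  /** Builds <walDir>/<dataset>_<version>/<timestamp>.wal.
    * Hypothetical helper; names and types are assumed for this sketch. */
  def walFile(walDir: String, dataset: String, version: Int, timestamp: Long): Path =
    // Braces are required around `dataset`: s"$dataset_$version" would be
    // parsed as a reference to an identifier named `dataset_`.
    Paths.get(walDir, s"${dataset}_$version", s"$timestamp.wal")
}
```

For example, `WalPaths.walFile("/var/filodb/wal", "gdelt", 0, 1462000000000L)` resolves to a file named `1462000000000.wal` inside a `gdelt_0` directory. Using a timestamp as the file name means the files for a dataset sort by creation time, which could help replay them in order during recovery.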

We also need to store the list of datasets being written somewhere.

velvia commented 8 years ago

@parekuti is working on this issue, but for some reason I cannot assign it to her.

velvia commented 7 years ago

The PR for this has been merged.