velvia closed this issue 7 years ago
Now that we've switched to FiloMemTable, need a fast in-memory off-heap storage that can also be persisted. Some options:
Chronicle-java
@parekuti here are some guidelines for the write-ahead log implementation for the memtable.
Persist the state of FiloAppendStore and FiloMemTable such that it could be restored if a crash happens.
Write the chunks appended to FiloAppendStore to disk.
If FiloAppendStore decides to rewrite the most current chunk, this must be handled (instead of appending new chunks, it replaces the most recently appended chunks).
If a crash happens, the on-disk file must restore all the state of the FiloAppendStore as well as the partSegKeyMap in the FiloMemTable. However, the thought is that the partSegKeyMap does not need to be preserved on disk because the partition and segment keys for each row could be recovered from the chunks themselves.
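The guidelines above can be pictured as a small append-only log. This is only a sketch under stated assumptions: the record layout (a one-byte op code followed by little-endian key and chunk lengths) and the names `WalWriter`, `replay`, `OP_APPEND`, and `OP_REPLACE_LAST` are hypothetical and are not the actual FiloDB format. It shows plain appends, a record that replaces the most recently appended chunk, and a replay step that rebuilds a key map from the logged chunks instead of storing the map separately.

```python
import io
import struct

OP_APPEND = 1        # hypothetical op code: a new chunk was appended
OP_REPLACE_LAST = 2  # hypothetical op code: the most recent chunk was rewritten

# op code, key length, chunk length; "<" means little endian, no padding
_HEADER = struct.Struct("<BII")


class WalWriter:
    """Appends records to a binary stream opened for writing."""

    def __init__(self, stream):
        self.stream = stream

    def _write(self, op, key, chunk):
        self.stream.write(_HEADER.pack(op, len(key), len(chunk)))
        self.stream.write(key)
        self.stream.write(chunk)
        self.stream.flush()  # a real log would also fsync the file here

    def append(self, key, chunk):
        self._write(OP_APPEND, key, chunk)

    def replace_last(self, key, chunk):
        self._write(OP_REPLACE_LAST, key, chunk)


def replay(stream):
    """Rebuild (chunks, key_map) from a log, ignoring a torn final record."""
    chunks = []
    while True:
        header = stream.read(_HEADER.size)
        if len(header) < _HEADER.size:
            break  # clean end of log, or a crash in the middle of a header
        op, key_len, chunk_len = _HEADER.unpack(header)
        body = stream.read(key_len + chunk_len)
        if len(body) < key_len + chunk_len:
            break  # crash in the middle of a record; drop the partial write
        record = (body[:key_len], body[key_len:])
        if op == OP_REPLACE_LAST and chunks:
            chunks[-1] = record  # overwrite the most recently appended chunk
        elif op in (OP_APPEND, OP_REPLACE_LAST):
            chunks.append(record)
        else:
            raise ValueError(f"unknown op code {op}")
    # The key map is derived from the replayed chunks, not read from disk.
    key_map = {}
    for index, (key, _) in enumerate(chunks):
        key_map.setdefault(key, []).append(index)
    return chunks, key_map
```

In this sketch each record carries a single opaque key. In the real memtable the partition and segment keys would be decoded from the Filo chunks themselves, which is the part this example does not attempt.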
At a higher level, we must be able to restore the state of all the active NodeCoordinatorActors. That means the active and flushing memtables and, for each NodeCoordinatorActor, the dataset, version, and ingestion schema / columns. This needs to be persisted somewhere.
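One way to picture this higher-level state is a small manifest kept alongside the log. The sketch below is illustrative only: `CoordinatorState`, its field names, and the choice of JSON are assumptions made for the example, not something FiloDB defines.

```python
import json
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class CoordinatorState:
    # All field names here are hypothetical.
    dataset: str
    version: int
    columns: List[str]      # ingestion schema, one entry per column
    active_wal: str         # log file backing the active memtable
    flushing_wal: str = ""  # log file backing the flushing memtable, if any


def dump_states(states):
    """Serialize the state of every active coordinator to a JSON string."""
    return json.dumps([asdict(s) for s in states], sort_keys=True)


def load_states(text):
    """Inverse of dump_states: rebuild the list of CoordinatorState objects."""
    return [CoordinatorState(**entry) for entry in json.loads(text)]
```

On restart, a recovery routine could load this manifest first and then replay the log files it names for each dataset and version.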
While the FiloMemTable already uses binary Filo chunks, we still need some file format for containing the chunks. So this is a proposal for the format.
The file header consists of the following bytes. The + signifies an offset in hex. Everything is written little endian.
DataColumn.toString for each column, UTF8-encoded / written using DataWriter.write(string)
directory structure:
${memtable-wal-dir} / $dataset_$version / $timestamp.wal
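The directory layout above can be sketched with two small helpers. This is a sketch under stated assumptions: the function names `wal_path` and `latest_wal` are hypothetical, and the timestamp is assumed to be an integer (for example epoch milliseconds), which the proposal does not specify.

```python
from pathlib import Path


def wal_path(wal_dir, dataset, version, timestamp):
    """Build <memtable-wal-dir>/<dataset>_<version>/<timestamp>.wal."""
    return Path(wal_dir) / f"{dataset}_{version}" / f"{timestamp}.wal"


def latest_wal(wal_dir, dataset, version):
    """Return the .wal file with the highest numeric timestamp, or None."""
    folder = Path(wal_dir) / f"{dataset}_{version}"
    if not folder.is_dir():
        return None
    # Skip files whose names are not plain integers (assumed timestamp format).
    candidates = [p for p in folder.glob("*.wal") if p.stem.isdigit()]
    return max(candidates, key=lambda p: int(p.stem), default=None)
```

Comparing timestamps as integers rather than as strings avoids ordering `100.wal` before `20.wal`.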
Need to store datasets being written somewhere
@parekuti is working on this issue, but for some reason I cannot assign this issue to her.
The PR for this has been merged.
This could be as simple as a one liner to enable it in the current MapDBMemTable, plus recovery logic. However, need to benchmark as the MapDBMemTable is already not fast.