Previously dismissed at #222, but it is nice to be able to rescue archives quickly during disaster recovery, especially when blkar needs to scan multiple TB of data. Sort mode is expected to be the next step of the DR process, so it would be good to upgrade it as well.
Things to be upgraded
[x] rescue
The collected blocks can be written in batches, grouped by UID, to reduce the opening and closing of file writers
The stats should only be updated by the writer thread after writing is completed. This is to ensure the progress logged in the log file is accurate.
Upgrade completed, roughly 4x the performance (300% increase), going from ~50MB/s to ~200MB/s on my laptop
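The batching idea above can be sketched roughly as follows. This is a hypothetical illustration, not blkar's actual types: `Uid`, `Block`, and `group_by_uid` are made up here, and the 6-byte UID width is taken from the SBX container format. The point is that the writer thread receives a whole batch, groups it by UID, and therefore opens each output file once per batch instead of once per block.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for blkar's internal types.
type Uid = [u8; 6]; // SBX container UIDs are 6 bytes

struct Block {
    uid: Uid,
    data: Vec<u8>,
}

// Group a batch of collected blocks by UID. The writer thread can then
// open one file writer per UID and append all of that UID's blocks in
// one go, rather than reopening files block by block.
fn group_by_uid(batch: Vec<Block>) -> HashMap<Uid, Vec<Block>> {
    let mut groups: HashMap<Uid, Vec<Block>> = HashMap::new();
    for block in batch {
        groups.entry(block.uid).or_insert_with(Vec::new).push(block);
    }
    groups
}
```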
[x] sort
May need to add a read position field to DataBlockBuffer, as sort mode needs to keep track of the number of blocks that were out of order
Upgrade completed, roughly ~~1.5x~~ 2.75x the performance (~~50%~~ 175% increase), going from ~80MB/s to ~~\~120MB/s~~ \~220MB/s on my laptop
Turns out using `reader.cur_pos()` is really slow, so I swapped to calculating the current reader position from `seek_to` and `bytes_processed` to remove the bottleneck. Hence the further performance gain, making it on par with rescue mode.
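The position-tracking trick can be sketched like this. A minimal illustration, assuming the slow path was querying the OS for the current offset on every block (e.g. a seek-to-current syscall): instead, remember the offset of the last explicit seek and count bytes read since then, so the current position is pure arithmetic. `TrackedReader` and its methods are hypothetical names, not blkar's actual API.

```rust
// Hypothetical sketch: derive the current reader position from
// `seek_to` + `bytes_processed` instead of asking the OS each time.
struct TrackedReader {
    seek_to: u64,         // offset of the last explicit seek
    bytes_processed: u64, // bytes read since that seek
}

impl TrackedReader {
    // Record an explicit seek and reset the running byte count.
    fn seek(&mut self, pos: u64) {
        self.seek_to = pos;
        self.bytes_processed = 0;
    }

    // Called after each read with the number of bytes consumed.
    fn add_bytes(&mut self, n: u64) {
        self.bytes_processed += n;
    }

    // Current position, computed without any syscall.
    fn cur_pos(&self) -> u64 {
        self.seek_to + self.bytes_processed
    }
}
```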
Things to not be upgraded
repair
To do in-place edits of a file, a single reader is used with write enabled, but this means the reader needs to be locked for both read and write operations, negating the benefits of pipelining via the actor model. In the other modes, the reader and writer point to different files, so there are no synchronisation issues.
Additionally, repair is normally a read-heavy operation with very few writes, so it is still pretty fast as it is right now
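The locking argument above can be illustrated with a minimal sketch, assuming the in-place handle is modeled as shared mutable state behind a mutex (`InPlaceFile` is a made-up name, and a `Vec<u8>` stands in for the file). Every read and every write must take the same lock, so the read stage and write stage of a pipeline serialize on it instead of overlapping the way rescue/sort's separate reader and writer files allow.

```rust
use std::sync::{Arc, Mutex};

// Hypothetical sketch of why in-place repair resists pipelining:
// both pipeline stages contend for one read+write handle.
struct InPlaceFile {
    // A Vec<u8> stands in for the file opened with read+write.
    handle: Arc<Mutex<Vec<u8>>>,
}

impl InPlaceFile {
    // The read stage must lock the shared handle...
    fn read_block(&self, pos: usize, len: usize) -> Vec<u8> {
        let buf = self.handle.lock().unwrap();
        buf[pos..pos + len].to_vec()
    }

    // ...and so must the write stage, so they cannot overlap.
    fn write_block(&self, pos: usize, data: &[u8]) {
        let mut buf = self.handle.lock().unwrap();
        buf[pos..pos + data.len()].copy_from_slice(data);
    }
}
```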