Previously dismissed at #222, but it is nice to be able to rescue archives quickly during disaster recovery, especially when blkar needs to scan multiple TB of data. Sort mode is expected to be the next step of the DR process, so it would be good to upgrade it as well.
Things to be upgraded
[x] rescue
The collected blocks can be written in batches, grouped by UID, to reduce the opening and closing of file writers
The stats should only be updated by the writer thread after writing is completed. This is to ensure the progress logged in the log file is accurate.
Upgrade completed, roughly 4x the performance (300% increase), going from ~50MB/s to ~200MB/s on my laptop
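The batching idea above can be sketched roughly as follows. This is a hypothetical illustration, not blkar's actual types: `Uid`, `Block`, and `group_by_uid` are made up here, and the 6-byte UID width is taken from the SBX container format. The point is that the writer thread receives a whole batch, groups it by UID, and therefore opens each output file once per batch instead of once per block.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for blkar's internal types.
type Uid = [u8; 6]; // SBX container UIDs are 6 bytes

struct Block {
    uid: Uid,
    data: Vec<u8>,
}

// Group a batch of collected blocks by UID. The writer thread can then
// open one file writer per UID and append all of that UID's blocks in
// one go, rather than reopening files block by block.
fn group_by_uid(batch: Vec<Block>) -> HashMap<Uid, Vec<Block>> {
    let mut groups: HashMap<Uid, Vec<Block>> = HashMap::new();
    for block in batch {
        groups.entry(block.uid).or_insert_with(Vec::new).push(block);
    }
    groups
}
```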
[x] sort
May need to add a read position field to DataBlockBuffer, as sort mode needs to keep track of the number of blocks that were out of order
Upgrade completed, roughly ~~1.5x~~ 2.75x the performance (~~50%~~ 175% increase), going from ~80MB/s to ~~\~120MB/s~~ \~220MB/s on my laptop
Turns out using `reader.cur_pos()` is really slow, so I swapped to calculating the current reader position from `seek_to` and `bytes_processed` to remove the bottleneck. Hence the further performance gain, making it on par with rescue mode.
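The position-tracking trick can be sketched like this. A minimal illustration, assuming the slow path was querying the OS for the current offset on every block (e.g. a seek-to-current syscall): instead, remember the offset of the last explicit seek and count bytes read since then, so the current position is pure arithmetic. `TrackedReader` and its methods are hypothetical names, not blkar's actual API.

```rust
// Hypothetical sketch: derive the current reader position from
// `seek_to` + `bytes_processed` instead of asking the OS each time.
struct TrackedReader {
    seek_to: u64,         // offset of the last explicit seek
    bytes_processed: u64, // bytes read since that seek
}

impl TrackedReader {
    // Record an explicit seek and reset the running byte count.
    fn seek(&mut self, pos: u64) {
        self.seek_to = pos;
        self.bytes_processed = 0;
    }

    // Called after each read with the number of bytes consumed.
    fn add_bytes(&mut self, n: u64) {
        self.bytes_processed += n;
    }

    // Current position, computed without any syscall.
    fn cur_pos(&self) -> u64 {
        self.seek_to + self.bytes_processed
    }
}
```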
Things to not be upgraded
repair
To do in-place edits of a file, a single reader is used with write enabled, but this means the reader needs to be locked for both read and write operations, negating the benefits of pipelining via the actor model. In the other modes, the reader and writer point to different files, so there are no synchronisation issues.
Additionally, repair is normally a read-heavy operation with very few writes, so it is still pretty fast as it is right now
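The locking argument above can be illustrated with a minimal sketch, assuming the in-place handle is modeled as shared mutable state behind a mutex (`InPlaceFile` is a made-up name, and a `Vec<u8>` stands in for the file). Every read and every write must take the same lock, so the read stage and write stage of a pipeline serialize on it instead of overlapping the way rescue/sort's separate reader and writer files allow.

```rust
use std::sync::{Arc, Mutex};

// Hypothetical sketch of why in-place repair resists pipelining:
// both pipeline stages contend for one read+write handle.
struct InPlaceFile {
    // A Vec<u8> stands in for the file opened with read+write.
    handle: Arc<Mutex<Vec<u8>>>,
}

impl InPlaceFile {
    // The read stage must lock the shared handle...
    fn read_block(&self, pos: usize, len: usize) -> Vec<u8> {
        let buf = self.handle.lock().unwrap();
        buf[pos..pos + len].to_vec()
    }

    // ...and so must the write stage, so they cannot overlap.
    fn write_block(&self, pos: usize, data: &[u8]) {
        let mut buf = self.handle.lock().unwrap();
        buf[pos..pos + data.len()].copy_from_slice(data);
    }
}
```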