ambarltd / emulator

A local version of Ambar

Create a file-backed partition #1

Closed lazamar closed 1 month ago

lazamar commented 1 month ago

Goals:

Concurrent writes are not supported.

The implementation involves two files: an index file and a records file. The records file contains one record per line.

This allows for fast sequential consumption, as we can just go through the file reading line by line.

The only write operation allowed on the records file is to append a new record. This allows for safe parallel reads and for reads to happen concurrently with writing.

To allow for fast seeks (jumping to the nth entry), we use an index file. The index file is a binary-encoded sequence of unsigned 64-bit integers. The nth entry in the index is the byte offset of the nth entry in the records file. As with the records file, only append operations are allowed on the index file.
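The two-file layout described above can be sketched roughly as follows. This is an illustrative Python sketch, not the code in this PR: the function names (`append_record`, `offset_of`, `read_nth`, `read_from`) are hypothetical, and little-endian encoding of the offsets is an assumption, since the description only says "unsigned 64 bit integers". It also assumes a single writer and does not guard readers against a partially written last line.

```python
import os
import struct

# One unsigned 64-bit integer per index entry.
# Little-endian byte order is an assumption made for this sketch.
OFFSET = struct.Struct("<Q")


def append_record(records_path, index_path, record):
    """Append one record and its byte offset. Assumes a single writer."""
    if b"\n" in record:
        raise ValueError("records must not contain the '\\n' separator")
    with open(records_path, "ab") as records:
        # Byte offset at which the new record will start.
        offset = records.seek(0, os.SEEK_END)
        records.write(record + b"\n")
    # The records file is closed (and flushed) before the offset is
    # appended, so an index entry never points at unwritten data.
    with open(index_path, "ab") as index:
        index.write(OFFSET.pack(offset))


def offset_of(index_path, n):
    """Look up the byte offset of the nth record (0-based)."""
    if n < 0:
        raise IndexError(n)
    with open(index_path, "rb") as index:
        index.seek(n * OFFSET.size)
        raw = index.read(OFFSET.size)
    if len(raw) < OFFSET.size:
        raise IndexError(n)
    (offset,) = OFFSET.unpack(raw)
    return offset


def read_nth(records_path, index_path, n):
    """Seek directly to the nth record and return it without the separator."""
    offset = offset_of(index_path, n)
    with open(records_path, "rb") as records:
        records.seek(offset)
        return records.readline().rstrip(b"\n")


def read_from(records_path, index_path, n):
    """Yield records sequentially, starting at the nth one."""
    offset = offset_of(index_path, n)
    with open(records_path, "rb") as records:
        records.seek(offset)
        for line in records:
            yield line.rstrip(b"\n")
```

Because every index entry has a fixed width of 8 bytes, finding the nth offset is a single seek to byte `8 * n` in the index, followed by a single seek in the records file.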

As a consequence of the 'readable in a text editor' requirement, the '\n' character is used as the record separator and is therefore not allowed inside a record. That is why this structure targets unformatted JSON records.
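To illustrate why unformatted JSON fits this constraint: a JSON serializer that does no pretty-printing emits each value on a single line, and newlines inside strings are escaped as the two characters `\n`. A minimal Python sketch, where `encode_record` is a hypothetical helper and not part of this PR:

```python
import json


def encode_record(value):
    """Serialize a value as single-line (unformatted) JSON bytes."""
    # No `indent` argument, so the serializer inserts no line breaks;
    # newlines inside strings are escaped as backslash + 'n'.
    line = json.dumps(value, separators=(",", ":")).encode("utf-8")
    if b"\n" in line:
        raise ValueError("encoded record contains the record separator")
    return line


record = encode_record({"msg": "line1\nline2"})
# The raw newline in the input string is escaped in the output,
# and decoding the record restores the original value.
assert b"\n" not in record
assert json.loads(record) == {"msg": "line1\nline2"}
```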

Benchmarks

Write 10K messages in 29 ms. Read 10K messages in about 1.1 ms.

benchmarking queue/file partition/read 10000 messages
time                 1.105 ms   (1.101 ms .. 1.109 ms)
mean                 1.110 ms   (1.106 ms .. 1.114 ms)
std dev              14.80 μs   (12.00 μs .. 19.98 μs)

benchmarking queue/file partition/read 10000 messages 3x in parallel
time                 1.676 ms   (1.643 ms .. 1.705 ms)
mean                 1.649 ms   (1.628 ms .. 1.669 ms)
std dev              74.57 μs   (66.77 μs .. 90.51 μs)
variance introduced by outliers: 31% (moderately inflated)

benchmarking queue/file partition/write 10000 messages
time                 29.02 ms   (28.64 ms .. 29.39 ms)
mean                 29.01 ms   (28.81 ms .. 29.42 ms)
std dev              579.8 μs   (256.9 μs .. 978.9 μs)

benchmarking queue/file partition/write and read 10000 messages in series
time                 31.36 ms   (30.18 ms .. 33.40 ms)
                     0.995 R²   (0.989 R² .. 1.000 R²)
mean                 30.40 ms   (30.16 ms .. 31.05 ms)
std dev              812.4 μs   (290.1 μs .. 1.462 ms)

benchmarking queue/file partition/write and read 10000 messages in parallel
time                 43.18 ms   (39.63 ms .. 48.39 ms)
mean                 45.47 ms   (44.01 ms .. 47.66 ms)
std dev              3.814 ms   (2.360 ms .. 5.589 ms)
variance introduced by outliers: 27% (moderately inflated)
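
As a rough reading of the figures above, the reported times can be converted into messages per second. This is a back-of-the-envelope sketch that uses only the `time` estimates from the write and read benchmarks; the `throughput` helper is hypothetical.

```python
def throughput(messages, seconds):
    """Messages processed per second."""
    return messages / seconds


# 'time' estimates taken from the benchmark output above.
write_rate = throughput(10_000, 29.02e-3)   # write 10000 messages
read_rate = throughput(10_000, 1.105e-3)    # read 10000 messages

print(f"write: {write_rate:,.0f} msgs/s")
print(f"read:  {read_rate:,.0f} msgs/s")
```

This works out to roughly 345 thousand writes per second and about 9 million reads per second on the machine that ran the benchmarks.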