Thoughts on a dual-file implementation?

christianbundy commented 4 years ago

Right now flumelog-offset has one file which contains both the data and the indexes in this format:

<data.length (UInt32BE)>
<data ...>
<data.length (UInt32BE)>
<file_length (UInt32BE or Uint48BE or Uint53BE)>

This means that the data length is encoded twice and the sequence numbers are byte offsets, which means we can have invalid offsets like https://github.com/flumedb/flumedb/issues/32.

Would it be better to use two files here instead?

Index:

<data.offset (UInt32BE)>
<data.length (UInt32BE)>
<file.length (UInt32BE or Uint48BE or Uint53BE)>

Data:

<data ...>

We'd have to use two file descriptors, but it would give us the ability to have consecutive sequence numbers. For example, to get the 42nd message you'd read 8 bytes from the index file at offset 3 * 4 * 42, which would give you the offset and length of the data you want to read from the data file.

I have a hunch that this would be "better", but I'm not sure. Any chance it's been implemented, considered, or discussed before? I've been thinking about this since I saw the approach mentioned here.

dominictarr commented 4 years ago

Yes. this is actually how the go implementation works! @keks @cryptix sequential sequence numbers mean it works much better with Roaring Bitfields. Also, makes it possible to do binary search! (although only on receive time, still it's a cool feature though) I've been meaning to implement this since hanging out with @keks last year and learning this but other stuff happened...

small bikeshed: it might be better for the sequence file to be

<offset int64>...

and then the data file to be

<varint length><data...>

Because then the datafile has enough information to rebuild the sequence file. that will be useful for crash recovery, because you won't need to worry about the case where the sequence file gets written but not the data file and vice versa. Not sure if this is how go does it though.

dominictarr commented 4 years ago

having binary search means secure-scuttlebutt could drop the time index https://github.com/ssbc/ssb-db/blob/master/extras.js#L9-L11 but retain backwards compatibility

dominictarr commented 4 years ago

also: oh, yeah if you use https://github.com/random-access-storage/ to do it, it will work in the browser (both chrome and ff, without extra work) flumeview-aligned-offset uses https://github.com/dominictarr/polyraf so the right adapter is applied automatically

flumedb / flumelog-offset

Thoughts on a dual-file implementation? #21