Use something else for the log?

Hoverbear / old-raft-rs

[Incomplete] A Raft implementation in Rust

https://hoverbear.github.io/raft-rs/raft/

MIT License

266 stars 41 forks source link

Use something else for the log? #4

Closed Hoverbear closed 9 years ago

Hoverbear commented 9 years ago

This by @TyOverby would probably be useful instead of doing JSON -> Base64.

It might be worthwhile to look into the possibility of bincoding a Vec<(u64, T)> and somehow syncing it to disk instead of doing it line by line. One downside of this is that we'd need to keep the entire log in memory, and that sucks!

TyOverby commented 9 years ago

Is there a reason that you want the data to be line-separated? If you treat this file as just a binary blob, you could skip the Base64 step and get good performance and compression (if you want compression).

Hoverbear commented 9 years ago

@TyOverby There is no particular reason! One concern with the approach of a blob is you'd have to store the entire thing in memory to work with it.

TyOverby commented 9 years ago

If the entries that are being stored are constant in size, then you could seek to the entry_size * i byte in the file and then read from there.

If that isn't an option, you could have two files. One with a binary blob, one with a hashmap of EntryNumber to Position, where position is a byte offset into the file.

With either of these, reads and writes will be absurdly fast, as you don't need to scan through the lines, and you just deserialize the entries that you want.

Hoverbear commented 9 years ago

@TyOverby Hmm, I don't think I can make a constant size assumption, I'm trying to keep the library as generic as possible!

The hashmap idea is really clever, I like it! I guess you could presumably store the hashmap in memory even since it would be a constant (u64, u64), or at least a good hunk of it. Since most updates to the hashmap would be fairly small, is there a good way to "sync" the file?

TyOverby commented 9 years ago

Yeah, you would want to leave the "hashmap" in memory. On disk, it would probably be a list of (u64, u64). Because (u64, u64) is constant size, you can use direct-indexing into that if you wanted to do an update, and appending to it would be as easy as appending to the file.

I'm not sure that this is the most elegant way to do things. A database is probably much more suited to the task.

zslayton commented 9 years ago

@Hoverbear, have you looked at using a key/value storage engine like LevelDB, LMDB or WiredTiger? I'm unsure of whether those have Rust bindings in crates.io yet, but if they do I think they'd suit your use case. They're all built on more or less the same core API that lets you:

Create / reopen a file
Store a value of (mostly) arbitrary length for a key
Retrieve the value for a given key
Retrieve a collection of values for a given range of keys (a fast operation as things are stored ordered by keys)

In your case, the keys would be line numbers and the values would be the line contents. Everything would be written to a single file per log with no requirement to maintain an in-memory data structure -- you could allow the log to outgrow RAM.

Hoverbear commented 9 years ago

@TyOverby Yes I think that might be the right solution, I just need to pick something simple and embedded since I have fairly limited requirements.

@zslayton Yes this was discussed on Reddit!. It looks like there are LMDB bindings. Thank you!

I think that finding some sort of embedded database that is simple and cheap enough is the way to go.

viperscape commented 9 years ago

redis-rs might be worth a look too.

jakejscott commented 9 years ago

There's also these bindings to LMDB https://github.com/vhbit/lmdb-rs

Hoverbear commented 9 years ago

@danburkert had some awesome ideas here and contributed a patch.