bgnerdclub / birb


Using NBT (.DAT) for birb_registry persistence #9

Open camsoftworks2018 opened 6 months ago

camsoftworks2018 commented 6 months ago

Current State

The WIP Registry module currently stores all persisted data as JSON.

Proposition

I recently discovered NBT as a viable format for compressing large amounts of data. There is also a serde crate that can quickly serialize and deserialize NBT data.
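For context, NBT itself is a tagged binary format rather than a compression scheme; .dat files such as Minecraft's level.dat are normally gzip-compressed NBT. The sketch below hand-encodes a hypothetical registry entry as an uncompressed NBT compound using only the standard library (it does not use the serde NBT crate). It assumes ASCII names, for which Java's "modified UTF-8" is identical to plain UTF-8. Every field carries its own type ID and name, so NBT is self-describing.

```rust
// NBT tag type IDs (only the ones this sketch needs).
const TAG_END: u8 = 0;
const TAG_INT: u8 = 3;
const TAG_STRING: u8 = 8;
const TAG_COMPOUND: u8 = 10;

/// Writes an NBT string payload: u16 big-endian byte length, then the bytes.
/// Assumes ASCII input (real NBT uses Java's modified UTF-8).
fn write_str(out: &mut Vec<u8>, s: &str) {
    assert!(s.len() <= u16::MAX as usize, "NBT strings are limited to 65535 bytes");
    out.extend_from_slice(&(s.len() as u16).to_be_bytes());
    out.extend_from_slice(s.as_bytes());
}

/// Writes a named tag header: the type ID followed by the name.
fn write_header(out: &mut Vec<u8>, tag: u8, name: &str) {
    out.push(tag);
    write_str(out, name);
}

/// Encodes a hypothetical entry `{ id: Int, name: String }` as a root
/// compound named "entry" (uncompressed; gzip would be applied on top).
fn encode_entry(id: i32, name: &str) -> Vec<u8> {
    let mut out = Vec::new();
    write_header(&mut out, TAG_COMPOUND, "entry");
    write_header(&mut out, TAG_INT, "id");
    out.extend_from_slice(&id.to_be_bytes()); // NBT numbers are big-endian
    write_header(&mut out, TAG_STRING, "name");
    write_str(&mut out, name);
    out.push(TAG_END); // terminates the compound
    out
}
```

For `encode_entry(7, "birb")` this produces 31 bytes, most of which are tag headers and field names rather than data.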

Please comment with suggestions/requirements. Priority is low, as birb_registry already has functional persistence.

jw2476 commented 6 months ago

I agree that we should store data in a different format than JSON, for speed, compression and also to add some level of obfuscation to save data.

I'm not sure about NBT personally, but I've had a good experience with postcard. @Alex-Programs has also used a binary serde codec, but I can't recall which one.

mavic7 commented 6 months ago

See https://github.com/djkoloski/rust_serialization_benchmark, which I was reading last night. It offers a rich variety of options, with performance comparisons for many different types of data (e.g. logs, meshes).

Alex-Programs commented 6 months ago

Having evaluated these for Hypertunnel ~6 months ago, I would personally recommend Borsh. It's fast enough, easy to use, and has a pretty compact output. It isn't self-describing, though, so if you want that I'd recommend MessagePack.
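To illustrate what "not self-describing" means in practice, here is a std-only sketch that writes a hypothetical registry entry two ways: once by hand in the layout Borsh documents (fields in declaration order, little-endian integers, strings as a u32 length followed by UTF-8 bytes), and once as hand-written JSON. It does not call the borsh crate; `Entry`, `encode_compact` and `encode_json` are illustrative names.

```rust
/// Hypothetical registry entry used for illustration.
struct Entry {
    id: u32,
    name: String,
}

/// Borsh-style layout: no field names or type tags are written,
/// so a reader must already know the exact struct layout.
fn encode_compact(e: &Entry) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&e.id.to_le_bytes()); // u32, little-endian
    out.extend_from_slice(&(e.name.len() as u32).to_le_bytes()); // string length
    out.extend_from_slice(e.name.as_bytes()); // UTF-8 bytes
    out
}

/// Self-describing JSON: field names travel with the data.
/// No escaping is done here; real code should use a JSON library.
fn encode_json(e: &Entry) -> String {
    format!("{{\"id\":{},\"name\":\"{}\"}}", e.id, e.name)
}
```

For `Entry { id: 7, name: "birb" }` the compact form is 12 bytes versus 22 for the JSON, but the compact bytes are meaningless without knowing the schema.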

If you want something manually editable I'd recommend JSON or TOML.

jw2476 commented 6 months ago

I'd argue that not being self-describing is an advantage in the context of save files. Borsh looks good; I'll take a look at the benchmarks later and we can find something that works well. I'd strongly suggest TOML for human-readable data; however, that will be for config files, which I think should be a separate module from the general registry.

Alex-Programs commented 6 months ago

How much data are we thinking of saving, and how large a time budget do we have?

All data transiting through Hypertunnel (~10mb/s at the moment) goes through borsh, and it's not even close to being a bottleneck. Single-threaded on a Ryzen 5 1600, it handles about 500-1000mbps with minimal overhead.


That said, there are certainly faster ones. rkyv in particular was only discarded due to a weird bug that may not be an issue here. I was out of patience with libraries at that point and didn't do much debugging, so rkyv could be a good choice.

Here are my notes from the time; they should help narrow things down. Ignore my comments about zero-copy: it's a pain to work with, and our data volumes aren't large enough for it to matter beyond what the existing aggregate perf figures already show. It's not a reason to choose a library, though we shouldn't reject one because of it either.
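Since zero-copy comes up repeatedly below, a minimal std-only illustration may help: a zero-copy decoder returns data that borrows from the input buffer instead of allocating a copy. This is only the idea in miniature, not rkyv's API; the u32-length-prefixed string layout and the function names are assumptions for the example. The lifetime that ties the result to the buffer is part of why zero-copy can be a pain to work with.

```rust
use std::str;

/// Zero-copy decode: returns a &str pointing into `buf` itself.
/// No allocation happens, but the result cannot outlive `buf`.
/// Layout assumed: u32 little-endian length, then UTF-8 bytes.
fn decode_borrowed(buf: &[u8]) -> Option<&str> {
    let header = buf.get(0..4)?;
    let len = u32::from_le_bytes([header[0], header[1], header[2], header[3]]) as usize;
    let body = buf.get(4..4usize.checked_add(len)?)?; // None if truncated
    str::from_utf8(body).ok() // None if not valid UTF-8
}

/// Copying decode: same parsing, then allocates an owned String,
/// which is independent of `buf` once returned.
fn decode_owned(buf: &[u8]) -> Option<String> {
    decode_borrowed(buf).map(|s| s.to_owned())
}
```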

The top ones on the logs benchmark are:
abomonation, rkyv, speedy, savefile, nanoserde, borsh, bitcode, postcard, bincode

The top ones on the mesh benchmark are:
postcard, speedy, nanoserde, rkyv, abomonation, savefile, parity-scale-codec, bincode

The top ones on the minecraft benchmark are:
abomonation, rkyv, speedy, savefile, bitcode, bincode, borsh, postcard, nanoserde

The top ones on the mk48 benchmark are:
abomonation, speedy, rkyv, savefile, bitcode, postcard, parity-scale-codec, borsh, nanoserde, bincode

I have decided to investigate:
abomonation, rkyv, speedy, postcard, bincode, borsh, nanoserde, savefile, bitcode

https://github.com/TimelyDataflow/abomonation is incredibly unsafe and therefore too risky for our use.

https://crates.io/crates/rkyv/0.7.42 is zero-copy and according to https://github.com/djkoloski/rust_serialization_benchmark it has the best speeds of any of the zero-copy formats.

https://crates.io/crates/speedy/0.8.6 looks to be fast, comparable with rkyv, but not zero-copy. Zero-copy sounds useful for reducing memory overhead.

https://crates.io/crates/postcard/1.0.6 is a bit slower. It doesn't require the standard library, but that doesn't matter for us. We couldn't use its heapless mode because we'd overflow the stack.

https://crates.io/crates/bincode/1.3.3 seems basic but effective - but not zero-copy.

https://crates.io/crates/borsh/0.10.3 looks reasonable and has a security focus. Not zero-copy (it copies).

https://crates.io/crates/nanoserde/0.1.33 is weird

msgpacker is fast but has high overhead; I've used it in the past.

https://crates.io/crates/savefile/0.16.0
> You may ask what savefile brings to the table that serde doesn't already do better. The answer is: Not that much! Savefile is less capable, and not as well tested. It does have versioning support built-in as a first class feature.
> 
> Savefile is not yet a very widely used project. However, although there may be bugs, the intention is that the quality should be enough for production.

Off-putting; skip.

https://crates.io/crates/bitcode/0.4.0 does compression. It has some nice tricks around enum sizes, but I don't think it's worth it. Its format changes between versions.
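Savefile's main selling point is built-in versioning, and a format that changes between library versions is the same kind of hazard. Whichever codec is chosen, a save file can carry its own format version so that old or foreign files are detected instead of misread. A minimal sketch, assuming a hypothetical 4-byte `BIRB` magic and a u16 version prepended to whatever payload the codec produces; all names here are illustrative.

```rust
/// Hypothetical magic bytes identifying a birb save file.
const MAGIC: &[u8; 4] = b"BIRB";
/// Hypothetical current format version; bump it whenever the layout changes.
const CURRENT_VERSION: u16 = 2;

#[derive(Debug, PartialEq)]
enum LoadError {
    NotASaveFile,
    UnsupportedVersion(u16),
}

/// Prepends the magic bytes and format version to an encoded payload.
fn add_header(payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(6 + payload.len());
    out.extend_from_slice(MAGIC);
    out.extend_from_slice(&CURRENT_VERSION.to_le_bytes());
    out.extend_from_slice(payload);
    out
}

/// Validates the header and returns (version, payload).
fn split_header(data: &[u8]) -> Result<(u16, &[u8]), LoadError> {
    if data.len() < 6 || &data[0..4] != &MAGIC[..] {
        return Err(LoadError::NotASaveFile);
    }
    let version = u16::from_le_bytes([data[4], data[5]]);
    if version == 0 || version > CURRENT_VERSION {
        return Err(LoadError::UnsupportedVersion(version));
    }
    Ok((version, &data[6..]))
}
```

A real loader would branch on the returned version to migrate older layouts rather than only rejecting unknown ones.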

I intend to use https://crates.io/crates/rkyv/0.7.42 to start with. I'll try other ones later if it makes sense.

I started writing with `rkyv` and discovered that it was giving me weird 16-byte-aligned vectors. For simplicity, I decided to use borsh, speedy, or bitcode instead. Speedy isn't very widely used, so bitcode or borsh looked best. I then decided on `bincode` because it is far more widely used. However, after trying to use it I discovered that all of the docs were out of date; it seems it has gone unmaintained and been forked into https://crates.io/crates/bincode2, which leaves borsh as the more widely used option, so I'm going to use that.

camsoftworks2018 commented 6 months ago

This will be looked into further after the Performance module is finished.

For now, JSON will be used unless we see unexpected outcomes. Registries will have to be saved by their individual modules until I add a rolling-save feature (as @jw2476 suggested).

The source code will probably be copied into a configstore module that reads from .conf/.ini files for specific modules. This will tie into a lot of modules, though, so I'll create a separate issue with an attached PR.
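If the configstore module reads .conf/.ini files, the core parsing is small. A minimal sketch, assuming a simple INI dialect (`[section]` headers, `key = value` lines, `;` or `#` comments, no quoting or escapes); `parse_ini` and the example keys are hypothetical, not part of birb.

```rust
use std::collections::HashMap;

/// Parses a minimal INI dialect into a map of (section, key) -> value.
/// Keys that appear before any [section] header go into section "".
fn parse_ini(text: &str) -> HashMap<(String, String), String> {
    let mut map = HashMap::new();
    let mut section = String::new();
    for raw in text.lines() {
        let line = raw.trim();
        // Skip blank lines and comments.
        if line.is_empty() || line.starts_with(';') || line.starts_with('#') {
            continue;
        }
        if line.starts_with('[') && line.ends_with(']') {
            // New section header: strip the brackets.
            section = line[1..line.len() - 1].trim().to_string();
        } else if let Some(eq) = line.find('=') {
            let key = line[..eq].trim().to_string();
            let value = line[eq + 1..].trim().to_string();
            map.insert((section.clone(), key), value);
        }
        // Any other line is silently ignored in this sketch.
    }
    map
}
```

A maintained INI or TOML crate would add quoting, escapes and error reporting; this only shows the shape of the data a configstore might expose.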

Alex-Programs commented 6 months ago

Yeah, I don't see any reason to rush an alternative system. We're better off getting the core of the engine in first. Serde serialisers are pretty trivial to swap in and out.
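As a rough illustration of why swapping formats is cheap, persistence code can be written once against an abstraction over the encoding. With serde, that split comes from deriving Serialize/Deserialize on the data type and picking the format crate at the call site; the std-only sketch below imitates it with a hypothetical `Codec` trait and two toy codecs, so none of these names come from serde or birb.

```rust
/// Hypothetical abstraction: registry persistence only talks to this trait.
trait Codec {
    fn encode(&self, values: &[u32]) -> Vec<u8>;
    fn decode(&self, bytes: &[u8]) -> Option<Vec<u32>>;
}

/// Human-readable stand-in for the current JSON persistence: "[1,2,3]".
struct TextCodec;

impl Codec for TextCodec {
    fn encode(&self, values: &[u32]) -> Vec<u8> {
        let items: Vec<String> = values.iter().map(|v| v.to_string()).collect();
        format!("[{}]", items.join(",")).into_bytes()
    }
    fn decode(&self, bytes: &[u8]) -> Option<Vec<u32>> {
        let s = std::str::from_utf8(bytes).ok()?;
        let inner = s.trim().strip_prefix('[')?.strip_suffix(']')?;
        if inner.trim().is_empty() {
            return Some(Vec::new());
        }
        // Fails (None) if any item is not a valid u32.
        inner.split(',').map(|p| p.trim().parse::<u32>().ok()).collect()
    }
}

/// Compact binary stand-in: u32 LE count, then each value as u32 LE.
struct BinaryCodec;

impl Codec for BinaryCodec {
    fn encode(&self, values: &[u32]) -> Vec<u8> {
        let mut out = (values.len() as u32).to_le_bytes().to_vec();
        for v in values {
            out.extend_from_slice(&v.to_le_bytes());
        }
        out
    }
    fn decode(&self, bytes: &[u8]) -> Option<Vec<u32>> {
        // Reads the i-th 4-byte little-endian word, if present.
        let word = |i: usize| -> Option<u32> {
            let b = bytes.get(i * 4..i * 4 + 4)?;
            Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
        };
        let count = word(0)? as usize;
        if bytes.len() != 4 + count * 4 {
            return None; // truncated or trailing data
        }
        (1..=count).map(word).collect()
    }
}

/// Save/load logic written once, generic over the codec.
fn round_trip<C: Codec>(codec: &C, values: &[u32]) -> Option<Vec<u32>> {
    codec.decode(&codec.encode(values))
}
```

Moving from the text codec to the binary one only changes which value is passed to `round_trip`; the generic code is untouched.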