That3Percent / tree-buf

An experimental serialization system written in Rust
MIT License

the trait `tree_buf::internal::encoder_decoder::Encodable` is not implemented for `i32` #18

Closed Timmmm closed 4 years ago

Timmmm commented 4 years ago

I get this when trying to encode a `Vec<i32>`. Not sure if you're aware of this? Just checking in case you aren't!
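
For reference, roughly what I'm doing, as a minimal sketch (I'm assuming the top-level `encode` entry point here; the real code is larger):

```rust
fn main() {
    // A plain Vec<i32> is enough to hit the missing impl.
    let values: Vec<i32> = vec![1, 2, 3, -4];

    // Fails to compile with:
    // "the trait `tree_buf::internal::encoder_decoder::Encodable` is not implemented for `i32`"
    let _bytes = tree_buf::encode(&values);
}
```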

That3Percent commented 4 years ago

Thanks for the report! This is a duplicate of #16 and will be worked on soon. At the moment I've been sidelined working on firestorm because the flamegraph profiler that was integrated into Tree-Buf before had too much overhead to produce useful profiling information. I should get to this issue as soon as that integration is finished.

I'm so glad that you tried out Tree-Buf. I would love it if you could tell me what you're working on, why you think Tree-Buf might be a good fit, and how I can make sure that it has what you need.

Timmmm commented 4 years ago

Ah sorry, not sure how I missed that! I can't really share the files because it's for a work thing, but it's basically debug info / logs for compilation. The compilation process produces a lot of information about memory, variable names, etc., and currently this is written to a JSON file, which can get ridiculously big (the biggest I've seen is 35 GB).

We probably aren't going to use Tree-Buf given its alpha status and because we need streaming writes, but I wanted to know how efficient its general approach is, and the answer turns out to be pretty good. Here's a comparison I did for a small output:

| file | size |
| --- | --- |
| log.json | 41 MB |
| log.json.lz4 | 12 MB |
| log.cbor | 24 MB |
| log.cbor.lz4 | 5.5 MB |
| log.treebuf | 15 MB |
| log.treebuf.lz4 | 3.2 MB |

I LZ4'd everything to see how much redundancy was left. I think it still compresses so well because we have a string table with a load of similar strings (`/Foo/bar/baz`, `/Foo/bar/qux`, etc.) in it.
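
The comparison itself is nothing fancy, just compressing each serialized buffer and printing the sizes. A sketch of that, using the lz4_flex crate purely for illustration (not necessarily what I actually ran):

```rust
// Measure how much redundancy each format leaves behind by LZ4-compressing
// its output. (lz4_flex is just one convenient LZ4 binding.)
use lz4_flex::compress_prepend_size;

fn report(label: &str, bytes: &[u8]) {
    let compressed = compress_prepend_size(bytes);
    println!(
        "{}: {} bytes raw, {} bytes after LZ4",
        label,
        bytes.len(),
        compressed.len()
    );
}
```

Then call `report("log.treebuf", &tb_bytes)` and so on for each format.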

Timmmm commented 4 years ago

Oh, and the other reason JSON/CBOR are rubbish for this is that they don't support sparse reads, so we have to load the entire 35 GB into memory. :-/

That3Percent commented 4 years ago

Thanks for sharing. That's all super useful information.

I can see that to support your use-case we would need: streaming writes, sparse reads, and support for `i32` at the least. These are all on the roadmap, but it's good to reaffirm these priorities. Unfortunately, I doubt that the streaming-write capability will be ready in time for your needs, so I think you're making the right decision not to use Tree-Buf yet.

Huh, I hadn't really realized this before, but there might not be any great options that offer both streaming writes and sparse reads? Protobuf is good for streaming writes and FlatBuffers is good for sparse reads... but I don't have much of a recommendation for both. The current plan for the final design of Tree-Buf is to be able to selectively load a subset schema very efficiently (it can already do this), and to be able to skip to data within a list in a way that isn't exceedingly efficient but isn't terrible either (it can't do this yet).
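
Roughly the shape the subset-schema idea is meant to take (a sketch only: the field names are invented and the exact derives/entry points may differ from what's published right now):

```rust
use tree_buf::prelude::*;

// The full record that gets encoded (hypothetical fields).
#[derive(Encode, Decode)]
struct LogEntry {
    path: String,
    memory_bytes: u64,
    notes: Vec<String>,
}

// A subset schema: declares only the fields the reader cares about,
// so the columns for the other fields never need to be decoded.
#[derive(Decode)]
struct PathOnly {
    path: String,
}

fn read_paths(bytes: &[u8]) -> Vec<String> {
    let entries: Vec<PathOnly> = tree_buf::decode(bytes).unwrap();
    entries.into_iter().map(|e| e.path).collect()
}
```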

If you wanted to continue to play around, you can see a breakdown of the size of the file by running this on the bytes:

```rust
let sizes = tree_buf::experimental::stats::size_breakdown(&tb_bytes);
println!("{}", sizes.unwrap());
```

Timmmm commented 4 years ago

> there might not be any great options that offer both streaming write and sparse reads

Yeah, that is the conclusion I am coming to! The only one I've found is SQLite, but I'm not sure how fast and space-efficient it would be, and it feels like overkill to use an actual database, though it does mean you get things like indexes and foreign keys for free.
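
If we did go the SQLite route it would look something like this (a sketch using the rusqlite crate; the table layout is made up):

```rust
use rusqlite::{params, Connection};

fn main() -> rusqlite::Result<()> {
    let conn = Connection::open("log.sqlite")?;

    // Streaming writes: rows can be appended as the compiler emits them,
    // so the whole log never has to sit in memory.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entries (path TEXT NOT NULL, memory INTEGER NOT NULL)",
        [],
    )?;
    conn.execute("CREATE INDEX IF NOT EXISTS idx_path ON entries(path)", [])?;
    conn.execute(
        "INSERT INTO entries (path, memory) VALUES (?1, ?2)",
        params!["/Foo/bar/baz", 1024],
    )?;

    // Sparse reads: with the index, a lookup only touches the pages it needs.
    let mut stmt = conn.prepare("SELECT memory FROM entries WHERE path = ?1")?;
    let memory: i64 = stmt.query_row(params!["/Foo/bar/baz"], |row| row.get(0))?;
    println!("memory for /Foo/bar/baz = {}", memory);

    Ok(())
}
```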