Closed ben-e-whitney closed 2 years ago
@gqian-coder, it won't let me assign you, for some reason.
@ben-e-whitney Thanks for summarizing all these libraries. Although I'm not very familiar with using Protocol Buffers, I think it is a good choice for us since it is well maintained by Google and I know a lot of libraries are using it.
Background
For the past couple of weeks I have been working on the MGARD file format. My original plan was something like this:
`uint32_t`s, `double`s, etc.). Figure out how to handle endianness and type representations that might differ from one computer to the next.
After reading a bunch about how people handle serialization, I think we should not try to handle this ourselves. Brief argument:
This other approach:
I think this will work well for us. We just have to pick which library to use.
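To give a sense of what handling endianness and type representations ourselves would involve, here is a minimal sketch of the kind of helper the original plan would need. The names `encode_le`, `decode_le`, `encode_double_le`, and `decode_double_le` are hypothetical and not part of MGARD. The sketch assumes 8-bit bytes, IEEE 754 doubles, and that doubles use the same byte order as integers on the host.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <type_traits>

// Hypothetical helper: write an unsigned integer as little-endian bytes,
// independent of the host's native byte order. Assumes 8-bit bytes.
template <typename UInt>
std::array<unsigned char, sizeof(UInt)> encode_le(UInt value) {
  static_assert(std::is_unsigned<UInt>::value, "unsigned integers only");
  std::array<unsigned char, sizeof(UInt)> bytes{};
  for (std::size_t i = 0; i < sizeof(UInt); ++i) {
    // Byte i holds bits [8i, 8i + 8) of the value.
    bytes[i] = static_cast<unsigned char>((value >> (8 * i)) & 0xFFu);
  }
  return bytes;
}

// Hypothetical helper: inverse of encode_le.
template <typename UInt>
UInt decode_le(const std::array<unsigned char, sizeof(UInt)> &bytes) {
  static_assert(std::is_unsigned<UInt>::value, "unsigned integers only");
  UInt value = 0;
  for (std::size_t i = 0; i < sizeof(UInt); ++i) {
    value |= static_cast<UInt>(static_cast<UInt>(bytes[i]) << (8 * i));
  }
  return value;
}

// Doubles go through their 64-bit pattern. This only works if every machine
// involved uses IEEE 754 doubles, which is exactly the kind of assumption we
// would have to document and check ourselves.
static_assert(std::numeric_limits<double>::is_iec559 && sizeof(double) == 8,
              "sketch assumes 64-bit IEEE 754 doubles");

std::array<unsigned char, 8> encode_double_le(double x) {
  std::uint64_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  return encode_le<std::uint64_t>(bits);
}

double decode_double_le(const std::array<unsigned char, 8> &bytes) {
  const std::uint64_t bits = decode_le<std::uint64_t>(bytes);
  double x;
  std::memcpy(&x, &bits, sizeof x);
  return x;
}
```

Even this small piece comes with platform assumptions, and it does nothing for versioning or optional fields, which is part of why I'd rather lean on an existing library.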
Libraries
Here are some options I've found. As a very rough measure of popularity, I've listed how many stars on GitHub each project has (if it has a repository there). Let me know what looks good to you. I vote for Protocol Buffers. I'll start learning it while I wait to see if anyone objects.
cereal
I played around with this library and liked it. However, we probably can't use it because of this:
2,900 stars on GitHub.
Protocol Buffers
To use, write .proto files and compile them to C++ reading/writing classes. Other languages are supported, too. It appears we can read and write message-by-message, which is good. Lots of thought has been put into compatibility. Not intended for serialization of arbitrary classes (it defines the representable data structures, approximately PODs plus enums), which is probably a good thing. 51,200 stars on GitHub.
Boost Serialization
Allows for value-by-value (de)serialization and versioning for classes. I imagine it's pretty good, since it's in Boost, but it's a big dependency to add. 83 stars on GitHub (probably means nothing since Boost is split into a lot of small repositories).
s11n
Development seems to be on hold. Quite possibly the released versions would serve our needs, but I suppose we might as well use something that is actively maintained.
FlatBuffers
Seems to emphasize memory efficiency (no separate parsing step). Possibly not a great fit for the incremental approach we might want, but I don't know. 16,900 stars on GitHub.
MessagePack
Seems to be slightly low level compared to other options. Have to manually pack and unpack structs. 5,800 stars on GitHub.
Apache Thrift
Uses the same approach: you write out a schema and compile to get a library you can call. Seems geared towards web development. 8,700 stars on GitHub.
Apache Avro
Seems to be focused on embedding message formats and, like Thrift, web development (RPC). 2,000 stars on GitHub.
Cap'n Proto
Like FlatBuffers, the structures are read directly from memory (no separate parsing step). The author was also involved in writing Protocol Buffers. Comparison with Protobuf, Simple Binary Encoding, and FlatBuffers. 8,500 stars on GitHub.
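To illustrate what "read directly from memory (no separate parsing step)" means for FlatBuffers and Cap'n Proto, here is a rough sketch of the idea. It is not the wire format or API of either library. The layout, the `HeaderView` class, and the `make_header` helper are all hypothetical, and the sketch uses the host's native byte order and representations.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical fixed layout (NOT the FlatBuffers or Cap'n Proto format):
//   offset 0                     : version   (uint32_t, native byte order)
//   offset sizeof(std::uint32_t) : tolerance (double, native representation)
constexpr std::size_t kVersionOffset = 0;
constexpr std::size_t kToleranceOffset = sizeof(std::uint32_t);

// A view over a byte buffer. Nothing is decoded up front: each accessor
// copies only the bytes of the field that was asked for.
class HeaderView {
public:
  explicit HeaderView(const unsigned char *data) : data_(data) {}

  std::uint32_t version() const { return read<std::uint32_t>(kVersionOffset); }
  double tolerance() const { return read<double>(kToleranceOffset); }

private:
  template <typename T> T read(std::size_t offset) const {
    T value;
    // memcpy avoids alignment and strict-aliasing problems.
    std::memcpy(&value, data_ + offset, sizeof(T));
    return value;
  }

  const unsigned char *data_; // not owned; must outlive the view
};

// Hypothetical helper that lays the fields out in a buffer.
std::vector<unsigned char> make_header(std::uint32_t version,
                                       double tolerance) {
  std::vector<unsigned char> buffer(kToleranceOffset + sizeof tolerance);
  std::memcpy(buffer.data() + kVersionOffset, &version, sizeof version);
  std::memcpy(buffer.data() + kToleranceOffset, &tolerance, sizeof tolerance);
  return buffer;
}
```

With a buffer from `make_header(3, 0.25)`, constructing `HeaderView view(buffer.data())` costs nothing, and `view.version()` touches only the first four bytes. A real format also has to pin down byte order, alignment, and schema evolution, which this sketch ignores.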