tarqd opened 10 years ago
Relates to #55 (streaming support), #82 (json parser), and #83 (xml parser).
One question that popped into my head regarding streaming parsing is how the sizes of variable-sized containers will be handled. Since the current XML and JSON archives have everything in memory, they can query the size (number of nodes) of a container before doing a load. If we move to a streaming environment, this will not be possible.
In the binary archives we explicitly serialize the size of variable-sized containers. This might be unavoidable for streaming support. The XML/JSON output becomes slightly less "pretty", but we keep the efficiency of a single memory allocation.
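For what it's worth, here is a minimal sketch of what a size prefix buys the loader. This is not cereal's actual code, `load_vector` is a made-up name, and it assumes a trivially copyable `T`; the point is just that knowing the count up front allows one allocation before any element data has arrived:

```cpp
#include <cstddef>
#include <cstdint>
#include <istream>
#include <vector>

// Hypothetical sketch: a size prefix written before the elements lets the
// loader allocate the container in one go, even though the element data
// may still be trickling in over the stream.
template <class T>
void load_vector(std::istream& in, std::vector<T>& out)
{
    std::uint64_t size = 0;
    in.read(reinterpret_cast<char*>(&size), sizeof(size)); // read the size prefix first
    out.resize(static_cast<std::size_t>(size));            // single allocation up front
    in.read(reinterpret_cast<char*>(out.data()),
            static_cast<std::streamsize>(size * sizeof(T))); // assumes trivially copyable T
}
```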
Thoughts on this?
I cannot see a use case for this. Say someone is receiving objects (in JSON/XML) over the network. Why can't he wait until a message (an object) has arrived completely? After that he would deserialize that (part of the whole) stream into his object(s). There would be some factory involved anyway, because different messages/objects can arrive.
It's useful if you're streaming asynchronously and can assign a callback for when a message has been received and parsed completely. I don't think there'd be many more allocations: you'd just allocate the object you're planning to deserialize anyway, and for variable-length containers you'd use the insert/push_back methods and let the container handle it. That's no different than if you had parsed it completely (lists do one allocation per item anyway, and vectors scale reasonably well regardless).
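A rough sketch of that push_back approach, where `Parser` and `next_element()` are stand-ins for whatever a streaming parser would actually expose:

```cpp
#include <string>
#include <vector>

// Hypothetical sketch: loading a variable-length container without any size
// prefix. The container grows via push_back exactly as it would after a
// full parse, so the allocation pattern is essentially the same.
template <class Parser>
std::vector<std::string> load_strings(Parser& parser)
{
    std::vector<std::string> out;
    std::string value;
    while (parser.next_element(value)) // returns false once the array's end is seen
        out.push_back(value);          // vector growth amortizes allocations
    return out;
}
```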
This might be stepping outside the goals of cereal, though. It seems other libraries like Cap'n Proto already handle this well, so it might not be worth the complication (although for some people, being able to deserialize a large buffer without requiring it to fit entirely in memory would be useful).
I don't have a lot of experience in this area so I'd prefer to get as much input as possible as to what is desirable. Given that we have full type information, it is likely possible for us to wait to receive entire containers before allowing serialization to continue. The goal here is for the user to interface with everything in a near identical fashion - I don't want callbacks of any type.
A more desirable interface is one that by default blocks until serialization is done (current behavior) but offers some way of starting and stopping the serialization until the data has fully poured in.
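To make that concrete, here is one possible shape for such an interface. Every name here (`ResumableJSONInputArchive`, `try_load`, `feed`, `read_some`) is a hypothetical sketch name, not anything cereal provides:

```cpp
#include <cstddef>

// Hypothetical sketch of a "block by default, resumable on demand" archive.
enum class LoadStatus { Complete, NeedMoreData };

struct ResumableJSONInputArchive
{
    // Consume whatever bytes are buffered; return NeedMoreData if the
    // object is not finished yet, Complete once it is.
    template <class T>
    LoadStatus try_load(T& value);

    // Hand the archive more raw input to continue with.
    void feed(const char* data, std::size_t length);
};

// Caller side: pump the transport until the object has fully poured in.
// read_some() stands in for whatever the transport provides.
template <class Socket, class T>
void receive(Socket& socket, ResumableJSONInputArchive& ar, T& value)
{
    while (ar.try_load(value) == LoadStatus::NeedMoreData)
    {
        auto chunk = read_some(socket);      // hypothetical transport call
        ar.feed(chunk.data(), chunk.size()); // resume where parsing stopped
    }
}
```

This keeps the user-facing call sequential, with no callbacks involved.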
I think we could probably get some good feedback from @breese who originally suggested this feature (see #55).
I'll be honest, I mostly wanted this for RPC, but that seems like it's outside of cereal's domain, and bigger archives should probably use a more specialized method. I agree keeping it simple is desirable.
I support the basic idea of reading a stream as it arrives. Makes sense for large data as well as for network streaming.
The rest is really a fundamental decision about what the lib's purpose(s) shall be. The field of serialization is pretty wide.
Potentially pretty cool: this XML API for C++ shown at BoostCon. Source available here.
It looks well thought out and designed, but the hoops he is jumping through with attributes vs. nesting are just another example of the failure of XML...
Since you're already thinking about switching parsers, it might be useful to use a SAX-style streaming parser.
Reasoning:

- It avoids the `any` type that is common in JSON parsers, which makes heavy use of virtual calls that are unnecessary here (for example, `on_number = void(double num)` instead of `if (variable.isDouble()) x = variable.getDouble()`).
- The JSON spec at least is very light, and it's fairly easy to implement a parser if a streaming JSON parser does not already exist.
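A minimal sketch of the SAX idea (these handler names are made up, not any particular parser's API): the parser drives typed callbacks as tokens arrive, so values reach the consumer already typed, with no variant "any" object and no isDouble()/getDouble() checks at the call site.

```cpp
#include <cstddef>
#include <string>

// Hypothetical SAX-style handler: the parser calls these as it consumes
// input, so each value arrives with its concrete type already known.
struct CountingHandler
{
    std::size_t numbers = 0;

    bool on_number(double value)             { (void)value; ++numbers; return true; }
    bool on_string(const std::string& value) { (void)value; return true; }
    bool on_start_object()                   { return true; }
    bool on_end_object()                     { return true; }
    // A real handler would also cover bool, null, arrays, object keys, etc.
    // Returning false would tell the parser to stop early.
};
```

For reference, RapidJSON's SAX `Reader` interface works in this style.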