USCiLab / cereal

A C++11 library for serialization
BSD 3-Clause "New" or "Revised" License

Streaming XML and JSON support #89

Open tarqd opened 10 years ago

tarqd commented 10 years ago

Since you're already thinking about switching parsers, it might be useful to use a SAX-style streaming parser.

Reasoning:

The JSON spec, at least, is very light, and it would be fairly easy to implement a streaming parser if one does not already exist.
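Roughly what I have in mind (all of these names are made up, just to illustrate the shape of a SAX-style interface, not an existing API):

```cpp
#include <cstddef>
#include <string>

// Hypothetical SAX-style handler: a streaming parser invokes these
// callbacks as tokens arrive, so no document tree is ever held in
// memory. None of these names come from an existing library.
struct JsonSaxHandler
{
    virtual ~JsonSaxHandler() = default;
    virtual void onStartObject() {}
    virtual void onKey( std::string const & name ) {}
    virtual void onString( std::string const & value ) {}
    virtual void onNumber( double value ) {}
    virtual void onBool( bool value ) {}
    virtual void onNull() {}
    virtual void onEndObject() {}
    virtual void onStartArray() {}
    virtual void onEndArray() {}
};

// The parser would consume bytes incrementally; feed() could be
// called repeatedly as chunks arrive off the wire:
// void feed( JsonSaxHandler & handler, char const * data, std::size_t len );
```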

AzothAmmo commented 10 years ago

Relates to #55 (streaming support), #82 (JSON parser), and #83 (XML parser).

AzothAmmo commented 10 years ago

One question that popped into my head regarding streaming parsing is how the sizes of variable sized containers will be handled. Since the current XML and JSON archives have everything in memory, they can query the size (number of nodes) of a container before doing a load. If we move to a streaming environment this will not be possible.

In the binary archives we explicitly serialize the size of variable-sized containers. This might be unavoidable for streaming support. The XML/JSON output becomes slightly less "pretty", but we keep the efficiency of a single memory allocation.

Thoughts on this?
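For reference, this is roughly what the binary archives do today: the size is serialized up front so the container can be allocated once. Simplified from the actual std::vector serialization, so treat it as a sketch:

```cpp
#include <cereal/cereal.hpp>
#include <vector>

// Simplified sketch of how the binary archives load a vector: the
// element count is read explicitly before the elements, allowing a
// single allocation even though the data streams in afterwards.
template <class Archive, class T, class A>
void load( Archive & ar, std::vector<T, A> & vec )
{
    cereal::size_type size;
    ar( cereal::make_size_tag( size ) );            // explicit size first
    vec.resize( static_cast<std::size_t>( size ) ); // one allocation
    for( auto & elem : vec )
        ar( elem );                                 // then each element
}
```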

DrAWolf commented 10 years ago

I cannot see a use-case for this. Say someone is receiving objects (in JSON/XML) over the network. Why can't he wait until a message (an object) has arrived completely? After that he would deserialize that (part of the whole) stream into his object(s). There would be some factory involved anyway, because different messages/objects can arrive.
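In that model the receiving side just buffers one complete message and hands it to an archive. A minimal sketch (Message is just an example type, and the framing that delivers a complete JSON document is up to the application):

```cpp
#include <cereal/archives/json.hpp>
#include <sstream>
#include <string>

struct Message
{
    int id;
    std::string payload;

    template <class Archive>
    void serialize( Archive & ar ) { ar( id, payload ); }
};

// Assumes the network layer has already delivered one complete JSON
// document, produced by a matching JSONOutputArchive on the sender.
Message receiveOne( std::string const & completeJson )
{
    std::istringstream is( completeJson );
    cereal::JSONInputArchive ar( is ); // parses the buffered text
    Message m;
    ar( m );
    return m;
}
```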

tarqd commented 10 years ago

It's useful if you're streaming asynchronously and can assign a callback for when a message has been received and parsed completely. I don't think there would be many more allocations: you'd just allocate the object you're planning to deserialize anyway, and for variable-length containers you'd use the insert/push_back methods and let the container handle it. That's no different than if you had parsed it completely (lists do one allocation per item anyway, and vectors scale reasonably well regardless).
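Something like this, where hasNextElement/nextElement are placeholders for whatever a streaming parser would expose, not a real API:

```cpp
#include <string>
#include <vector>

// Hypothetical streaming load: elements are appended as they are
// parsed, and the container manages its own growth, just as it
// would after a full parse.
template <class Parser>
std::vector<std::string> loadStrings( Parser & parser )
{
    std::vector<std::string> out;
    while( parser.hasNextElement() )           // placeholder parser API
        out.push_back( parser.nextElement() ); // amortized growth
    return out;
}
```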

tarqd commented 10 years ago

This might be stepping outside the goals of cereal, though. It seems other libraries like Cap'n Proto already handle this well, so it might not be worth the complication (although for some people, being able to deserialize a large buffer without requiring it to fit entirely in memory would be useful).

AzothAmmo commented 10 years ago

I don't have a lot of experience in this area so I'd prefer to get as much input as possible as to what is desirable. Given that we have full type information, it is likely possible for us to wait to receive entire containers before allowing serialization to continue. The goal here is for the user to interface with everything in a near identical fashion - I don't want callbacks of any type.

A more desirable interface is one that by default blocks until serialization is done (the current behavior) but offers some way of pausing and resuming it until the data has fully poured in.
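One way to get that behavior without touching the archive interface at all might be to hand the archive an std::istream whose streambuf blocks in underflow() until more bytes arrive. A rough sketch, where BlockingQueue is a placeholder for some thread-safe queue supplied by the application:

```cpp
#include <streambuf>

// Sketch: a streambuf whose underflow() blocks until more data is
// available. An archive reading from an istream built on this buffer
// naturally blocks until deserialization is done, with no callbacks
// in the user-facing interface. BlockingQueue is hypothetical.
template <class BlockingQueue>
class BlockingStreamBuf : public std::streambuf
{
public:
    explicit BlockingStreamBuf( BlockingQueue & q ) : queue_( q ) {}

protected:
    int_type underflow() override
    {
        if( !queue_.pop( current_ ) ) // blocks; false means stream ended
            return traits_type::eof();
        setg( &current_, &current_, &current_ + 1 ); // expose one byte
        return traits_type::to_int_type( current_ );
    }

private:
    BlockingQueue & queue_;
    char current_ = 0;
};
```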

I think we could probably get some good feedback from @breese who originally suggested this feature (see #55).

tarqd commented 10 years ago

I'll be honest, I mostly wanted this for RPC, but that seems like it's outside of cereal's domain, and bigger archives should probably be using a more specialized method. I agree keeping it simple is desirable.

DrAWolf commented 10 years ago

I support the basic idea of reading a stream as it arrives. Makes sense for large data as well as for network streaming.

The rest is really a fundamental decision about what the library's purpose(s) shall be. The field of serialization is pretty wide.

AzothAmmo commented 10 years ago

Potentially pretty cool: this XML API for C++ shown at BoostCon. Source available here.

DrAWolf commented 10 years ago

Looks well thought out and designed, but the hoops he is jumping through with attributes vs. nesting are just another example of the failure of XML...