Open knutwannheden opened 7 months ago
Getting very close to ID 100,000 here 😄
Tagging @bartonjs here, whom I saw commenting on the only other CBOR issue I could find.
Reading from a stream would suggest that we'd also want Async versions of all of the reader methods, because there's no guarantee that the next element can finish without blocking; so it's a somewhat expensive proposal.
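To make the cost concrete, here is a purely hypothetical sketch of the async surface this would imply; none of these members exist today, and each of the many synchronous `Read*` methods would need a counterpart:

```csharp
// Hypothetical declarations only (no such members exist); every synchronous
// Read* method on CborReader would need an async twin along these lines.
public ValueTask<long> ReadInt64Async(CancellationToken cancellationToken = default);
public ValueTask<string> ReadTextStringAsync(CancellationToken cancellationToken = default);
public ValueTask<CborReaderState> PeekStateAsync(CancellationToken cancellationToken = default);
```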
How big are the CBOR documents you're working with? When we were spinning up the project to build the reader, everything was just a few kilobytes.
Could be megabytes. I was, however, also thinking that streaming would be nice because it would let a reader start processing tokens even before all of the data has been transferred by the client.
This Node.js package supports streaming: https://github.com/kriszyp/cbor-x?tab=readme-ov-file#streams. If the format spec doesn't require reading all the data as one block for encoding or decoding, could the .NET implementation be changed to process data progressively too?
@knutwannheden Is this proposal just for stream-reading helpers that accept a stream and return a `CborReader`? Or is it for an incremental reader API that would allow interpreting partially downloaded data?
@AlgorithmsAreCool If I understand your question correctly, it is the latter. So the `CborReader` would read bytes from the stream (as necessary) whenever a method is called on the reader to return the next token.
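To illustrate what that could look like (purely hypothetical; `CborReader` has no `Stream` constructor today, and `GetNetworkStream` is just a stand-in):

```csharp
using System;
using System.Formats.Cbor;
using System.IO;

// Hypothetical sketch: the reader would pull bytes from the stream lazily,
// blocking only when the bytes for the next token have not arrived yet.
using Stream input = GetNetworkStream(); // stand-in for any readable Stream
var reader = new CborReader(input);      // hypothetical Stream overload

reader.ReadStartArray();
while (reader.PeekState() != CborReaderState.EndArray)
{
    Console.WriteLine(reader.ReadTextString());
}
reader.ReadEndArray();
```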
In that case, I would also be interested in a `CborReader` that had an incremental API much like `JsonDocument`, one that allowed us to read from very large CBOR documents and possibly CBOR Sequences. CBOR is basically binary JSON, and we already accommodate massive JSON documents, so I think this is natural. But it is a big feature and a new API surface.
Further, allowing the `CborWriter` to also write directly to a `Stream` would feel like a sensible addition. For my use case I have no need for async methods, as I don't have any desire to "async all the way up" my code base.
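For reference, here is what targeting a stream looks like with the current API; the whole encoding has to be materialized in memory first (`destination` stands in for any writable `Stream`):

```csharp
using System.Formats.Cbor;

var writer = new CborWriter();
writer.WriteStartArray(2);
writer.WriteInt32(1);
writer.WriteTextString("two");
writer.WriteEndArray();

// Today the entire encoding is buffered before any byte reaches the stream.
byte[] encoded = writer.Encode();
destination.Write(encoded, 0, encoded.Length);
```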
I was quite confused seeing that there are no methods for operating with `Stream` instances, both for reading and writing. Are we supposed to process data in one big array like it's the C age again?
The current "buffer" approach is fine, but there must be away to update the buffer without resetting the whole state. This could be quite a novel approach to semi-async
data processing (synchronous parsing, asynchronous advancing), but without it, CborReader
and CborWriter
are pretty much unusable.
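For what it's worth, `System.Text.Json` already ships this exact split for JSON: `Utf8JsonReader` parses a buffer synchronously, and `JsonReaderState` lets the caller refill the buffer asynchronously without resetting the parser. A minimal sketch of that precedent:

```csharp
using System;
using System.Text.Json;

// Parse one chunk synchronously; the returned state (plus the bytes the
// reader did not consume) lets the caller resume after an async refill.
static JsonReaderState ProcessChunk(ReadOnlySpan<byte> chunk, bool isFinalBlock, JsonReaderState state)
{
    var reader = new Utf8JsonReader(chunk, isFinalBlock, state);
    while (reader.Read())
    {
        // Handle reader.TokenType here. With isFinalBlock == false, Read()
        // returns false at an incomplete token instead of throwing.
    }
    // Callers must carry the bytes from reader.BytesConsumed onward into
    // the next chunk before resuming with this state.
    return reader.CurrentState;
}
```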
Background and motivation
When reading very large documents, or documents arriving over a slow network connection, it is very limiting that the whole document must be read into contiguous memory before it can be parsed. That consumes a lot of memory and prevents an application from implementing streaming in a reasonable way.
API Proposal
No concrete proposal. It should just allow data to be read from a stream.
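Purely as an illustration of the request (placeholder shapes, not a designed API):

```csharp
// Illustrative placeholders only:
public partial class CborReader
{
    public CborReader(Stream input,
        CborConformanceMode conformanceMode = CborConformanceMode.Strict,
        bool allowMultipleRootLevelValues = false);
}

public partial class CborWriter
{
    public void Encode(Stream destination);
}
```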
API Usage
None.
Alternative Designs
No response
Risks
No response