Reader algorithms on chunk_reader

vinipsmaker commented 3 years ago

So, I'm trying to update tabjson to support the concatenated JSON streaming model. Therefore the first change I had to do was replacing reader by chunk_reader. However as soon as I made the change the project failed to compile due to algorithms such as json::partial::skip() no longer working.

These algorithms hardcode the type json::reader. However I don't think the current decision is wrong. It's pretty much unclear how to recover once a composite operation fails midway. Failing to accept chunk_reader args is a good thing here. If we were to change the algorithms I envision two approaches:

Extend the algorithms to embed the reading logic through user-injection points (e.g. a closure called to read more once end-of-buffer is reached).
Use a coroutine so the algorithm can resume once the user fills more of the buffer.

Both approaches are undesirable in my eyes.

However the question remains: how would one approach the usage of algorithms such as partial::skip() in a chunked stream? I have a few ideas and I thought I'd open up an issue to discuss them. I'll comment more later (a few of the ideas will also be discussed in separate issues).

vinipsmaker commented 3 years ago

So, my idea to use algorithms in the likes of the ones we've developed for the partial namespace is parsers combinators. When I reach a token for which I want to partial::skip() over, I'll just instantiate a new json::chunk_reader to consume the current element. Once I'm over (parser for inner sequence emits token::end), I'll call chunk_reader.next<token::null>() on the outer parser. This idea might help to give you new scenarios to think on while you decide on what to do about #53.

breese commented 3 years ago

Getting json::partial::skip() to support chunk_reader can be done by moving the implementation of skip() into a detail namespace that takes a generic typename Reader argument instead of basic_reader<CharT>, and then add two public skip() overloads for chunk_reader.

That only partially solves the problem though, because we need to distinguish between a error and end-of-input in order to indicate that the chunk_reader needs more data (e.g. by return an empty view from skip().) However, chunk_reader restores its internal state, so we cannot query the reader afterwards.

A possible workaround could be to return a tri-bool instead of a bool, which indicates

true = token available
false = error
inderterminate = end of input

vinipsmaker commented 3 years ago

we need to distinguish between a error and end-of-input in order to indicate that the chunk_reader needs more data (e.g. by return an empty view from skip().)

That kinda was my point about coroutines. Once skip() returns, a new call to skip() won't know how much more tail needs to be consumed. I don't like the coroutine approaches tho (C++ coroutines are the worse implementation of coroutines I've ever seen).

My idea was to keep the state in a new chunk_reader object. You instantiate a new chunk_reader to consume the subtree defined by the current token. That way you know you must consume (the inner) chunk_reader until the end token. So this inner chunk_reader accidentally already keeps all the state you need about the substream. Once the subtree is parsed, you just need to advance the state for the outer chunk_reader (we could use next("null") but I'd like to see a next<token::null>() instead).

Honestly it's pretty easy to reimplement skip() and that's what I've done in my working copy of tabjson to implement chunked parsing. However there are more elaborate algorithms such as partial::parse() that I prefer to reuse instead of reimplement (the "extra" typing to reuse a ready abstraction pays the price here). I don't plan to touch partial::parse() anytime soon tho as tabjson serves all my needs so far w/o touching any DOM handle.

A possible workaround could be to return a tri-bool instead of a bool, which indicates [...]

That's the first time tribool ever solved a problem for me. It works for me.

The current lack of error vs end of input is also the only blocker for tabjson's chunked parsing as I can workaround everything else already.

breese commented 3 years ago

I have added a feature branch with the tribool changes.

breese / trial.protocol

Reader algorithms on chunk_reader #52