**Open** — vinipsmaker opened this issue 3 years ago
So, my idea for using algorithms like the ones we've developed for the `partial` namespace is parser combinators. When I reach a token that I want to `partial::skip()` over, I'll just instantiate a new `json::chunk_reader` to consume the current element. Once I'm done (the parser for the inner sequence emits `token::end`), I'll call `chunk_reader.next<token::null>()` on the outer parser. This idea might give you new scenarios to think about while you decide what to do about #53.
Getting `json::partial::skip()` to support `chunk_reader` can be done by moving the implementation of `skip()` into a `detail` namespace that takes a generic `typename Reader` argument instead of `basic_reader<CharT>`, and then adding two public `skip()` overloads for `chunk_reader`.
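A rough sketch of that refactoring pattern (the reader types here are toy stand-ins, not trial.protocol's actual classes, which have much richer interfaces):

```cpp
#include <cassert>
#include <string_view>

// Hypothetical stand-ins for the two reader types.
struct basic_reader { std::string_view data; };
struct chunk_reader { std::string_view data; };

namespace detail {
// Single generic implementation shared by all reader types.
template <typename Reader>
std::string_view skip(Reader& reader)
{
    // A real skip() would consume the current element; this toy
    // version just returns the underlying view.
    return reader.data;
}
} // namespace detail

// Thin public overloads forwarding to the shared implementation.
std::string_view skip(basic_reader& r) { return detail::skip(r); }
std::string_view skip(chunk_reader& r) { return detail::skip(r); }
```

The public overloads keep the non-template signatures in the API while the shared logic lives in `detail`.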
That only partially solves the problem though, because we need to distinguish between an error and end-of-input in order to indicate that the `chunk_reader` needs more data (e.g. by returning an empty view from `skip()`.) However, `chunk_reader` restores its internal state, so we cannot query the reader afterwards.
A possible workaround could be to return a tri-bool instead of a bool, so that `skip()` can distinguish an error from end-of-input and thereby signal that the `chunk_reader` needs more data.
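A minimal sketch of what such a tri-state result could look like (the enum and the flags are invented for illustration, not the library's actual API):

```cpp
#include <cassert>

// Three-valued result instead of bool: 'need_more' tells the caller
// that the reader ran out of buffered input mid-element and should
// be fed another chunk before retrying.
enum class skip_result { ok, error, need_more };

// Hypothetical tri-bool skip(); the two flags model the failure
// modes the comment above wants to distinguish.
skip_result skip(bool syntax_error, bool end_of_input)
{
    if (end_of_input)
        return skip_result::need_more; // incomplete element, not an error
    if (syntax_error)
        return skip_result::error;     // genuine parse error, unrecoverable
    return skip_result::ok;
}
```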
That kinda was my point about coroutines. Once `skip()` returns, a new call to `skip()` won't know how much more of the tail needs to be consumed. I don't like the coroutine approaches tho (C++ coroutines are the worst implementation of coroutines I've ever seen).

My idea was to keep the state in a new `chunk_reader` object. You instantiate a new `chunk_reader` to consume the subtree defined by the current token. That way you know you must consume the (inner) `chunk_reader` until the `end` token. So this inner `chunk_reader` incidentally already keeps all the state you need about the substream. Once the subtree is parsed, you just need to advance the state of the outer `chunk_reader` (we could use `next("null")`, but I'd like to see a `next<token::null>()` instead).
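The shape of that idea, with a toy tokenizer standing in for `chunk_reader` (a bracket-only "JSON" where each character is a token; none of this is trial.protocol's real API):

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Toy token stream: each character of the input is one token.
struct toy_reader
{
    std::string_view input;
    std::size_t pos = 0;
    char token() const { return input[pos]; }
    void next() { ++pos; }
};

// Skip the subtree starting at the current '[' by nesting a second
// reader that consumes until the matching ']' (the "end" token).
std::size_t skip_subtree(toy_reader& outer)
{
    toy_reader inner{outer.input, outer.pos}; // inner reader holds all the state
    int depth = 0;
    do {
        if (inner.token() == '[') ++depth;
        if (inner.token() == ']') --depth;
        inner.next();
    } while (depth > 0);
    outer.pos = inner.pos; // advance the outer reader past the subtree
    return outer.pos;
}
```

The inner reader's position is exactly the "how much tail remains" state that a plain `skip()` return value cannot carry across calls.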
Honestly, it's pretty easy to reimplement `skip()`, and that's what I've done in my working copy of tabjson to implement chunked parsing. However, there are more elaborate algorithms such as `partial::parse()` that I'd prefer to reuse rather than reimplement (the "extra" typing to reuse a ready-made abstraction pays off here). I don't plan to touch `partial::parse()` anytime soon tho, as tabjson serves all my needs so far w/o touching any DOM handle.
> A possible workaround could be to return a tri-bool instead of a bool, which indicates [...]
That's the first time tribool has ever solved a problem for me. It works for me. The current lack of an error vs. end-of-input distinction is also the only blocker for tabjson's chunked parsing, as I can work around everything else already.
I have added a feature branch with the tribool changes.
So, I'm trying to update tabjson to support the concatenated JSON streaming model. Therefore, the first change I had to make was replacing `reader` by `chunk_reader`. However, as soon as I made the change, the project failed to compile, because algorithms such as `json::partial::skip()` no longer work. These algorithms hardcode the type `json::reader`.

However, I don't think the current decision is wrong. It's pretty much unclear how to recover once a composite operation fails midway. Failing to accept `chunk_reader` args is a good thing here. If we were to change the algorithms, I envision two approaches, both of which are undesirable in my eyes.

However, the question remains: how would one approach the usage of algorithms such as `partial::skip()` in a chunked stream? I have a few ideas and I thought I'd open up an issue to discuss them. I'll comment more later (a few of the ideas will also be discussed in separate issues).