boostorg / json

A C++11 library for parsing and serializing JSON to and from a DOM container in memory.
https://boost.org/libs/json
Boost Software License 1.0

Stream parser? #1039

Open sabudilovskiy opened 1 week ago

sabudilovskiy commented 1 week ago

Are there any plans to add parse_one_token methods to the parser? This would greatly simplify writing good DOM-less parsers, since processing would become much easier. The current API does not allow this through supported means, and it cannot be worked around either: the parser behaves very strangely when the handler returns false in order to stop at the end of an array or object.

grisumbras commented 1 week ago

We have a stream_parser class, but it's not what you are talking about. It parses a JSON document until it runs out of input, then suspends.

We also have an API to handle JSON parsing events. The keystone of it is the class template basic_parser. You can read an explanation of its usage in this section.

We also support parsing without constructing DOM containers. You can read about it here.

So, what specifically are you requesting? The ability to suspend parsing on DOM events? If so, what are you trying to achieve?

sabudilovskiy commented 1 week ago

I have seen this API and I am aware that it exists. But if you use it to parse complex types (with nesting and so on), the handler has to assemble a stack of parsers one way or another. It seems that boost::json::parse_into works the same way: I've seen that the per-type parsers keep pointers to their parents (and to members?), though I'm not sure. My idea is to do this explicitly, provided the parser can deliver exactly one token at a time. Then, for example, when parsing an aggregate, I can recurse, and the parse stack assembles itself on the call stack.

The ability to suspend parsing on DOM events? If so, what are you trying to achieve?

Yes, I want a read_some_event in the parser. The usage would be something like this (in fact, a wrapper over basic_parser is already used here; it would be enough for me to be able to read exactly one token, and I will do the rest myself).

void parse(stream_parser& parser) {
    using wait = wait_handler::wait_e;
    wait_handler& handler = parser.p.handler();

    // Expect the opening of the object.
    parser.consume(wait::object_begin);

    while (true) {
        // Read exactly one event: either the next key or the end of the object.
        parser.consume(wait::key | wait::object_end);
        if (handler.got == wait::object_end) {
            break;
        }
        cur_ = count_index(handler.key_m);

        if (cur_ == empty) {
            BOOST_JSON_FROM_ERROR;
        }
        else if (cur_ == unknown) {
            // Unknown key: skip its value.
            ignore_parser{}.parse(parser);
        }
        else {
            if (parsed_[cur_]) {
                // This field has already been parsed.
                BOOST_JSON_FROM_ERROR;
            }
            // Recurse into the parser for the field's type; the call stack
            // plays the role of the parse stack.
            visit_index<N - 1>([&]<std::size_t I>() {
                using Field = boost::pfr::tuple_element_t<I, T>;
                auto& field = boost::pfr::get<I>(t_);
                boost_domless_parser<Field>{field}.parse(parser);
            }, cur_);
            parsed_[cur_] = true;
            cur_ = empty;
        }
    }
    if (!all_parsed()) {
        BOOST_JSON_FROM_ERROR;
    }
}

vinniefalco commented 1 week ago

That is a useful feature but it is something completely separate from what Boost.JSON offers. It could be a different set of algorithms.

grisumbras commented 1 week ago

So, you want parsing directly into a user defined type, but parse_into doesn't work for you. Can you elaborate why?

sabudilovskiy commented 6 days ago

So, you want parsing directly into a user defined type, but parse_into doesn't work for you. Can you elaborate why?

I didn't say that parse_into didn't work; I just wasn't going to use it. I haven't measured it yet, and parse_into may well be good enough (at the moment I don't have an implementation based on it). I would just like an additional API that parses one token at a time. The parser code I looked at seems to change a lot of state at once, so quite a lot might have to change to support this. My attempts to get the desired result by unsupported means failed, so I decided to write here. It seems to me that parsing exactly one token at a time is much more convenient for custom DOM-less parsers than a global handler because, as I said, you can build recursion for nested types without an explicit stack, using only the call stack.
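To make the idea concrete, here is a minimal self-contained sketch of a one-token-at-a-time parser driving recursive parsing of nested types. Nothing in it is Boost.JSON API: pull_tokenizer, token, point, shape and the parse_* functions are all hypothetical, and the tokenizer is deliberately simplified (integers and unescaped strings only, with ',' and ':' skipped like whitespace). It needs C++17 for std::string_view.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical token kinds; keys are reported as plain strings.
enum class token { object_begin, object_end, array_begin, array_end, string, integer, end };

// Hypothetical pull tokenizer: each call to next() reads exactly one token.
class pull_tokenizer {
public:
    explicit pull_tokenizer(std::string_view in) : in_(in) {}

    token next() {
        // Simplification: separators are skipped, not validated.
        while (pos_ < in_.size() && (std::isspace(static_cast<unsigned char>(in_[pos_])) ||
                                     in_[pos_] == ',' || in_[pos_] == ':'))
            ++pos_;
        if (pos_ == in_.size()) return token::end;
        const char c = in_[pos_];
        switch (c) {
        case '{': ++pos_; return token::object_begin;
        case '}': ++pos_; return token::object_end;
        case '[': ++pos_; return token::array_begin;
        case ']': ++pos_; return token::array_end;
        case '"': {
            const std::size_t close = in_.find('"', pos_ + 1);  // no escape handling
            if (close == std::string_view::npos) throw std::runtime_error("unterminated string");
            text_ = in_.substr(pos_ + 1, close - pos_ - 1);
            pos_ = close + 1;
            return token::string;
        }
        default: {
            const std::size_t start = pos_;
            if (c == '-') ++pos_;
            const std::size_t digits = pos_;
            while (pos_ < in_.size() && std::isdigit(static_cast<unsigned char>(in_[pos_]))) ++pos_;
            if (pos_ == digits) throw std::runtime_error("unexpected character");
            text_ = in_.substr(start, pos_ - start);
            return token::integer;
        }
        }
    }

    // Looks at the next token without consuming it (may overwrite text()).
    token peek() {
        const std::size_t saved = pos_;
        const token t = next();
        pos_ = saved;
        return t;
    }

    std::string_view text() const { return text_; }

private:
    std::string_view in_;
    std::string_view text_;
    std::size_t pos_ = 0;
};

struct point { int x = 0; int y = 0; };
struct shape { std::string name; std::vector<point> points; };

void expect(pull_tokenizer& t, token want) {
    if (t.next() != want) throw std::runtime_error("unexpected token");
}

int parse_int(pull_tokenizer& t) {
    expect(t, token::integer);
    return std::stoi(std::string(t.text()));
}

point parse_point(pull_tokenizer& t) {
    point p;
    expect(t, token::object_begin);
    for (token tok = t.next(); tok != token::object_end; tok = t.next()) {
        if (tok != token::string) throw std::runtime_error("expected key");
        const std::string key(t.text());
        if (key == "x") p.x = parse_int(t);
        else if (key == "y") p.y = parse_int(t);
        else throw std::runtime_error("unknown field");
    }
    return p;
}

shape parse_shape(pull_tokenizer& t) {
    shape s;
    expect(t, token::object_begin);
    for (token tok = t.next(); tok != token::object_end; tok = t.next()) {
        if (tok != token::string) throw std::runtime_error("expected key");
        const std::string key(t.text());
        if (key == "name") {
            expect(t, token::string);
            s.name = std::string(t.text());
        } else if (key == "points") {
            expect(t, token::array_begin);
            // Nested values are handled by plain recursion; no explicit stack.
            while (t.peek() != token::array_end) s.points.push_back(parse_point(t));
            expect(t, token::array_end);
        } else {
            throw std::runtime_error("unknown field");
        }
    }
    return s;
}
```

Each parse_* function knows only its own type and pulls tokens as it needs them, so nesting depth is tracked by the call stack alone. The cost of this design is that the parser must be able to stop after any single token, which a push-style handler interface does not provide.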