dvidelabs / flatcc

FlatBuffers Compiler and Library in C for C
Apache License 2.0

Question about json parsing: unknown fields, top-level arrays #259

Closed dontlaugh closed 1 year ago

dontlaugh commented 1 year ago

I'm new to flatbuffers, and I'd like to confirm my understanding.

  1. Does the generated JSON parser reject any object with unknown fields?

I ask because I'm trying to parse a small subset of a large JSON object returned from an HTTP API, and the generated parser returns the "unknown symbol" error.

I got past number 1 by using json-c to do the initial parsing and then manually building a flatbuffer.

#undef ns
#define ns(x) FLATBUFFERS_WRAP_NAMESPACE(Buildkite_Build, x)

#define J_GET_STR(jobj, key) json_object_get_string(json_object_object_get(jobj, key))
#define J_GET_BOOL(jobj, key) json_object_get_boolean(json_object_object_get(jobj, key))
#define J_GET_INT(jobj, key) json_object_get_int(json_object_object_get(jobj, key))

void build_Buildkite_Build_fb(flatcc_builder_t *B, json_object *jobj) {
    // table Build {
    //   web_url:string;
    //   number:short;
    //   state:string;
    //   blocked:bool;
    //   message:string;
    // }

    ns(start_as_root(B));
    const char *web_url = J_GET_STR(jobj, "web_url");
    ns(web_url_create_str(B, web_url));
    const char *state = J_GET_STR(jobj, "state");
    ns(state_create_str(B, state));
    const char *message = J_GET_STR(jobj, "message");
    ns(message_create_str(B, message));
    _Bool blocked = J_GET_BOOL(jobj, "blocked");
    ns(blocked_add(B, blocked));
    int number = J_GET_INT(jobj, "number");
    ns(number_add(B, number));
    ns(end_as_root(B));
}

Is there any way to have the parser ignore fields that aren't defined in the flatbuffer schema?

mikkelfj commented 1 year ago

> Is there any way to have the parser ignore fields that aren't defined in the flatbuffer schema?

Quick answer: yes, there is. See the comments in the JSON parser header file and possibly the JSON test code. There are runtime flags to ignore unknown fields.

Ask again if you want more detail or guidance.

dontlaugh commented 1 year ago

I have been reading through the headers and I see these flags. I will try to make this work.


I have another question. The API response I'm interested in returns a top-level array. So I would need to parse an array of the following, as the root.

table Build {
  web_url:string;
  number:short;
  state:string;
  blocked:bool;
  message:string;
}

Is this possible in flatbuffers? I couldn't find a syntax for it.

mikkelfj commented 1 year ago

> So I would need to parse an array of the following, as the root.

No. This is also something I would sometimes want, notably if a CSV parser were to be implemented, but FlatBuffers cannot have top-level arrays. There are two options:

  1. Stack separate buffers using the length prefix option. Note that, specifically in C, you may have to pad each buffer up to a multiple of the aligned size if a single buffer does not end at an offset suitable for the next buffer to start. This is not hard, but it must be done. (This ought to be fixed; C++ always pads the buffer up to an aligned size.)
  2. Use a parent buffer with an array.
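
The padding step in option 1 can be sketched as follows. This is a minimal sketch, not flatcc API: `pad_up`, `buffer_stack_t`, and `buffer_stack_push` are hypothetical names, each finished buffer is treated as an opaque byte blob (for example a size-prefixed buffer copied out of the builder), and `align` is an assumption you would choose to match the largest alignment your schema needs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Round n up to the next multiple of align (align must be a power of two). */
static size_t pad_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Hypothetical growable byte stack; not a flatcc type. */
typedef struct {
    uint8_t *data;
    size_t len;
    size_t cap;
} buffer_stack_t;

/*
 * Append one finished buffer and zero-pad the stack so that the next
 * buffer starts at a multiple of align. Returns the offset at which the
 * buffer was stored, or (size_t)-1 if memory could not be allocated.
 */
static size_t buffer_stack_push(buffer_stack_t *s, const void *buf,
                                size_t size, size_t align)
{
    /* start is aligned as long as every earlier push used the same align */
    size_t start = s->len;
    size_t end = pad_up(start + size, align);

    if (end > s->cap) {
        size_t cap = s->cap ? s->cap : 64;
        uint8_t *p;
        while (cap < end) cap *= 2;
        p = realloc(s->data, cap);
        if (!p) return (size_t)-1;
        s->data = p;
        s->cap = cap;
    }
    memcpy(s->data + start, buf, size);
    memset(s->data + start + size, 0, end - start - size); /* padding */
    s->len = end;
    return start;
}
```

A reader walking the stack would advance by the same padded size after each buffer, so writer and reader must agree on `align`.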

In either case you cannot parse directly. I could suggest many options, but if you go with stacked buffers, the simplest solution is to write your own array parser and immediately call the FlatBuffers parser on each element that starts with '{'. You need to extract the end position of each parse so you can continue by skipping the separating comma, but this should be easy to access.
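
The hand-written array parser described above could look roughly like this. It is a sketch under assumptions: `for_each_object`, `element_fn`, and `log_span` are hypothetical names, and the end of each element is found here by brace matching instead of asking the generated parser where it stopped. In real use the callback would hand each `{...}` span to the generated table parser.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical callback: receives one element as a (pointer, length) span.
 * Returning nonzero aborts the walk. */
typedef int (*element_fn)(const char *elem, size_t len, void *user);

static const char *skip_ws(const char *p, const char *end)
{
    while (p < end && isspace((unsigned char)*p)) ++p;
    return p;
}

/* p points at '{'. Returns a pointer just past the matching '}',
 * or NULL if the input ends first. Braces inside strings are ignored. */
static const char *skip_object(const char *p, const char *end)
{
    int depth = 0, in_string = 0;
    for (; p < end; ++p) {
        char c = *p;
        if (in_string) {
            if (c == '\\') {
                if (++p == end) return NULL; /* skip the escaped character */
            } else if (c == '"') {
                in_string = 0;
            }
        } else if (c == '"') {
            in_string = 1;
        } else if (c == '{' || c == '[') {
            ++depth;
        } else if (c == '}' || c == ']') {
            if (--depth == 0) return p + 1;
        }
    }
    return NULL;
}

/* Calls fn once per object in a top-level JSON array. Returns the number
 * of elements visited, or -1 on malformed input or callback abort. */
static int for_each_object(const char *json, size_t len, element_fn fn, void *user)
{
    const char *end = json + len;
    const char *p = skip_ws(json, end);
    int count = 0;

    if (p == end || *p != '[') return -1;
    p = skip_ws(p + 1, end);
    if (p < end && *p == ']') return 0; /* empty array */
    for (;;) {
        const char *next;
        if (p == end || *p != '{') return -1;
        next = skip_object(p, end);
        if (!next) return -1;
        if (fn(p, (size_t)(next - p), user)) return -1;
        ++count;
        p = skip_ws(next, end);
        if (p == end) return -1;
        if (*p == ']') return count; /* end of array */
        if (*p != ',') return -1;
        p = skip_ws(p + 1, end);     /* skip the separating comma */
    }
}

/* Example callback: records the length of each element it is given. */
typedef struct { size_t lengths[8]; int n; } span_log_t;

static int log_span(const char *elem, size_t len, void *user)
{
    span_log_t *log = user;
    (void)elem;
    if (log->n == 8) return 1;
    log->lengths[log->n++] = len;
    return 0;
}
```

Brace matching is a stand-in chosen to keep the sketch self-contained; it does not validate the element contents, which is left to whatever parser the callback invokes.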

Each table also comes with a JSON array parser for that table (because the parser is made up of many small parsers), but it is not usually called directly, so you would first have to manually start a buffer with a parent table it can parse into, then set up the parsing context to match. This is definitely possible, but not a beginner topic.

You could also copy the input array into a buffer with the parent table JSON added, then parse everything as the parent table in one go. This is definitely the easiest option.
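
Wrapping the array in parent-table JSON might look like the sketch below. It assumes a hypothetical parent table such as `table BuildList { builds:[Build]; }` declared as the root type; `wrap_array` and the field name `builds` are illustrative and not part of flatcc.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Returns a malloc'ed, NUL-terminated string of the form
 *   {"<field>":<array_json>}
 * and stores its length (without the terminator) in *out_len.
 * The caller frees the result. Returns NULL if allocation fails.
 */
static char *wrap_array(const char *array_json, size_t len,
                        const char *field, size_t *out_len)
{
    size_t flen = strlen(field);
    size_t total = flen + len + 5; /* {"  +  field  +  ":  +  array  +  } */
    char *out = malloc(total + 1);
    char *p = out;

    if (!out) return NULL;
    *p++ = '{';
    *p++ = '"';
    memcpy(p, field, flen);
    p += flen;
    *p++ = '"';
    *p++ = ':';
    memcpy(p, array_json, len);
    p += len;
    *p++ = '}';
    *p = '\0';
    if (out_len) *out_len = total;
    return out;
}
```

The wrapped text can then be handed to the generated JSON parser for the parent table in a single call, at the cost of one extra copy of the input.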

mikkelfj commented 1 year ago

Oh, there is also the following option: you create your own array parser and call the parser on each element individually, but instead of stacking, you manually build the parent table and add each parsed flatbuffer to the new buffer as an array element. There is a way to efficiently copy elements from one buffer into another so that they appear as a local data type and not just a binary blob. But again, this is not entirely beginner level. A sort of in-between is to create an index of the stacked buffers in a separate buffer.

dontlaugh commented 1 year ago

Thank you for your responses. I'm comfortable with using json-c to parse the top-level array cURL returns, then creating flatbuffers in a loop. I'm not concerned with maximum efficiency right now, rather simply using flatbuffers as a consistent serialization method.

We can close this issue, if you like.

mikkelfj commented 1 year ago

Performance should (almost) not be an issue using this method; it's just about how you want the final data represented. If your parser needs to skip over all the content of array elements, you can improve on this by hand-coding a simple parser that skips a comma and detects the end of the array.