breese / trial.protocol

Network wire protocols
11 stars 4 forks source link

Tree parsing and formatting #21

Closed breese closed 6 years ago

breese commented 6 years ago

I have added a tentative version of tree parsing and formatting of JSON in the feature/tree branch. I would like to solicit feedback on this before it enters the develop branch.

The API is fairly straight-forward:

// Converts a JSON formatted input buffer into a dynamic::variable.
// The input buffer must be one of the supported buffer types (e.g. std::string, std::ostringstream, std::vector)
// <trial/protocol/json/parse.hpp>
template <typename BufferType>
dynamic::variable parse(BufferType input);

// Converts a dynamic::variable into a JSON formatted output buffer.
// The output buffer is created.
// <trial/protocol/json/format.hpp>
template <typename BufferType>
BufferType format(dynamic::variable);

// Converts a dynamic::variable into a JSON formatted output buffer.
// The output buffer is supplied by the user.
// <trial/protocol/json/format.hpp>
template <typename BufferType>
void format(dynamic::variable, BufferType&);
vinipsmaker commented 6 years ago

The input buffer must be one of the supported buffer types (e.g. std::string, std::ostringstream, std::vector)

Did you mean istringstream?

dynamic::variable parse(BufferType input);

I think you could add support for string_view too.

vinipsmaker commented 6 years ago

I like the two versions of the format function. The first is more convenient, but the second is useful if you want to reuse a vector to avoid new allocations in a loop.

breese commented 6 years ago

Whoops, std::ostringstream only applies to json::format().

vinipsmaker commented 6 years ago

I took a look at the Python version of format: https://docs.python.org/3/library/json.html#json.dumps

I don't think there is much inspiration we can have from it. The interface is too complex. Much of this extra behaviour is too specific and should be provided by the user going lower level instead polluting the convenient interface (at least in my opinion).

Javascript's JSON.stringify is similar. It has a replacer parameter which I believe it is not appropriate here.

Howevew, both of them have some parameter to specify whether format function should indent the formatted JSON string. I also believe this is a enough common operation to consider its functionality here.

vinipsmaker2 commented 6 years ago

Qt's solution also adds a "indent" option: http://doc.qt.io/qt-5/qjsondocument.html#toJson

Btw, this QJsonDocument is pretty similar in purpose to trial::dynamic in case you want to check the competition:

Do keep in mind that Qt folks avoid C++ templates a lot and they also avoid exceptions.

breese commented 6 years ago

Here is an alternative suggestion for indentation.

Indentation is for humans, so we could make it an additional step for it. We could add reformat() function that takes a JSON input buffer and returns an indented JSON output buffer.

There is already a pretty printer example that shows how this can be done.

vinipsmaker commented 6 years ago

When using the json::reader interface, I break the algorithm in several smaller functions, one to parse each complex subelement (I'm quite a "heavy user" of json::reader as I have a JSON-based DSL which is pretty similar to MongoDB query language and I match this JSON described value with other JSON values).

One useful algorithm to the library would be a skip_element which would:

I think at one point users might decide to change the algorithm to store these fields instead skipping over them, but as they are extra/dynamic fields, there isn't a proper mapping between the hard coded C++ object and the JSON value. In these cases, it'd be useful to have a:

// Only parse current nesting level/element
dynamic::variable parse_element(json::reader input);

I think the parse implementation could be minimal and done on top of this parse_element.

So, suppose I have in my code:

...

// "foo" key
ret.foo = reader.value<int>();

...

// extra keys
skip_element(reader)

...

But then my code needs to be updated to store user_verification which is a dynamic value. The only thing I'd need to change in my code is:

// extra keys
if (key == "user_verification")
  ret.user_verification = parse_element(reader);
else
  skip_element(reader);

As of this skip_element, I have a place in my code which would benefit from such function to properly support arrays (currently it'll only reject the document). Once I get my hands to remove this limitation, I'll send a PR to Trial.Protocol with skip_element algorithm.

vinipsmaker commented 6 years ago

Indentation is for humans, so we could make it an additional step for it. We could add reformat() function that takes a JSON input buffer and returns an indented JSON output buffer.

Unless both algorithms are provided, performance for one use case will be sacrificed. I don't know which use case is more common, so I'll refrain from opining here (my use case is for non indented JSON as it should be invisible to the user most of the time and saves network bandwidth).

I've made a Telegram poll in the Brazilian C++ group (over 600 members) and the result was:

I'm note sure the poll description was good enough to make users do a informed answer. I included the link to this discussion so I could lurk more feedback here (and I think this plan failed). I also asked in the Brazilian C++ mailing list and the only answer was in favour of the format with indent.

Maybe you should raise a topic in the Boost mailing list.

breese commented 6 years ago

Great idea about skip_element and parse_element. Can you open two separate issues for them? (there are some details with parse_element that we need to figure out first.)

breese commented 6 years ago

Regarding indentation, I suggest that we postpone the decision for now.

breese commented 6 years ago

Should we add a json::partial::format(json::writer) function as well?

vinipsmaker commented 6 years ago

Should we add a json::partial::format(json::writer) function as well?

I think it's useful, but so far I'm either generating JSON with fmtlib or converting from dynamic::variable to JSON using the oarchive support. I'm the not very demanding user in this side.

I'd have greater interest in this abstraction if I was developing some transformation tool.

vinipsmaker commented 6 years ago

Btw, this QJsonDocument is pretty similar in purpose to trial::dynamic [...]

I mentioned Qt previously in this thread. I just found a Facebook equivalent of dynamic::variable:

https://github.com/facebook/folly/blob/974e6be0f405b29b410401bb03d015c68f6c38f4/folly/dynamic.h

It also includes a DynamicParser which looks like a mix of a validator, a SAX API and a deserializer:

https://github.com/facebook/folly/blob/974e6be0f405b29b410401bb03d015c68f6c38f4/folly/experimental/DynamicParser.h

This DynamicParser is horrible and better alternatives can spur out of Trial.Protocol (like the scanf-like suggestion I've pointed previously). However, it's still interesting to take a look. It reminds me of the JsonCreator (which does just the opposite):

https://github.com/bcsanches/JsonCreator

breese commented 6 years ago

json::partial::format() has been added in 4719f9527a682177bf266d96b41fc09a430ce6df

breese commented 6 years ago

Yes, I am familiar with the folly dynamic variable, and I share your assessment.

vinipsmaker commented 6 years ago

json::partial::format() has been added in 4719f95

Does this means that only indent level is missing (design and discussion wise) before the feature land in master (I'm in no hurry, as I can just point to the feature branch and I don't mind API breakage for the specific application that I'm working on)?

breese commented 6 years ago

Yes. I still need to write documentation, but that does not block a merge to the develop/master branches.

breese commented 6 years ago

Oh, I forgot. I also want to implement parse/format for bintoken first, to make sure that the APIs works for binary protocols as well.

breese commented 6 years ago

Bintoken added in d69064935940e8a868fc35a408039597e72360b1

breese commented 6 years ago

Merged to develop in 28b99d242357f9559c7eef9f5518fcf20b3d3f3c

vinipsmaker2 commented 4 years ago

Should we add a json::partial::format(json::writer) function as well?

[...]

I'd have greater interest in this abstraction if I was developing some transformation tool.

I'm planning to write a http://jsonlines.org/ processing tool on top of gawk's plug-in mechanism that should be able to reformat just pieces of the JSON document. I'll take another look on the API by then.

breese commented 4 years ago

Looks useful for parsing a stream of incoming JSON chunks.