breese / trial.protocol

Network wire protocols
Literal whitespace vs literal node #49

Open vinipsmaker opened 3 years ago

vinipsmaker commented 3 years ago

So, I was playing with Lua bindings to trial.protocol (actually it's a bigger project that happens to have JSON support as well, and I'm using trial.protocol for that), and I finally ended up with a use case for the writer API. So, the following Lua code:

local json = require('json')

local writer = json.writer.new()

writer:begin_object()
writer:value('foo')
writer:value('bar')
writer:end_object()
print(writer:generate())

will print:

{"foo":"bar"}

So far so good, but the literal API seems incomplete. There are two types of literals: nodes and insignificant linear whitespace. Linear whitespace literals are useful for indenting the generated document, whereas node literals are useful in serialization libraries. For my Lua bindings, the user might write a __tojson() metamethod as:

function __tojson(self, state)
    local writer = state.writer

    writer:begin_object()
    writer:value('type')
    writer:value(0)
    writer:end_object()
end

But that's kind of verbose. And given that the type to be serialized has a constant representation, it's also inefficient. I would like to be able to write the following:

function __tojson(self, state)
    local writer = state.writer

    writer:literal('{"type":0}')
end

But that obviously isn't going to work. When called as part of a bigger object as in:

writer:begin_object()
writer:value('hello')
writer:value('world')
writer:value('foo')
encode_foo(foo, writer)
writer:end_object()
print(writer:generate())

The output will be:

{"hello":"world","foo"{"type":0}}

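That is invalid JSON: the colon between "foo" and the nested object is missing, because literal() writes no separator.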

So, we need a type of literal that accounts for a raw node to be written. I don't think that's hard. I think the bikeshedding to choose the function name is going to be more demanding than the implementation itself.

Anyway, I'm not in a rush to see this issue solved. There are other features besides JSON I have to work on before releasing my project.

breese commented 3 years ago

What kind of API do you have in mind?

vinipsmaker commented 3 years ago

I think a...

size_t raw_node(view_type v);

It has the same signature as literal(). The difference is that the usual separators will be inserted, as if a regular node were being written.
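To make the difference concrete, here is a minimal C++17 sketch. It is a toy writer, not trial.protocol's actual json::writer: the class toy_writer, its members, and the helper encode() are all hypothetical and exist only for illustration. It tracks just enough state to pick the separator that precedes the next node, so that literal() and the proposed raw_node() can be compared side by side.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Toy writer for illustration only; NOT trial.protocol's json::writer.
class toy_writer {
public:
    void begin_object() { separate(); out_ += '{'; stack_.push_back({true, 0}); }
    void end_object()   { out_ += '}'; stack_.pop_back(); }

    // Regular string value: separator first, then the quoted text
    // (escaping is omitted to keep the sketch short).
    void value(std::string_view s) {
        separate();
        out_ += '"';
        out_ += s;
        out_ += '"';
    }

    // Modelled on the existing literal(): appends the bytes and nothing else.
    std::size_t literal(std::string_view v) { out_ += v; return v.size(); }

    // Hypothetical raw_node(): same signature, but counted as one node,
    // so the pending separator is written before the raw bytes.
    std::size_t raw_node(std::string_view v) { separate(); out_ += v; return v.size(); }

    const std::string& generate() const { return out_; }

private:
    struct scope { bool is_object; std::size_t count; };

    // Writes ':' after an object key, ',' between other nodes,
    // and nothing before the first node or at the top level.
    void separate() {
        if (stack_.empty()) return;
        scope& s = stack_.back();
        if (s.count > 0)
            out_ += (s.is_object && s.count % 2 == 1) ? ':' : ',';
        ++s.count;
    }

    std::string out_;
    std::vector<scope> stack_;
};

// Reproduces the example from the first comment with either function.
std::string encode(bool use_raw_node) {
    toy_writer writer;
    writer.begin_object();
    writer.value("hello");
    writer.value("world");
    writer.value("foo");
    if (use_raw_node)
        writer.raw_node(R"({"type":0})");
    else
        writer.literal(R"({"type":0})");
    writer.end_object();
    return writer.generate();
}
```

With this toy model, encode(false) yields {"hello":"world","foo"{"type":0}} (the broken output from the first comment), while encode(true) yields {"hello":"world","foo":{"type":0}}.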

An alternative approach would be to change literal() parameters so you can state your intention and implementation choice there, but I don't like this idea.

TBH I think the hard task will be to choose the name. Is raw_node() a good name? The implementation itself doesn't seem challenging.

breese commented 3 years ago

The requirement is that the inserted fragment is a valid JSON element. Otherwise the writer could end up being confused about the current separator and nesting level.

So we could call it element().

vinipsmaker commented 3 years ago

The name element() works for me.

vinipsmaker commented 3 years ago

On second thought, I think the code will be more readable if you use the name raw_element(). To the uninitiated user, there is no difference between value and element (and indeed there is little difference between these two grammar rules in the JSON spec). To someone who has just started hacking on a new codebase, the choice between value() and element() would seem arbitrary and unintuitive. raw_value() would be yet another option (but then again we're just one jump away from literal_value() and two jumps from literal()). Maybe rename literal() to literal_ws() and use the name literal() for this function (but that would be a bold API break).

breese commented 3 years ago

An alternative solution is a separator() function that writes the correct separator depending on context -- a comma separator in a JSON array, alternating colon and comma separators in a JSON object, and nothing in the top-level scope.

separator() must be called before inserting raw data. This would allow us to insert raw data in chunks using multiple literal() calls.

  writer.value<begin_array>();
  writer.separator();
  writer.literal("null");
  writer.separator();
  writer.literal("nu");
  writer.literal("ll");
  writer.value<end_array>();
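
As a rough illustration of the bookkeeping this would need, here is a self-contained C++17 sketch. It is not the library's code: sketch_writer and demo_separator() are hypothetical names, and begin_array()/end_array() stand in for the value<begin_array>()/value<end_array>() calls used above. The only state it keeps is a stack of scopes with an element counter per scope.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Toy model of the separator() proposal; NOT trial.protocol's writer.
class sketch_writer {
public:
    void begin_array()  { open('[', false); }
    void begin_object() { open('{', true); }
    void end_array()    { close(']'); }
    void end_object()   { close('}'); }

    // Writes the separator required before the next element and records
    // that an element follows: nothing at the top level or before the
    // first element, ',' in arrays, alternating ':' and ',' in objects.
    void separator() {
        if (stack_.empty()) return;
        scope& s = stack_.back();
        if (s.count > 0)
            out_ += (s.is_object && s.count % 2 == 1) ? ':' : ',';
        ++s.count;
    }

    // Raw bytes, no separator: several calls may form a single element.
    std::size_t literal(std::string_view v) { out_ += v; return v.size(); }

    const std::string& generate() const { return out_; }

private:
    struct scope { bool is_object; std::size_t count; };

    // A container is itself an element of its parent scope.
    void open(char c, bool is_object) {
        separator();
        out_ += c;
        stack_.push_back({is_object, 0});
    }
    void close(char c) { out_ += c; stack_.pop_back(); }

    std::string out_;
    std::vector<scope> stack_;
};

// Same call sequence as the example above, with the second null
// written in two chunks.
std::string demo_separator() {
    sketch_writer writer;
    writer.begin_array();
    writer.separator();
    writer.literal("null");
    writer.separator();
    writer.literal("nu");
    writer.literal("ll");
    writer.end_array();
    return writer.generate();
}
```

In this sketch demo_separator() returns [null,null]. The design trade-off is that the caller becomes responsible for calling separator() exactly once per raw element; forgetting it, or calling it between two chunks of the same element, produces invalid output.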

vinipsmaker commented 3 years ago

That works.