c-rack / cbor-java

Java implementation of RFC 7049: Concise Binary Object Representation (CBOR)
http://cbor.io/
Apache License 2.0
118 stars 25 forks source link

Builder & Tags: result doesn't match parsed results #57

Closed andreasWallner closed 4 years ago

andreasWallner commented 7 years ago

While using the library I stumbled upon how the builder currently handles tags. There seems to be quite a difference between what the Builder produces and what the Decoder will produce:

new CborBuilder().addTag(1).add(22)

which results in a list of two items, the first being the tag, the second being the integer. If we encode the result we will get (as expected):

C116

If we decode the same, we'll get a single DataItem of type UnsignedInteger with the tag set appropriately. This difference is a bit bad when writing functions that should operate on a tree of DataItems.

Additionally we can only tag DataItems on the "top level". The ".addTag()" interface could work inside array, but doesn't at all in maps. I think a nice interface could be to provide static methods in CborBuilder like

public static DataItem tagged(int tag, ...)

that could be used like:

final List<DataItem> di = new CborBuilder()
    .add(tagged(2, tagged(1, "foobar")))
    .build();

where the result would encode as

C2                                               # tag(2)
  C1                                             # tag(1)
    66                                           # text(6)
      666F6F626172                               #   "foobar"

What would you think of that as an addition to the CborBuilder?

c-rack commented 7 years ago

I am a little bit undecided about this, because then, the tagged() call is not part of the fluent interface. On the other side, implementing a startTag() / endTag() might bloat the code, but it would nicely fit into the existing builder (the addTag method could be marked as deprecated then):

final List<DataItem> di = new CborBuilder()
    .startTag(2)
    .startTag(1)
    .add("foobar")
    .endTag()
    .endTag()
    .build();
andreasWallnerIFX commented 7 years ago

You're right, it breaks the fluent interface. On the otherhand I'm not sure if the interface is of much use, since there can only ever be one item inside a tag to keep everything well formed. To keep the interface easy to use correctly I'd much rather go for something that looks like:

x = new CborBuilder()
    .addTagged(2).addTagged(1).add("foobar")
    .build()

So that the TagBuilder does not have an .end() . I'm not sure I'm too happy with the names I chose in the example (it's not really an add...) but the idea should be clear. I haven't fully thought through the generics involved, but I think this should be implementable...

andreasWallner commented 6 years ago

I took a stab at the implementation using the fluent interface. I started with integrating it into the ArrayBuilder, you can have a look at it in my branch here: andreasWallner@02ed1ce077279242356530cd9c30de78edcbfd6d

I think for the ArrayBuilder this looks quite nice (and therefore by extension for the root builder). The thing I did not find a good solution for (from an API perspective) is maps: Apart from the static method in the beginning I did not find a nice way that would allow to tag key and/or value.

Can you think of a way to do this for a map?

divegeek commented 4 years ago

+1 for this issue.

My current workaround is to build, encode and decode in order to get the right object representation. I only have to do this for unit tests, so it's not too terrible, but it's pretty ugly.

Andreas' approach looks good to me, though I don't think "addTagged" makes sense, and I'd prefer to see the tags added after the value they tag, e.g.

x = new CborBuilder()
    .add("foobar").tagged(2).tagged(1)
    .build()

That fits my mental model of CBOR tagging better, even if it does specify tags in reverse order of the way they appear in the encoded bytestream. If I get few minutes I may send a PR for you to look at.

andreasWallner commented 4 years ago

@divegeek As hinted at in my last message, I did not find a nice solution that had a fluent-like interface, those did not work out for the map case. In our local fork I went ahead an implemented my initial approach (as shown in the initial issue text), and did not stick with the fluent interface. I'd be happy to share that - I'll push a branch once I get home.

divegeek commented 4 years ago

@andreasWallner I think my approach should work with maps, though I haven't tested. It certainly could, structurally. I'll check, and make the appropriate tweaks as needed.

divegeek commented 4 years ago

I added a map to the test case in my PR. Works fine; no change to the implementation required.

divegeek commented 4 years ago

Actually... it occurred to me this morning that there's another case that isn't handled well: tagging of map entries. This is trickier because the put(x, y) syntax doesn't provide any obvious way to include tags.

I'm thinking about implementing support for the following syntax:

new CBorBuilder().addMap()
    .addKey("a").value("b");

With that approach, it would be easy enough to support, e.g.:

new CBorBuilder().addMap()
    .addKey("a").tagged(300).tagged(301).value("b").tagged(302)

So that keys and values could be tagged, including with nested tags.

What do you think about that syntax?

divegeek commented 4 years ago

My PR now implements the above. So it supports nested tags, tags on maps and map entries, and tags on arrays and array entries, and in all cases (unless I missed something) the fluent interface using tagged() produces the same object structure as the parser.

andreasWallner commented 4 years ago

@divegeek The drawback of the interface that I see is that you can't do a streaming write, which was one of the reasons for my original statement that I don't see a nice way of implementing tags in the interface. (streaming write as was raised in #24 ) How do you put a tagged map into a map using your current interface? (I guess you could do that with an additional function in the MapEntryBuilder?)

Another drawback that I see (but one could possibly fix this with a different implementation, I did not yet have time to check): Up until now the validity of generated CBOR Streams was AFAIK guaranteed during compilation time because of type checking. With this change we move error checking to runtime.

c-rack commented 4 years ago

Haven't been online for a while and saw this PR unmerged. So I have merged it according to our C4 process.

c-rack commented 4 years ago

@divegeek what do you think about the drawbacks that @andreasWallner mentioned?