bivas / protobuf-java-format

Provide serialization and de-serialization of different formats based on Google’s protobuf Message. Enables overriding the default (byte array) output to text based formats such as XML, JSON and HTML.
BSD 3-Clause "New" or "Revised" License
153 stars 97 forks source link

Unknown fields not being repopulated upon de-serialization from json #23

Open primshnick opened 8 years ago

primshnick commented 8 years ago

If I have a protobuf message with unknown fields:

leaf1: "Hello"
leaf2: 23
leaf3: 41
4: "world"

where leaf1, leaf2, and leaf3 are my known fields, and 4 is an unknown field. I then convert to json like so:

return new JsonFormat().printToString(protobuf);

I get the following:

{"leaf1": "Hello","leaf2": 23,"leaf3": [41], "4": ["world"]}`

However, if I then deserialize the json back like so:

public static <T extends Message> T convertToProtobuf(String json, T defaultInstance)
        throws ParseException {
        Builder builder = defaultInstance.newBuilderForType();
        JsonFormat format = new JsonFormat();
        format.merge(json, ExtensionRegistry.getEmptyRegistry(), builder);
        return (T)builder.build();
    }

then my result is the following:

leaf1: "Hello"
leaf2: 23
leaf3: 41

The problem is field 4 is not present in the reconstituted proto as an unknown field. Am I doing something wrong, or is this not supported? Thanks!

scr commented 8 years ago

@PeteyPabPro - it seems that your observation is correct - neither JsonFormat nor JsonJacksonFormat seem to slurp the unknown fields into the Builder/Message as they could/should.

I wrote a simple unit tests (disabled in the pull request) that shows this off. In the future, that's something you can/should do too - write a simple test along with minimal protobuf so that a fix can be written against the test as a specification.

I don't have time to fix this - would you be able to give your :+1: to PR #24, if not also enable & write a fix for it?

primshnick commented 8 years ago

I'll take a look..

scr commented 8 years ago

Hmm… I was just looking at JsonFormat vs. JsonJacksonFormat and found that they are different :frowning: This might not be a problem if you always used the same one on both ends, but I don't think that's fair to assume, and furthermore, JsonJackson has other issues, specifically with Groups, where it prints a JSON object, but expects a binary delimited ByteString on input/merge.

I'm not sure there's a spec for JSON, but the two JSON-related formatters should agree with one another.

For your complex example, here is the output from JsonFormat:

{
    "10": [
        "\b\u0006\u0012\u0005\rff\uffa6\uffbf",
        "\b\uff92\uffff\uffff\uffff\uffff\uffff\uffff\uffff\uffff\u0001\u0012\u0005\r\u0000\u0000\u0000\u0000"
    ],
    "2": [
        "world"
    ],
    "3": [
        "I",
        "am"
    ],
    "4": [
        "\b\u001e\u0012\u0005\r\uff9a\uff99\uff99?"
    ],
    "5": [
        51232271120233
    ],
    "6": [
        6
    ],
    "7": [
        1075000115
    ],
    "8": [
        4614253070214989087
    ],
    "9": [
        {
            "1": [
                "hi"
            ],
            "2": [
                1084647014
            ],
            "3": [
                23
            ],
            "4": [
                44232993922327
            ]
        }
    ],
    "knownfield": "hello"
}

And here is the output from JsonJacksonFormat:

{
    "10": [
        "CAYSBQ1mZqa/",
        "CJL//////////wESBQ0AAAAA"
    ],
    "2": [
        "d29ybGQ="
    ],
    "3": [
        "SQ==",
        "YW0="
    ],
    "4": [
        "CB4SBQ2amZk/"
    ],
    "5": [
        51232271120233
    ],
    "6": [
        6
    ],
    "7": [
        1075000115
    ],
    "8": [
        4614253070214989087
    ],
    "9": [
        {
            "1": [
                "aGk="
            ],
            "2": [
                1084647014
            ],
            "3": [
                23
            ],
            "4": [
                44232993922327
            ]
        }
    ],
    "knownfield": "hello"
}
primshnick commented 8 years ago

Hmm.. So do you think

  1. My fix to JSONFormat
  2. Fixing JSONJacksonFormat to be consistent with itself and JSONFormat
  3. Fixing JSONJacksonFormat unknown field parsing

should all be part of my PR? Or is it ok to just stick with (1) for this PR, and then do the Jackson stuff in another one?

scr commented 8 years ago

Good questions - I acknowledge that my observation didn't give opinion on next steps, so let me think a moment.

I think we need to understand & document if possible what the format should be for JSON - It's unfortunate - I don't see wiki enabled for this, but we can add the gh-pages branch for documenting things…