google-code-export / protostuff

Automatically exported from code.google.com/p/protostuff
Apache License 2.0
1 stars 1 forks source link

Ignore null values in json parser #108

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1.Create a message with 1 field
 optional string test=1;
2. Deserialize to java_v2protoc_schema from json with 
{"test":null} 

What is the expected output? What do you see instead?
test should be null. Instead, NullPointerException from JsonInput.readString()

What version of the product are you using? On what operating system?

Please provide any additional information below.
Temporary hack is to change JsonInput to return an empty string on null token 
aka:
   public String readString() throws IOException
    {
        if (parser.getCurrentToken() == VALUE_NULL) {
            return "";
        }
 ...

Returning null does not work as the protoc generated code throws exception when 
setting a null.

Original issue reported on code.google.com by celan...@gmail.com on 2 Apr 2012 at 8:51

GoogleCodeExporter commented 9 years ago
Additional info:
java_bean serialization has the same problem, however it will support returning 
null instead of empty string.

Both serializations also break on null nested messages.
message M1 {
message M2 {
 optional int32 f1=1;
}
optional M2 m2=1;
optional int32 id=2;
}

json:
{"id":2, "m2":null }

Original comment by celan...@gmail.com on 3 Apr 2012 at 12:23

GoogleCodeExporter commented 9 years ago
Fix for both issues is to add this to the very end of JsonInput.readFieldNumber
        // if it is null, read the next field
        if (parser.getCurrentToken() == VALUE_NULL) {
            return readFieldNumber(schema, parser);
        }

aka
private <T> int readFieldNumber(final Schema<T> schema, final JsonParser 
parser) 
    throws IOException
    {
        final JsonToken jt = parser.nextToken();
        if(jt == END_OBJECT)
            return 0;

        if(jt != FIELD_NAME)
        {
            throw new JsonInputException("Expected token: $field: but was " + 
                    jt + " on message " + schema.messageFullName());
        }

        final String name = parser.getCurrentName();
        final int number = numeric ? Integer.parseInt(name) : schema.getFieldNumber(name);

        // move to the next token
        if(parser.nextToken() == START_ARRAY)
        {
            // if empty array, read the next field
            if(parser.nextToken() == END_ARRAY)
                return readFieldNumber(schema, parser);

            if(number == 0)
            {
                // unknown field
                if(parser.getCurrentToken().isScalarValue())
                {
                    // skip the scalar elements
                    while(parser.nextToken() != END_ARRAY);

                    return readFieldNumber(schema, parser);
                }

                throw new JsonInputException("Unknown field: " + name + " on message " + 
                        schema.messageFullName());
            }

            lastRepeated = true;
            lastName = name;
            lastNumber = number;

            return number;
        }

        if(number == 0)
        {
            // we can skip this unknown field
            if(parser.getCurrentToken().isScalarValue())
                return readFieldNumber(schema, parser);

            throw new JsonInputException("Unknown field: " + name + " on message " + 
                    schema.messageFullName());
        }

        lastName = name;
        lastNumber = number;
        // if it is null, read the next field
        if (parser.getCurrentToken() == VALUE_NULL) {
            return readFieldNumber(schema, parser);
        }

        return number;
    }

Original comment by celan...@gmail.com on 3 Apr 2012 at 12:52

GoogleCodeExporter commented 9 years ago
both java_v2_protoc_schema and java_bean will get JsonInputException when the 
string is null.

Here's the check in JsonInput.readString().
        if(parser.getCurrentToken() != VALUE_STRING)
            throw new JsonInputException("Expected token: string but was " + parser.getCurrentToken());

Still I do think that ignoring null values from JsonInput.readFieldNumber is a 
good idea.

It shall be added.  Thanks

Original comment by david.yu...@gmail.com on 3 Apr 2012 at 3:26

GoogleCodeExporter commented 9 years ago
Attached is the patch.
I'm now quite unsure if ignoring nulls is the best approach.
Fail-fast (handle JsonInputException when encountering null fields) might be 
more appropriate for others (includes me).

Original comment by david.yu...@gmail.com on 3 Apr 2012 at 4:42

Attachments:

GoogleCodeExporter commented 9 years ago
In my opinion, nulls for optional fields should not throw exceptions. Standard 
jackson ObjectMapper will put nulls in the json. Any sort of integration with 
3rd party json generation is going to have to support nulls for optional fields.

Nulls for required fields should throw exceptions though. 

One other issue with the quick fix I suggested is possible stack overflow 
exception from the recursive calls if a large objects with a lot of null fields 
in a row is passed in. Iterative would be much better.

Perhaps change readFieldNumber to return an Integer, where null is returned in 
case of null value or empty array, etc. Then the calling code can throw 
exception or call readFieldNumber again based on wether the field is required 
or optional. 

Changing JsonIOUtil to a non-static class and adding configuration options in 
the constructor or using interfaces or subclasses could help support throwing 
exceptions and or not. Moving the boolean for numeric or non-numeric field 
names into the same configuration would be a good idea as well. Check the way 
jackson handles configuration options for ObjectMapper for inspiration.

Original comment by celan...@gmail.com on 3 Apr 2012 at 5:16

GoogleCodeExporter commented 9 years ago
"Standard jackson ObjectMapper will put nulls in the json"
protobuf doesn't.  

"Nulls for required fields should throw exceptions though. "
The input/output do not know anything about required/optional fields.

"Changing JsonIOUtil to a non-static class and adding configuration options in 
the constructor or using interfaces or subclasses could help support throwing 
exceptions and or not"
You can simply create a CustomJsonInput that can fit your application's needs.

Remember that the datamodel is protobuf.  

If your datamodel is different, you can always generate your own pojos and 
allow passing nulls in your own way.

Original comment by david.yu...@gmail.com on 3 Apr 2012 at 5:59

GoogleCodeExporter commented 9 years ago
We are attempting the reverse of what is described on 
http://code.google.com/p/protostuff/wiki/PipeUsage simply reading json and 
outputing protobuff

Our use case is aggregatating from multiple 3rd party webservices that will 
provide us json including nulls, string wrapped integers (see other issue) and 
potentially other interesting problems we may find in the future. We take this 
json, convert it into protobuffs and store it in voldemort. We then have a 
seperate webservice read this protobuff object from voldemort, convert it to 
json and return the data. This project was picked as it seemed to directly 
support conversion both ways using the PipeUsage and it mostly works. The code 
is clean, we have no intermediate objects. This project is almost perfect for 
this task. The issues directly relate to not deserializing 100% validating json 
that were generated by the exact same json parser that you are using. This is 
no less a bug in my mind than not supporting in your java_bean representation 
100% of what is supported in protoc.

Take one of your own java_bean objects with an optional value that is set to 
null.
Serialize it using ObjectMapper. Deserialize it using JsonIOUtil. An exception 
will be thrown.

While there is a workaround using ObjectMapper to deserialize, this does not 
work with the java_v2protoc_schema + protoc builder patterns.

This is of course your project not mine, but I would suggest that while you are 
protobuff centric, your deserialization routines should support as close to 
100% valid protobuffs, json, xml or yaml as possible.

Original comment by celan...@gmail.com on 3 Apr 2012 at 6:48

GoogleCodeExporter commented 9 years ago
Fair enough.  Attached is v2.patch.  I've refactored it to not use recursion 
(using loops instead)
All tests pass.  Try it out and provide some feedback if you can.

One thing to note is that Pipe does not validate, it simply translates input to 
output.  In 99% cases, you need to validate data from 3rd party webservices 
(hence you need to do it the regular way by building the objects and do 
validation from there).

Pipe's javadoc:

 * It is recommended to use pipe only to stream data coming from server-side 
 * services (e.g from your datastore/etc).
 * 
 * Incoming data from the interwebs should not be piped due to 
 * validation/security purposes.

Original comment by david.yu...@gmail.com on 4 Apr 2012 at 5:07

Attachments:

GoogleCodeExporter commented 9 years ago
By 3rd party I mean 3rd party library (jackson 1.5) sitting in another team's 
code base that they aren't going to change :). I will try it out tomorrow at 
work thank you so much. 

I may have some example code for you shortly that shows how to integrate with 
voldemort and jersey/jaxrs for a simple webservice that accepts and responds 
with json but stores with protocol buffers

Original comment by celan...@gmail.com on 4 Apr 2012 at 5:25

GoogleCodeExporter commented 9 years ago
Attaching v3.patch (final)

Original comment by david.yu...@gmail.com on 4 Apr 2012 at 7:52

Attachments:

GoogleCodeExporter commented 9 years ago
fixed @ rev 1486

Original comment by david.yu...@gmail.com on 7 Apr 2012 at 4:35