FasterXML / jackson-dataformats-binary

Uber-project for standard Jackson binary format backends: avro, cbor, ion, protobuf, smile
Apache License 2.0
316 stars 136 forks source link

[Avro] jackson-dataformat-avro package is not working with avro's default name mangling for reserved words #99

Open bowenli86 opened 7 years ago

bowenli86 commented 7 years ago

Hi there!

We saw a bug in jackson that it fails to differentiate reserved keywords in string from java class keywords from when deserializing json blob.

Here's our Avro schema:

{
    "type": "record",
    "name": "my.event",
    "fields": [
        {
            "name": "item_condition",
            "type": {
                "type": "enum",
                "name": "item_conditions",
                "symbols": [
                    "other",
                    "used",
                    "open_box",
                    "reconditioned_certified",
                    "new"
                ]
            }
        }
    ]
}

When the enum value is "new" in serialization, avro is putting a '$' sign after reserve words like "new$". When deserializing the json blob, jackson parser throws error as follows:


Can not deserialize value of type my.event.item_conditions from String "new": value not one of declared Enum instance names: other, new$, open_box, used, reconditioned_certified at Source: N/A; line: -1, column: -1 (through reference chain: my.event["item_condition"])

In such case, shall jackson-dataformats-avro be able to detect the '$' sign when deserializing the json blob?

Thanks, Bowen

cowtowncoder commented 7 years ago

First of all, thank you for reporting this problem!

Would it be possible to write a simple self-contained (unit) test to reproduce this? If Avro uses such name-mangling, there perhaps should be something to deal with it. But it would be necessary to first know where it happens. I can see why such work-around may be necessary for data-binding.

bowenli86 commented 7 years ago

Sure. Shall I create a PR containing that unit test to this repo?

cowtowncoder commented 7 years ago

@bowenli86 that's a good way, yes.

bowenli86 commented 7 years ago

Hi @cowtowncoder

The PR is https://github.com/FasterXML/jackson-dataformats-binary/pull/100

I realized this actually has things to do with jackson-databind, not really jackson-dataformats-binary. Let me know if you prefer me to submit PR to jackson-databind.

Thanks!

cowtowncoder commented 7 years ago

Well, handling for this can not really be added in jackson-databind (since it does not have anything format-specific, by design), so I think issue and PR here are fine. It is possible that sometimes additional support for more configuration needs to be added there, for those a new issue may be filed. But we'll see.

But I think I'll need to see if I can find something explaining handling in Avro spec. A simple solution would seem to be to post-process schema-provided ids to expose dollar-enhanced value (instead of literal from schema).

bowenli86 commented 7 years ago

yes. I think we probably need to add logic to handle '$'

cowtowncoder commented 7 years ago

Hmmh. I have not been able to find explanation of this in Avro specification -- is this specified somewhere? Wouldn't it actually just make sense to require user to do renaming in POJO, a al

enum Values {
   old,
   @JsonProperty("new")
    new$
;

I mean it's user that is deciding to use keyword that is legal in Avro but reserved in Java?

bowenli86 commented 7 years ago

We are using Avro's code-gen to generate java classes from avro schemas. So we can't really modify the POJO in the way you suggested.

I didn't find any documentation too. The only place showing such scenario is in https://avro.apache.org/docs/1.8.1/api/java/org/apache/avro/compiler/specific/SpecificCompiler.html The class description says "Generate specific Java interfaces and classes for protocols and schemas. Java reserved keywords are mangled to preserve compilation.".

cowtowncoder commented 7 years ago

@bowenli86 Does Avro add some sort of annotations to indicate this discrepancy? If so it should be easy to make avro annotation introspector (included already with 2.8 I think) take that information. Like, @AvroName or something?

If not, use of AnnotationIntrospector is probably still the way to go. Although... whether this way, or by decoder, problem is that it's not $ to get rid of but rather to add. So would need a blacklist of reserved names. May not be the biggest issue ever as long as it only applies to Avro usage.

cowtowncoder commented 7 years ago

@bowenli86 Does Avro add some sort of annotations to indicate this discrepancy? If so it should be easy to make avro annotation introspector (included already with 2.8 I think) take that information. Like, @AvroName or something? Javadocs did not indicate whether this might be usable with enums, or only for property name matching.

If not, use of AnnotationIntrospector is probably still the way to go. Although... whether this way, or by decoder, problem is that it's not $ to get rid of but rather to add. So would need a blacklist of reserved names. May not be the biggest issue ever as long as it only applies to Avro usage.