FasterXML / jackson-dataformats-binary

Uber-project for standard Jackson binary format backends: avro, cbor, ion, protobuf, smile
Apache License 2.0

[Avro] feature wish: write enums as string #494

Open jlous opened 3 weeks ago

jlous commented 3 weeks ago

With JSON, I could always extend an enum (on both sender and receiver ends), consider it a backwards-compatible change, and the receiver could parse both old and new documents equally well.

With Avro, altering an enum makes the entire schema incompatible, and an updated schema cannot be used to read old docs. For all my use cases, serialising all enums as strings would be a completely acceptable strategy for avoiding this problem, and much preferable to versioned formats, but Jackson's Avro support does not currently seem to offer this.

So I'm hoping for a new feature switch for Avro: WRITE_ENUMS_AS_STRINGS or similar.

In my case I only really care about serialising, since the receiver is on a completely different platform, but I guess parity on the deserialising side would be natural to include.

cowtowncoder commented 3 weeks ago

Would this be possible wrt Avro schema limitations? Would the new type be defined as a Union, allowing both String and Enum? And which direction is the change? (older schema expecting String, new one Enum? Or vice versa)

At JsonGenerator (and so AvroGenerator subtype) level Enums are typically written using writeString() anyway (since JSON has no "Enum" type; conversion handled at databind level). Same for JsonParser/AvroParser.

So I am not 100% sure I yet understand the ask here.

jlous commented 2 weeks ago

I am suggesting an option where the generated Avro schema for an enum field would simply be String.

This would enable extending the enum in the future, with no change in schema.
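
As a rough illustration of why a string-typed field tolerates new enum constants, here is a stdlib-only sketch of the two Avro binary encodings involved. It assumes the encodings described in the Avro specification (an enum as the zero-based position of its symbol, a string as a length-prefixed UTF-8 value). The class name `AvroEnumEncodingSketch` and the colour symbols are hypothetical, and none of this is Jackson or Avro library code.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: hand-rolled versions of two Avro binary encodings,
// not code taken from Jackson or from the Avro library.
class AvroEnumEncodingSketch {

    // Avro writes int and long values as zig-zag encoded varints.
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // An enum value is written as the zero-based position of its symbol in the schema.
    static byte[] encodeEnum(List<String> schemaSymbols, String value) {
        int index = schemaSymbols.indexOf(value);
        if (index < 0) {
            throw new IllegalArgumentException("Symbol not in schema: " + value);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLong(out, index);
        return out.toByteArray();
    }

    // A string value is written as its UTF-8 byte length followed by the bytes.
    static byte[] encodeString(String value) {
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        List<String> v1 = List.of("RED", "GREEN");          // hypothetical original enum
        List<String> v2 = List.of("RED", "AMBER", "GREEN"); // hypothetical extended enum

        // The enum bytes depend on the symbol list, so they change with the schema.
        System.out.println(Arrays.toString(encodeEnum(v1, "GREEN")));
        System.out.println(Arrays.toString(encodeEnum(v2, "GREEN")));
        // The string bytes carry the name itself and need no symbol list.
        System.out.println(Arrays.toString(encodeString("GREEN")));
    }
}
```

In this sketch the enum encoding of the same constant differs between the two symbol lists, while the string encoding is independent of any symbol list, which is the property the proposal relies on. The trade-off is a larger payload and no schema-level validation of the allowed values.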

cowtowncoder commented 2 weeks ago

@jlous Ah ok. Depending on how implemented it might even be a general MapperFeature; I forget what the division is between model traversal (callbacks generated on serialization settings) and construction of Avro schema.

At this point I probably won't have time to work on this in near term but would be happy to help if anyone else wants to tackle it.

MichalFoksa commented 1 week ago

@jlous BTW: If you change the field's type from enum to string in the Avro schema, serialization and deserialization should already work.

MichalFoksa commented 1 week ago

I try to avoid enums in Avro schemas. Serializing an enum as a string is a good idea.

@cowtowncoder How do you want to control this feature?

I think creating a new AvroGenerator.Feature would be best from an API standpoint, but I do not know how to access AvroGenerator.Feature from VisitorFormatWrapperImpl.expectStringFormat(JavaType).

BTW: Here is a draft PR, https://github.com/FasterXML/jackson-dataformats-binary/pull/496, where this feature is controlled by enableWriteEnumAsString() method:

AvroSchemaGenerator gen = new AvroSchemaGenerator()
    .enableWriteEnumAsString();

cowtowncoder commented 1 week ago

Ah. Sorry, saw this comment before adding a note on PR.

Yes, AvroGenerator.Feature would be better but I'd need to think of plumbing.

I think AvroGenerator might be available from SerializerProvider but not sure if that is properly initialized.

MichalFoksa commented 1 week ago

Let's continue discussion in PR https://github.com/FasterXML/jackson-dataformats-binary/pull/496.