FasterXML / jackson-dataformats-binary

Uber-project for standard Jackson binary format backends: avro, cbor, ion, protobuf, smile
Apache License 2.0
310 stars 133 forks source link

[Avro] Add `logicalType` support for some `java.time` types; add `AvroJavaTimeModule` for native ser/deser #283

Closed MichalFoksa closed 3 years ago

MichalFoksa commented 3 years ago

Functionality to support Avro schema with logicaltype for few java.time types. It is achieved by new DateTimeVisitor, new module AvroJavaTimeModule and couple of serializers and deserializers.

AvroJavaTimeModule works in milliseconds precision.

AvroMapper.builder()
    .addModule(new AvroJavaTimeModule())
    .build();    

Supported java types:

Risk

I think risk comes for backwards compatibility.

In 2.11 these Java types were somehow supported, they have got serialized either into array. If JSR310 JavaTimeModule with some combination of WRITE_DATE_TIMESTAMPS_AS_NANOSECONDS it coudd have been serialize also into string or long.

Since 2.12 an exception is throw on serialization of any of these types on out of the box configuration, AvroMapper.builder().build(). (I did not investigate further what might be idea behind this new behavior.)

Exception after OffsetDateTime serialization in 2.12:

InvalidDefinitionException: "Any" type (usually for `java.lang.Object`) not supported: `expectAnyFormat` 
called with type [simple type, class java.time.OffsetDateTime]`

DateTimeVisitor cannot be turned off. Anyone already using JSR310 JavaTimeModule and disabling WRITE_DATE_TIMESTAMPS_AS_NANOSECONDS will suddenly receive logicalType instead of java-class in Avro schema.

java.time.OffsetDateTime schema before PR:

:
{
  "type" : "long",
  "java-class" : "java.time.OffsetDateTime"
}

java.time.OffsetDateTime schema after PR:

{
  "type" : "long",
  "logicalType" : "timestamp-millis"
}

It could be a problem for someone who already dealt with java.time types and found this workaround, but that somebody can adapt to in my opinion more Avro compatible solution.

Open points

Precision

Avro supports milliseconds and microseconds previsions for date and time related logicalType(s). I have implemented milliseconds precision only (based on my needs).

It is easy to change PR to microseconds prevision only. To support both previsions I think a new Feature switch have to be created - also not a problem.

java.util.Date

I have decided to remove java.util.Date from supported types due to potential backwards compatibility issues. At the moment schema for java.util.Date looks like this:

{
  "type" : "long",
  "java-class" : "java.util.Date"
}

After it would look like:

{
  "type" : "long",
  "logicalType" : "timestamp-millis"
}

I feel like it could cause issue to more people than issue with java.time types.

cowtowncoder commented 3 years ago

Looks good; I added some suggestions.

Couple of things I would need:

  1. Addition(s) to avro/README.md to explain usage would be good
  2. Unless I have gotten CLA (apologies if I missed that), I'd need one (see https://github.com/FasterXML/jackson/blob/master/contributor-agreement.pdf) -- the usual way is to print, fill & sign, scan/photo, email to info at fasterxml dot com

and with that I would be able to merge this PR to add neat new functionality in for 2.13!

Also I have one question: do you think there is any likelihood that some usage might be broken due to addition of date/time type detection? If there was, adding a configurability option for AvroMapper would make sense.

MichalFoksa commented 3 years ago

@cowtowncoder Thank you for your time. I am sorry for later response, I had a short time off.

MichalFoksa commented 3 years ago

@cowtowncoder

cowtowncoder commented 3 years ago

First of all: absolutely no problem if things take time -- I take my time and this is all voluntary! Great to get contributions how much trouble it can be to get submission "just right" so I appreciate your work here.

Additional PRs: I think one possibility is to just close existing one(s), push new? Could also be that you can push to your original forked branch (I actually do not know for sure). Commit modification fine from my end, whatever github allows.

I do try to squash merges where applicable, but I am not the tidiest git committer myself.

I did receive CLA, will check that in.

cowtowncoder commented 3 years ago

Looks good; hoping to review again today or tomorrow, get merged.

MichalFoksa commented 3 years ago

My understanding is that these are rather "abstract" date/time/datetime values which do not have concrete time point without addition time zone or offset.

This is correct and now I understand how mention of "local" might be confusing. So in essence I will replace mentions of "local" with some "without timezone" meaning in documentation. BTW: According to Avro specification logicalType for datetime without timezone is local-timestamp-millis.

Next I will think of a switch to enable/disable logicalType functionality.

cowtowncoder commented 3 years ago

@MichalFoksa I think I am mostly happy here; added minor notes on things to mention in README but I think I agree wrt ser/deser aspects given how Avro spec defines things (which is bit confusing wrt "local"/"non-local" timestamp but whatever).

The one last thing I noticed was your mention of allowing disabling of generation of logical type, as an option. If you want to do that, I could first merge this PR, and let you work on that separately? There are couple of ways to do that, f.ex:

  1. Add AvroGenerator.Feature(s) (since schema generation is done on generation side, for Jackson 2.x at least)
  2. Add new AvroMapper.Feature (since it's bit of cross-cutting concern)
  3. Configure via AvroModule, instead of mapper -- ideally adding Builder (similar to how AvroMapper etc are configured)
cowtowncoder commented 3 years ago

Finally merged! Thank you @MichalFoksa and @dinomite -- will go in 2.13, although there can still be improvements. I also realized that BigDecimal part of logical types is missing but nice to get in.

MichalFoksa commented 3 years ago

@cowtowncoder Great :) Thank you.