FundingCircle / jackdaw

A Clojure library for the Apache Kafka distributed streaming platform.
https://fundingcircle.github.io/jackdaw/
BSD 3-Clause "New" or "Revised" License
369 stars 80 forks source link

Decouple avro reader and writer serdes #157

Closed cddr closed 5 years ago

cddr commented 5 years ago

I believe this resolves a few issues including #152, #137. It also represents part of the work required to resolve #136 although that might need a bit more work on the serialization side and could certainly use a test and documentation to demonstrate it's use. I plan to follow up on this with another guide in the docs section all about configuring the various kafka client entities (e.g. producers, consumers, serdes etc).

The main change is to add a :deserialization-properties optional keyword to the confluent serde constructor. Including the property "specific.avro.reader" in these properties means that any schema supplied to the deserializer will be used in combination with the writer's schema to decode the message. The reader schema must be compatible with the writer's schema in order for this to work correctly. See the schema resolution rules in the related links for the details about how this works.

Another change is that it is no longer an error if you try to create an avro serde without specifying a local copy of the schema. This can be used to consume data produced by some other process that does have access to the schema or which generates it's own schema on the fly (e.g. kafka-connect, debezium etc).

Finally I tried to remove some of the magic from the serde-resolver function. It should remain logically the same (except for not throwing an error as described above) but it is hopefully a bit clearer what's happening now.

Related Links:

codecov-io commented 5 years ago

Codecov Report

Merging #157 into master will increase coverage by 0.02%. The diff coverage is 85.71%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #157      +/-   ##
==========================================
+ Coverage   81.04%   81.07%   +0.02%     
==========================================
  Files          39       39              
  Lines        2364     2362       -2     
  Branches      149      149              
==========================================
- Hits         1916     1915       -1     
+ Misses        299      298       -1     
  Partials      149      149
Impacted Files Coverage Δ
src/jackdaw/serdes/avro.clj 87.88% <100%> (+0.42%) :arrow_up:
src/jackdaw/serdes/resolver.clj 90.32% <76.92%> (-1.35%) :arrow_down:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 7e4e5e7...8551c18. Read the comment docs.

99-not-out commented 5 years ago

I would perhaps go further here open it up more - if we allow the serailizer and deserializer fns here https://github.com/FundingCircle/jackdaw/blob/7e4e5e7be717334b75b04242deb5971f6d52b479/src/jackdaw/serdes/avro.clj#L642 to be optionally passed in then the user can experiment with stuff like this all they want.

This allows overriding the default behaviour completely, and we can provide several versions of these two fns if we find good candiates - or eventually alter the default behaviour.

But such a change would allow experimentation in this area without so much local overriding (which is hard as the two fns in question are currently private).

Approving the original PR as my thoughts above are needlessly tangential for the immediate concern here