FundingCircle / jackdaw

A Clojure library for the Apache Kafka distributed streaming platform.
https://fundingcircle.github.io/jackdaw/
BSD 3-Clause "New" or "Revised" License
369 stars 80 forks source link

Removing nil values after deserializing avro #251

Open mt3593 opened 4 years ago

mt3593 commented 4 years ago

An optional field in AVRO is currently set by setting a default to nil and setting up a union type with "null" as the first type in that union.

When read back it fills out the values with nil.

This means that I have to now deal with the nils in my code rather than the absence of the key in the map. Not just that but it also effects my tests, when I pull back a value from a avro topic I now pull back these nil that I did not set, this means that all schema changes break my tests as I now have to update to assert they have the nil values.

Personally I think we should strip out nil values, given I didn't set the nil and it's an optional field we can't determine which is set by default and which is set by the user so I think it's safe to remove and makes coding against it easier.

cddr commented 4 years ago

I think changing this behaviour would risk breaking existing programs but it shouldn't be difficult to get the behaviour you'd like using something which strips out the nils after deserialization.

But ignoring that, I think that the existing "in-memory" representation of avro messages already deviates more than it should from the "standard json" representation which includes a tag to identify which element of the union is satisfied by the value.

Personally if I was starting a project from scratch, I'd avoid the avro implementation in here which has a few mistakes that make it less interoperable with other avro based systems than it could/should be. This PR has more specifics on the problem I'm talking about https://github.com/FundingCircle/jackdaw/pull/224

creese commented 4 years ago

Does it matter that the map contains keys your applications doesn't need? I believe clojure.spec follows a similar philosophy here.

mt3593 commented 4 years ago

@creese with Clojure spec if your map contains keys it would apply the spec against the key (if it has a spec for the key of course).

(s/def ::name string?)
(s/def ::abc (s/keys :opt [::name]))

(s/assert* ::abc {}) ;; Happy with this
(s/assert* ::abc {::name ""}) ;; Happy with this
(s/assert* ::abc {::name nil}) ;; Not happy with this

Spec prefers you to remove the nils.

Spec does allow you to provide more keys than are declared in the spec:

(s/assert* ::abc {::name "" :id "1234" :foo 98})

Which I think is the principle you are talking about but that doesn't apply here as we only get back what's in the schema.

@cddr currently I've done exactly that, after deserialization I strip out the nil's. Agree changing this behaviour would break existing programs (or just give someone some headache updating tests). Ideally want to keep things down to one library, makes updating easier and it's nice to only have a handful of options open source wise. Would you entertain the idea of having this as a config? Something you have to explicitly turn on? Or given your comment about interoperability could we create a version 2 and slowly deprecate the old version?

cddr commented 4 years ago

Spec prefers you to remove the nils.

Remember that spec also has the s/nilable constructor which might be a better match for avro's ["null", "some-type"]

Would you entertain the idea of having this as a config?

Thankfully not my call really. I'm just an enthusiastic contributor now :-)