simonwahlgren opened this issue 4 years ago
It does seem like something is off here. If you use the latest schema when reading, and the schema is backward compatible (i.e. you provide a default value), it shouldn't fail.
Side note: forward compatibility is often important with Kafka as well, because it's often the case that producers get updated before consumers.
@mhowlett Exactly. The problem seems to be that the Avro serializer always fetches the schema the message was produced with (via the schema id embedded in the message), and there's no way to control this. See here: https://github.com/confluentinc/confluent-kafka-python/blob/master/confluent_kafka/avro/serializer/message_serializer.py#L163
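For reference, the deserializer only ever sees the writer schema because the Confluent wire format embeds the schema id the message was produced with: one magic byte (0), a 4-byte big-endian schema id, then the Avro payload. Roughly, as a sketch:

```python
import struct

MAGIC_BYTE = 0

def schema_id_from_message(payload: bytes):
    """Return (schema_id, avro_bytes) from a Confluent-framed Avro message.

    The framing is one magic byte (0) followed by a 4-byte big-endian
    schema id, then the Avro-encoded value. That embedded id is the only
    schema the deserializer looks up today, i.e. the writer schema.
    """
    magic, schema_id = struct.unpack(">bI", payload[:5])
    if magic != MAGIC_BYTE:
        raise ValueError("message not in the Confluent Avro wire format")
    return schema_id, payload[5:]
```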
I would expect something like this:
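For example, a sketch along these lines, where the application can pin the schema it reads with; the `reader_value_schema` argument here is hypothetical and only meant to illustrate the shape of such an API:

```python
from confluent_kafka import avro
from confluent_kafka.avro import AvroConsumer

# Load the schema the application is written against (v2, with the
# defaulted `foo` field) and hand it to the consumer as the reader schema.
latest_schema = avro.load("example_v2.avsc")

consumer = AvroConsumer(
    {
        "bootstrap.servers": "localhost:9092",
        "group.id": "example-group",
        "schema.registry.url": "http://localhost:8081",
        "auto.offset.reset": "earliest",
    },
    reader_value_schema=latest_schema,  # hypothetical argument for this sketch
)

msg = consumer.poll(10)
if msg is not None and not msg.error():
    # Old messages would be resolved against the reader schema,
    # so `foo` comes back as None instead of raising.
    print(msg.value())
```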
Description
I'll try to explain using an example instead:

1. Produce a message with schema version 1.
2. Start a consumer with `latest` as `auto.offset.reset`. The schema registry is using `backward` as compatibility config.
3. Add a new field `foo` (with default value set to null) and produce a new message with schema id 2.
4. Consume the new message, which now includes `foo`; everything so far fine and dandy.
5. Change `auto.offset.reset` to `earliest` and update the `group.id`. Now the problem begins. After re-starting the consumer it will crash because the first produced message (version 1) doesn't have the `foo` field, and since we always use the writer schema to decode the message, it will always fail when reading the old messages.

I can of course use a `get` and return a default value in the application if a field doesn't exist, but that means we also always have to make sure our applications are backward compatible, which feels wrong.

So to my question: Is this the expected behavior or am I missing something?
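To make that concrete, here is a small standalone sketch of plain Avro schema resolution (using `fastavro` directly rather than the confluent client, purely as an illustration): a record written with the old schema can be decoded with the new schema as the reader schema, and `foo` is filled with its default instead of causing a failure.

```python
import io

from fastavro import parse_schema, schemaless_reader, schemaless_writer

# v1: the schema the old message was produced with
writer_schema = parse_schema({
    "type": "record",
    "name": "Example",
    "fields": [{"name": "id", "type": "string"}],
})

# v2: the latest schema, adding `foo` with a null default (backward compatible)
reader_schema = parse_schema({
    "type": "record",
    "name": "Example",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "foo", "type": ["null", "string"], "default": None},
    ],
})

# Encode a record with the old (writer) schema...
buf = io.BytesIO()
schemaless_writer(buf, writer_schema, {"id": "abc"})
buf.seek(0)

# ...and decode it with the new (reader) schema: Avro schema resolution
# fills in the default, so the consumer never sees a missing field.
record = schemaless_reader(buf, writer_schema, reader_schema)
print(record)  # {'id': 'abc', 'foo': None}
```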
After reading https://docs.confluent.io/current/schema-registry/avro.html my understanding is that if we are using `backward` as compatibility config, we should always read the latest version of the schema and not the version it was produced with.

Here's a quote from the "Backward Compatibility" section of that page: