xc-cre opened 1 year ago
There are two `KafkaAvroDeserializerConfig` properties you need to set on the consumer:

- `KafkaAvroDeserializerConfig.SPECIFIC_AVRO_READER_CONFIG = true`
- `KafkaAvroDeserializerConfig.SPECIFIC_AVRO_VALUE_TYPE_CONFIG = <FQCN of the Avro-generated POJO>`

These instruct the deserializer to use the schema embedded in the Avro-generated POJO, which includes the `"avro.java.string": "String"` annotations, so string Avro types are deserialized as `java.lang.String`.
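A minimal consumer-config sketch of the two settings. I'm using the literal property-key strings (`specific.avro.reader`, `specific.avro.value.type`) that the `KafkaAvroDeserializerConfig` constants resolve to, and `com.example.MyRecord` is a placeholder FQCN for your generated POJO:

```java
import java.util.HashMap;
import java.util.Map;

public class ConsumerAvroConfig {
    public static Map<String, Object> avroConsumerProps() {
        Map<String, Object> props = new HashMap<>();
        // Literal keys for KafkaAvroDeserializerConfig.SPECIFIC_AVRO_READER_CONFIG
        // and KafkaAvroDeserializerConfig.SPECIFIC_AVRO_VALUE_TYPE_CONFIG.
        props.put("specific.avro.reader", true);
        // Placeholder: the fully qualified name of your Avro-generated class.
        props.put("specific.avro.value.type", "com.example.MyRecord");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(avroConsumerProps());
    }
}
```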
We observe the same behavior when we update the `io.confluent:kafka-avro-serializer` dependency from version 7.3.1 to a higher one (tested with 7.3.3, 7.4.2, 7.5.1, and 7.5.3).
Have you found a sensible solution for this? At the moment we are stuck on the older dependency, which still works.
Does anyone know if this has already been solved? We have warned teams in our company that upgrading to 7.5.1 is very risky, and nearly all the teams that did it nonetheless had major incidents afterwards. My team now wants to try 7.6.x; we halted the merge request and will investigate it on QA branches a bit longer before promoting it to prod.
@Muenze since we were also affected by the problem, I was able to reproduce it with an example.
Cause:
The problem occurs because, since version 7.3.3, the AvroDeserializer respects the `use.latest.version` configuration instead of ignoring it as previous versions did.
Previously, the deserializer used the schema embedded in the classes generated from the Avro schemas, which usually had `<stringType>String</stringType>` defined. Those embedded schemas therefore included `"avro.java.string":"String"`, which ensured that we got `java.lang.String` in collections instead of `org.apache.avro.util.Utf8`.
With `use.latest.version` taken into account, the schema registry is always queried and the schema registered there is used. If that schema contains no `"avro.java.string":"String"`, the elements are returned as `org.apache.avro.util.Utf8`.
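For reference, the flag in question as a plain config fragment; a minimal sketch, assuming the literal property key `use.latest.version` on the consumer config:

```java
import java.util.Properties;

public class UseLatestVersionConfig {
    public static Properties consumerProps() {
        Properties props = new Properties();
        // Since 7.3.3 the Avro deserializer honors this flag: when true, the
        // latest schema from the registry is used as the reader schema, so any
        // "avro.java.string" annotations embedded in the generated POJO are
        // bypassed. Setting it to false (the default) restores the old behavior.
        props.setProperty("use.latest.version", "false");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(consumerProps());
    }
}
```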
From this I was able to derive three possible solutions:

1. Add `"avro.java.string":"String"` to all schemas in the registry (was not practical for us).
2. Change `<stringType>String</stringType>` to `<stringType>CharSequence</stringType>`: since both `java.lang.String` and `org.apache.avro.util.Utf8` implement the `CharSequence` interface, the `ClassCastException` no longer occurs.
3. Remove `use.latest.version=true` if it's not required.

I hope these suggested solutions are helpful to you.
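A small stdlib-only sketch of why solution 2 works. `StringBuilder` stands in for `org.apache.avro.util.Utf8` here, since both implement `CharSequence`, which is the whole point:

```java
import java.util.ArrayList;
import java.util.List;

public class CharSequenceDemo {
    public static void main(String[] args) {
        // StringBuilder is a stand-in for org.apache.avro.util.Utf8: a
        // CharSequence that is not a java.lang.String.
        List<CharSequence> values = new ArrayList<>();
        values.add("from a String");
        values.add(new StringBuilder("from a Utf8-like CharSequence"));

        // Accessing the elements as CharSequence never casts to String, so no
        // ClassCastException can occur regardless of the runtime element type.
        for (CharSequence cs : values) {
            System.out.println(cs.length() + ": " + cs);
        }
    }
}
```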
When reading events using the following schema, one gets a `ClassCastException`, as elements of `someStrings` are of type `Utf8`, not `String`. We're using SpecificRecords and the Avro Java generator maven plugin with option `<stringType>String</stringType>`, which adds `"avro.java.string": "String"` properties to all string-typed Avro fields in the schema included in the generated class. In the producer we set property `avro.remove.java.properties = true`, which strips these properties at runtime before calling the registry, so we can keep the schemas on the registry Java-agnostic and don't have to include this language-specific behaviour there. The problem now comes with the consumer, which takes the writer schema as its reader schema because the schema type is `UNION`, and this schema does not include these properties. So the deserializer uses Avro's default string class, `Utf8`. The specific record `RecordContainingArrayOfStrings` (which is returned by the deserializer) is defined with type `String`, not `Utf8`, though, so we get a `ClassCastException` when accessing these fields as `List<String>`.

(We're using top-level unions as suggested in Putting Several Event Types in the Same Topic – Revisited to handle multiple event types in a single topic.)
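The "exception only on field access" behavior comes from generic type erasure, and can be reproduced with the stdlib alone. This is a sketch, with `StringBuilder` again standing in for `org.apache.avro.util.Utf8`:

```java
import java.util.ArrayList;
import java.util.List;

public class Utf8CastDemo {
    public static void main(String[] args) {
        // StringBuilder stands in for org.apache.avro.util.Utf8: a
        // CharSequence that is not a java.lang.String.
        List<Object> decoded = new ArrayList<>();
        decoded.add(new StringBuilder("utf8 payload"));

        // The generated record declares List<String>, but generics are erased
        // at runtime, so the wrong element type goes unnoticed until access.
        @SuppressWarnings("unchecked")
        List<String> someStrings = (List<String>) (List<?>) decoded;

        try {
            String first = someStrings.get(0); // implicit checkcast to String fails here
            System.out.println(first);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException on element access, as described");
        }
    }
}
```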
I currently see three possible solutions/workarounds for this:

1. Add these Java-specific properties to the schemas registered on the registry. That wouldn't be ideal, as we'd like to keep them language-agnostic (it would also create hundreds of new schema versions), and it would potentially break producers, as they would no longer find the schema on the registry as long as they have `avro.remove.java.properties` set to `true`.
2. `KafkaAvroDeserializer` adds the opposite operation of `avro.remove.java.properties`: one that adds these Java properties to string types before using the schema as the reader schema.
3. (Also mentioned in issue 2704) The deserializer somehow uses the actual `SpecificRecord` instance used in the union (not the global class instance `SpecificData.get()`), as this one contains a schema with these properties as added by the Avro generator maven plugin.

Any ideas how to fix this?
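To make solution 2 concrete, an illustration-only sketch of the "opposite" of `avro.remove.java.properties`: tagging string fields in a schema JSON with `"avro.java.string":"String"` so the reader schema yields `java.lang.String`. A real implementation would walk the schema with the Avro API rather than doing text replacement, and would also need to handle shorthand forms such as `"items":"string"`:

```java
public class AddJavaStringProps {
    /**
     * Naive sketch: annotate every {"type":"string"} occurrence in a schema
     * JSON string with the Java string hint. Real code should parse the
     * schema (org.apache.avro.Schema) instead of doing string surgery.
     */
    static String addJavaStringProps(String schemaJson) {
        return schemaJson.replace(
                "\"type\":\"string\"",
                "\"type\":\"string\",\"avro.java.string\":\"String\"");
    }

    public static void main(String[] args) {
        String field = "{\"name\":\"someStrings\",\"type\":\"string\"}";
        System.out.println(addJavaStringProps(field));
        // -> {"name":"someStrings","type":"string","avro.java.string":"String"}
    }
}
```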