Open jurgispods opened 6 years ago
@pederpansen Can you share a minimal, complete schema along with sample producer code?
I feel this problem can be resolved by adding Avro namespaces to your records.
The error can happen because you end up with two `type: array` fields with the same name, I believe.
And the error is from Avro itself, not Connect.
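To illustrate the duplicate-name problem (a minimal sketch, not the actual parquet-avro conversion code): the conventional Avro-to-Parquet mapping represents each Avro array as a LIST whose repeated group is named `array`, so nesting arrays yields two groups with the same name. The schema and helper below are hypothetical:

```python
import json

# Hypothetical example: a doubly nested array field in an Avro schema.
schema = {
    "type": "record",
    "name": "Outer",
    "fields": [
        {"name": "matrix",
         "type": {"type": "array",
                  "items": {"type": "array", "items": "int"}}},
    ],
}

def parquet_list_names(avro_type, names=None):
    """Collect the repeated-group names a naive Avro->Parquet LIST
    conversion would generate: every array level is called 'array'."""
    if names is None:
        names = []
    if isinstance(avro_type, dict) and avro_type.get("type") == "array":
        names.append("array")  # each nesting level reuses the same name
        parquet_list_names(avro_type["items"], names)
    elif isinstance(avro_type, dict) and avro_type.get("type") == "record":
        for field in avro_type["fields"]:
            parquet_list_names(field["type"], names)
    return names

print(parquet_list_names(schema))  # two nested levels -> ['array', 'array']
```

With two levels of nesting the sketch yields two groups both named `array`, which matches the duplicate-field symptom described in the issue.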
Hello @pederpansen, if this issue is still relevant for you, please check the following: https://docs.confluent.io/4.1.0/ksql/docs/installation/server-config/avro-schema.html There is this useful information at the beginning of the article:
> Avro schemas with nested fields are not supported yet. This is because KSQL does not yet support nested columns. This functionality is coming soon.
@nickstatka777
You might want to look at the latest docs:
> Avro schemas with nested fields are supported. In KSQL 5.0 and higher, you can read nested data, in Avro and JSON formats.

However, the question here is not about KSQL.
Hello @cricket007, yep, but I discovered this issue with version 4.1. Thank you for the info, though!
@nickstatka777 I see. You may want to consider upgrading then. The issue here isn't with accessing the nested data with KSQL, though. Are you sure the error you were getting was the exact same?
The HDFS connector (version 4.1) fails after the first batch of events when writing Avro messages to Parquet files in HDFS if the Avro schema contains nested arrays.
Relevant part of the offending schema:
Connector exception:
Peeking into one of the Parquet files written in the first batch (relevant part of `parquet-tools meta <file>`) reveals:

Apparently, the Avro schema was converted to a Parquet schema in such a way that two fields with the name `array` were created. This schema subsequently offends `org.apache.avro.SchemaCompatibility.checkReaderWriterCompatibility`.

Possibly related: https://github.com/confluentinc/examples/issues/63
Can we do something about this, other than avoiding nested arrays altogether? That is a constraint we have limited influence on.
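One possible workaround, if you control the producer, is to give the inner array its own named record level, so the converted Parquet schema no longer contains two nested groups both named `array`. A hedged sketch follows; `InnerList` and `values` are hypothetical names, and produced records would need the matching wrapping (e.g. `[[1, 2], [3]]` becomes `[{"values": [1, 2]}, {"values": [3]}]`):

```python
import copy

def wrap_inner_arrays(schema):
    """Return a copy of an Avro schema (as a dict) in which every
    array-of-array has its inner array wrapped in a single-field record,
    so each nesting level gets a distinct name after Parquet conversion."""
    def walk(node, depth):
        if not isinstance(node, dict):
            return node  # primitive type names like "int" pass through
        if node.get("type") == "record":
            for field in node["fields"]:
                field["type"] = walk(field["type"], depth + 1)
        elif node.get("type") == "array":
            items = node["items"]
            if isinstance(items, dict) and items.get("type") == "array":
                # Wrap the inner array in a named record ("InnerList..." and
                # "values" are made-up names for this sketch).
                node["items"] = {
                    "type": "record",
                    "name": f"InnerList{depth}",
                    "fields": [{"name": "values",
                                "type": walk(items, depth + 1)}],
                }
            else:
                node["items"] = walk(items, depth + 1)
        return node
    return walk(copy.deepcopy(schema), 0)

# Hypothetical nested-array schema, like the offending one in this issue.
nested = {
    "type": "record", "name": "Outer",
    "fields": [{"name": "matrix",
                "type": {"type": "array",
                         "items": {"type": "array", "items": "int"}}}],
}
print(wrap_inner_arrays(nested)["fields"][0]["type"]["items"]["name"])
```

This changes the wire schema, so it only helps if downstream consumers can tolerate the extra record level; it is a schema-design workaround, not a fix for the connector's conversion.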