AbsaOSS / ABRiS

Avro SerDe for Apache Spark structured APIs.
Apache License 2.0

Question: How do I use the latest schema for each message #206

Closed: moyphilip closed this issue 3 years ago

moyphilip commented 3 years ago

Let's say I have only one schema in my registry for the key and the value, with id 1.

Following this example, the reader grabs the latest schema:

https://github.com/AbsaOSS/ABRiS/blob/master/src/main/scala/za/co/absa/abris/examples/ConfluentKafkaAvroReader.scala
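
For reference, the setup in that example looks roughly like the sketch below (ABRiS 4+ builder API; the topic name, registry URL, and bootstrap servers are placeholders). The reader schema is downloaded from the registry once, when the config is built:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import za.co.absa.abris.avro.functions.from_avro
import za.co.absa.abris.config.AbrisConfig

val spark = SparkSession.builder().appName("confluent-avro-reader").getOrCreate()

// The latest reader schema is resolved a single time, when this config is built.
val abrisConfig = AbrisConfig
  .fromConfluentAvro
  .downloadReaderSchemaByLatestVersion
  .andTopicNameStrategy("my_topic")               // placeholder topic name
  .usingSchemaRegistry("http://localhost:8081")   // placeholder registry URL

// Decode the Kafka value column with the downloaded schema.
val decoded = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092") // placeholder brokers
  .option("subscribe", "my_topic")
  .load()
  .select(from_avro(col("value"), abrisConfig).as("data"))
```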

While my data is being streamed, the producer adds a new field to the message, resulting in a new value schema (id 2). But I notice my running stream is no longer using the latest schema; it is still using the old one (id 1), because I do not see the new field in my stream.

How do I pull the latest schema for every message?

cerveada commented 3 years ago

Let's say you are able to download the new Avro schema (id 2) on the fly and tell Abris to use it.

Then you have another problem: you are still using the same DataFrame, which has the same Spark schema. That schema is missing the new field, so the conversion will fail.

From that it should be clear that it doesn't make sense to do this, and that is why Abris fetches the schema just once and then keeps using it (at least on the producer side).
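
To illustrate the "fetched once" behaviour (this sketch is not from the thread): with the ABRiS 4+ builder API, both download-based reader configs resolve the schema a single time, up front, so a schema registered later is never picked up by an already-running query. The registry URL and topic name are placeholders.

```scala
import za.co.absa.abris.config.AbrisConfig

// Pin explicitly to schema id 1.
val byId = AbrisConfig
  .fromConfluentAvro
  .downloadReaderSchemaById(1)
  .usingSchemaRegistry("http://localhost:8081")

// "Latest" is resolved once, when the config is built, not per record.
val byLatest = AbrisConfig
  .fromConfluentAvro
  .downloadReaderSchemaByLatestVersion
  .andTopicNameStrategy("my_topic")
  .usingSchemaRegistry("http://localhost:8081")
```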

moyphilip commented 3 years ago

Thanks, I was able to pull in the latest schema for each streaming batch.
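
The thread does not show how this was done. One way to get that effect (a sketch only, assuming ABRiS 4+ on Spark 3; the topic, registry URL, and brokers are placeholders, and this is not necessarily what the commenter did) is to build the FromAvroConfig inside foreachBatch, so the latest schema is downloaded at the start of every micro-batch:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import za.co.absa.abris.avro.functions.from_avro
import za.co.absa.abris.config.AbrisConfig

val spark = SparkSession.builder().appName("per-batch-latest-schema").getOrCreate()

// Read the raw Kafka records; the value column stays binary at this point.
val raw = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092") // placeholder brokers
  .option("subscribe", "my_topic")                     // placeholder topic
  .load()

raw.writeStream
  .foreachBatch { (batch: DataFrame, _: Long) =>
    // Building the config here means the latest schema is fetched per micro-batch,
    // so the decoded schema of this batch can differ from the previous one.
    val abrisConfig = AbrisConfig
      .fromConfluentAvro
      .downloadReaderSchemaByLatestVersion
      .andTopicNameStrategy("my_topic")
      .usingSchemaRegistry("http://localhost:8081")    // placeholder registry URL

    batch
      .select(from_avro(col("value"), abrisConfig).as("data"))
      .select("data.*")
      .show(truncate = false)                          // replace with a real sink
  }
  .start()
  .awaitTermination()
```

Note that any downstream sink still has to cope with the schema changing between batches, which is the caveat raised earlier in the thread.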