AbsaOSS / ABRiS

Avro SerDe for Apache Spark structured APIs.
Apache License 2.0
227 stars 73 forks

foreach batch download by schema id #347

Closed talperetz1 closed 7 months ago

talperetz1 commented 11 months ago

Is it possible, inside foreachBatch in a Spark stream reading from Kafka, to extract the schema id, download the schema with ABRiS by that id, and then run from_avro on my value column to generate the DataFrame?

cerveada commented 11 months ago

If you are talking about multiple compatible schemas in one topic (basically schema evolution), then ABRiS supports that out of the box.

If you are talking about multiple incompatible schemas in one topic, that is not supported. You could work around this by manually splitting the rows by Confluent schema id into separate data frames and then running ABRiS on each of them separately.

It is the same idea described here: https://github.com/AbsaOSS/ABRiS#multiple-schemas-in-one-topic
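The splitting step in that workaround can be sketched in plain Python. This is a minimal illustration, not ABRiS code: it assumes the values use the Confluent wire format (one magic byte `0`, then a 4-byte big-endian schema id, then the Avro body). The names `confluent_schema_id`, `split_by_schema_id`, and `frame` are hypothetical helpers introduced only for this example.

```python
import struct
from collections import defaultdict

MAGIC_BYTE = 0  # first byte of a Confluent-framed message


def confluent_schema_id(payload: bytes) -> int:
    """Read the schema id from bytes 1-4 (big-endian) of a framed payload."""
    if len(payload) < 5 or payload[0] != MAGIC_BYTE:
        raise ValueError("not a Confluent-framed Avro payload")
    return struct.unpack(">I", payload[1:5])[0]


def split_by_schema_id(payloads):
    """Group raw payloads by schema id, one group per writer schema."""
    groups = defaultdict(list)
    for payload in payloads:
        groups[confluent_schema_id(payload)].append(payload)
    return dict(groups)


def frame(schema_id: int, body: bytes) -> bytes:
    """Hypothetical helper: build a framed payload for the demo below."""
    return bytes([MAGIC_BYTE]) + struct.pack(">I", schema_id) + body


# Demo batch with two different schema ids mixed together.
batch = [frame(7, b"a"), frame(12, b"b"), frame(7, b"c")]
groups = split_by_schema_id(batch)
for schema_id, rows in sorted(groups.items()):
    print(schema_id, len(rows))
```

In a Spark job the same idea would presumably be applied per micro-batch: derive the id from the value bytes, filter the batch once per distinct id, and decode each subset with an ABRiS configuration for that id. The exact ABRiS calls are left out here because they depend on the ABRiS version in use.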

luisvicenteatprima commented 9 months ago

@cerveada

> If you are talking about multiple compatible schemas in one topic (basically schema evolution), then ABRiS supports that out of the box.

By this you mean using the latest version of the schema or something similar, don't you?

cerveada commented 9 months ago

https://docs.confluent.io/cloud/current/sr/fundamentals/schema-evolution.html
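The field-level rule behind compatible evolution can be illustrated with a small sketch. It is a deliberately simplified model of Avro's record resolution, working on already-decoded dictionaries: a field written but absent from the reader schema is dropped, and a field missing from the written data is filled from the reader's default. Type promotion, aliases, and unions are ignored, and `resolve_record`, `NO_DEFAULT`, and the field names are hypothetical, chosen only for this example.

```python
NO_DEFAULT = object()  # marks a reader field that declares no default


def resolve_record(written: dict, reader_fields: dict) -> dict:
    """Project a record written with an older or newer schema onto the reader schema.

    reader_fields maps each reader field name to its default value,
    or to NO_DEFAULT when the field has none.
    """
    resolved = {}
    for name, default in reader_fields.items():
        if name in written:
            resolved[name] = written[name]      # field present in written data
        elif default is not NO_DEFAULT:
            resolved[name] = default            # fall back to the reader default
        else:
            raise ValueError(f"reader field {name!r} has no default and was not written")
    return resolved                             # writer-only fields are dropped


# Reader schema: "id" is required, "email" was added later with default None.
reader_fields = {"id": NO_DEFAULT, "email": None}

old_record = {"id": 1}                                             # older writer schema
new_record = {"id": 2, "email": "a@example.com", "phone": "555"}   # newer writer schema

print(resolve_record(old_record, reader_fields))
print(resolve_record(new_record, reader_fields))
```

Under this model a single reader schema, for example the latest version, can read records written with any compatible earlier or later version, which is why compatible schemas can share one topic.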