Closed detoxfarm3 closed 3 years ago
1) Yes, the only way to do this is to separate the data into two dataframes, each with one schema, and then call Abris separately on each of them. If the key is the same for both message types, it should be possible to split by key, I think. But it depends on your use case.
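A minimal sketch of that idea, assuming the message key identifies the event type (the topic name, registry URL, and record names here are made up for illustration; the AbrisConfig builder calls follow the Abris 4.x+ API):

```scala
import org.apache.spark.sql.functions.col
import za.co.absa.abris.avro.functions.from_avro
import za.co.absa.abris.config.AbrisConfig

// Read the mixed topic once (hypothetical broker/topic names).
val raw = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("subscribe", "mixed-topic")
  .load()

// Split into two dataframes by key, one per schema.
val eventA = raw.filter(col("key").cast("string") === "eventA")
val eventB = raw.filter(col("key").cast("string") === "eventB")

// One Abris config per record schema.
val confA = AbrisConfig.fromConfluentAvro
  .downloadReaderSchemaByLatestVersion
  .andRecordNameStrategy("EventA", "com.example")
  .usingSchemaRegistry("http://registry:8081")

val decodedA = eventA.select(from_avro(col("value"), confA) as "data")
// ... build confB the same way and decode eventB with it.
```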
2) Abris currently uses Confluent 5.3 to stay compatible with the Spark libraries. Schema references are available in Confluent 5.5 and higher (for more details, see #175).
@cerveada Thanks for the info. We were able to create multiple streams by filtering the messages. We used Kafka headers to add metadata and filtered on those headers.
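For anyone landing here, the header-based filtering can be sketched like this (the header name "eventType" and its value are assumptions; `includeHeaders` requires Spark 3.0+, and `exists` is a Spark SQL higher-order function):

```scala
import org.apache.spark.sql.functions.expr

// Expose Kafka headers as a column: array<struct<key: string, value: binary>>.
val raw = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("subscribe", "mixed-topic")
  .option("includeHeaders", "true")
  .load()

// Keep only records whose "eventType" header matches one event kind;
// repeat with a different value to build the other stream.
val orders = raw.filter(
  expr("exists(headers, h -> h.key = 'eventType' AND cast(h.value as string) = 'orderCreated')")
)
```

Each filtered stream is then bound to exactly one schema, so Abris can decode it as in the answer above.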
Hi
I tried to use multiple schemas for different events in a single topic.
1. So far, I have explored TopicRecordNameStrategy, which allows multiple schemas in a topic, but this doesn't work with PySpark: it throws an exception when trying to deserialize multiple types of messages present in a single topic, because a stream is bound to a single schema. I have seen a similar question in the Issues, where the conclusion was that it can't be used for this purpose.
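For context, this is roughly what binding a reader to TopicRecordNameStrategy looks like in Abris (names are illustrative; the builder call follows the Abris 4.x+ API). The config pins the stream to one record name, which is why a single stream cannot decode several record types:

```scala
import za.co.absa.abris.config.AbrisConfig

// The reader schema is resolved for exactly one (topic, record name) pair,
// so any other record type in the topic fails to deserialize.
val config = AbrisConfig.fromConfluentAvro
  .downloadReaderSchemaByLatestVersion
  .andTopicRecordNameStrategy("mixed-topic", "EventA", "com.example")
  .usingSchemaRegistry("http://registry:8081")
```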
Adding the error log for reference-
2. Tried schema references with TopicNameStrategy, where the top-level schema is a union of the referenced schemas, e.g.
schema: ["<namespace>.<schema name>"]
But this fails with the following error, which to me looks as if schema references are not yet supported. I have explored some of the Abris code and examples and didn't see any mention of schema references. For from_avro, the config code below is used:
from_avro code:
Below is the error I got when trying to use schema references: