AbsaOSS / ABRiS

Avro SerDe for Apache Spark structured APIs.
Apache License 2.0

How to ignore malformed records while using from_avro in ABRiS #318

Closed (willianmrs closed this issue 1 year ago)

willianmrs commented 1 year ago

Hello, we use ABRiS to deserialize Avro records with schemas from a Confluent Schema Registry. Sometimes we receive malformed records that fail to deserialize, and we want to ignore them. How can we do that?

I was looking into withExceptionHandler, but it needs a default record with a matching schema, which is something we do not really have. Is it possible to just ignore those records?

cerveada commented 1 year ago

I am not sure that's possible, since ABRiS implements Spark's UnaryExpression, which always expects an output value for each input row.

DeserializationExceptionHandler actually gives you the reader schema. Could you create the default record dynamically based on that? You can later filter out these default rows.
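The idea above (emit a placeholder row built from the reader schema, then filter those rows out) can be sketched independently of ABRiS and Spark. The following is only a minimal illustration in plain Python, not ABRiS code: JSON stands in for Avro, the reader schema is reduced to a list of field names, and every name in it (`READER_FIELDS`, `null_record`, `decode_permissive`, `is_placeholder`) is hypothetical.

```python
import json

# Assumption: the "reader schema" is reduced to the list of expected field names.
READER_FIELDS = ["id", "name"]


def null_record(fields):
    """Build the placeholder row dynamically: every schema field set to None."""
    return {field: None for field in fields}


def decode(payload, fields):
    """Stand-in decoder (JSON instead of Avro). Raises on malformed input."""
    data = json.loads(payload.decode("utf-8"))  # ValueError on bad bytes or bad JSON
    if not isinstance(data, dict):
        raise ValueError("payload is not a record")
    return {field: data[field] for field in fields}  # KeyError if a field is missing


def decode_permissive(payload, fields):
    """Always return a row: the decoded record, or the all-null placeholder on failure."""
    try:
        return decode(payload, fields)
    except (ValueError, KeyError):
        return null_record(fields)


def is_placeholder(record):
    """A row counts as a placeholder when every field is None."""
    return all(value is None for value in record.values())


payloads = [
    b'{"id": 1, "name": "a"}',
    b"\x00\x01garbage",          # not valid JSON
    b'{"id": 2}',                # missing the "name" field
    b'{"id": 3, "name": "c"}',
]

# Step 1: every input yields exactly one output row (as a unary expression must).
rows = [decode_permissive(p, READER_FIELDS) for p in payloads]

# Step 2: drop the placeholder rows afterwards.
good_rows = [row for row in rows if not is_placeholder(row)]
```

One caveat with this design: a valid record whose fields are all legitimately null is indistinguishable from the placeholder and would be filtered out too, so the approach fits best when at least one field is never null in real data.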

willianmrs commented 1 year ago

PermissiveRecordExceptionHandler has been implemented. I'm closing the issue. Thanks, everyone, for the help.