AbsaOSS / ABRiS

Avro SerDe for Apache Spark structured APIs.
Apache License 2.0

How to capture bad records while using from_avro in abris #312

Open Tarannump opened 1 year ago

Tarannump commented 1 year ago

I have a use case where I need to capture bad records and store them in a separate location for future reference. Is it possible to get the records that fail deserialization instead of dropping them?

cerveada commented 1 year ago

You can use exception handlers introduced in this PR: https://github.com/AbsaOSS/ABRiS/pull/290

You will have to implement your own handler that will store the records though.
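
A minimal sketch of the "handler that stores the records" pattern, independent of ABRiS. Every name in it (`BadRecordHandler`, `CollectingHandler`, `decode`, `decodeAll`) is hypothetical, and the toy decoder only stands in for Avro deserialization. The actual ABRiS handler trait, its method signature, and whether it receives the raw payload are assumptions to check against the PR linked above.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.{Failure, Success, Try}

// Hypothetical stand-in for a handler interface. This is NOT the ABRiS trait;
// take the real signature from the PR.
trait BadRecordHandler extends Serializable {
  // Returns a replacement value, or None to skip the record.
  def handle(error: Throwable, payload: Array[Byte]): Option[String]
}

// Keeps failed payloads and their error messages in memory.
// In a real Spark job this state lives on each executor, so the handler
// would write to durable storage (for example a file or a topic) instead.
final class CollectingHandler extends BadRecordHandler {
  val badRecords: ArrayBuffer[(Array[Byte], String)] = ArrayBuffer.empty

  override def handle(error: Throwable, payload: Array[Byte]): Option[String] = {
    badRecords += ((payload, error.getMessage))
    None // skip the record after storing it
  }
}

object BadRecordDemo {
  // Toy decoder standing in for Avro deserialization:
  // accepts UTF-8 payloads that start with "ok:" and rejects everything else.
  def decode(payload: Array[Byte]): String = {
    val text = new String(payload, "UTF-8")
    if (text.startsWith("ok:")) text.drop(3)
    else throw new IllegalArgumentException(s"malformed payload: $text")
  }

  // Decodes every payload; failures are routed to the handler instead of
  // aborting the batch or being dropped silently.
  def decodeAll(payloads: Seq[Array[Byte]], handler: BadRecordHandler): Seq[String] =
    payloads.flatMap { p =>
      Try(decode(p)) match {
        case Success(value) => Some(value)
        case Failure(e)     => handler.handle(e, p)
      }
    }

  def main(args: Array[String]): Unit = {
    val handler = new CollectingHandler
    val input = Seq("ok:a", "broken", "ok:b").map(_.getBytes("UTF-8"))
    val good = decodeAll(input, handler)
    // prints "good=a,b bad=1"
    println(s"good=${good.mkString(",")} bad=${handler.badRecords.size}")
  }
}
```

The handler returns `None` so the failed record is skipped after being stored; returning a default value instead would keep a placeholder row in the output.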

Tarannump commented 1 year ago

Thanks @cerveada for your response. We are using a Glue Streaming ETL job, which runs Spark 3.1.1. Based on the PR, it looks like exception handlers were added in ABRiS 6.3.0, which uses Spark 3.2.1. Is it possible to get the same feature in ABRiS 5 as well?

kevinwallimann commented 1 year ago

Hi @Tarannump, unfortunately we don't have the capacity to backport the feature to ABRiS 5. Of course, you are welcome to fork the repo or submit a pull request.