Closed vbalani002 closed 11 months ago
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.
:white_check_mark: vbalani002
:x: snehashisp
Problem
A change was introduced into the Avro writer that makes it reset its internal buffer of records when it receives an IOException. Previously, an IOException did not clear the existing buffer. Because the connector does not rewrite the topic data into the buffer after such an IOException, data can be lost: once the Avro writer recovers, the connector flushes the buffer and commits the offsets, skipping the records that were dropped.
Solution
This can only happen with the Avro writer. When it does, the connector needs to rewind the offsets of the topic partition and rebuild the buffer. This PR makes the required changes by catching the AVROIOException, rewinding the consumer offsets, and resetting any existing buffers to avoid pushing duplicate data.
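
The following is a minimal, self-contained sketch of the rewind idea. It is not the connector's actual code. `FlakyWriter`, `RewindSketch`, and all parameter names are hypothetical, and a plain `java.io.IOException` stands in for the writer's exception. The writer drops its buffer on a simulated failure; the caller either rewinds its read position to the last committed offset or simply retries the failed record.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a writer that clears its buffer when a write fails. */
final class FlakyWriter {
    private final List<Long> buffer = new ArrayList<>();
    private final List<Long> flushed = new ArrayList<>();
    private final long failAt;
    private boolean failedOnce = false;

    FlakyWriter(long failAt) { this.failAt = failAt; }

    void write(long offset) throws IOException {
        if (!failedOnce && offset == failAt) {
            failedOnce = true;
            buffer.clear(); // buffered records are lost on failure
            throw new IOException("simulated failure at offset " + offset);
        }
        buffer.add(offset);
    }

    int buffered() { return buffer.size(); }

    void flush() {
        flushed.addAll(buffer);
        buffer.clear();
    }

    List<Long> flushed() { return flushed; }
}

/** Hypothetical driver: reads offsets [0, count) and flushes every `batch` records. */
final class RewindSketch {
    static List<Long> run(long count, int batch, long failAt, boolean rewind) {
        FlakyWriter writer = new FlakyWriter(failAt);
        long committed = 0; // first offset not yet flushed and committed
        long position = 0;  // next offset to read
        while (position < count) {
            try {
                writer.write(position);
                position++;
                if (writer.buffered() >= batch) {
                    writer.flush();
                    committed = position; // commit only after a successful flush
                }
            } catch (IOException e) {
                if (rewind) {
                    position = committed; // re-read everything since the last commit
                }
                // without rewind, only the failed record is retried
            }
        }
        writer.flush(); // flush whatever remains at the end of the run
        return writer.flushed();
    }
}
```

With six records, a batch size of four, and a failure at offset 2, `RewindSketch.run(6, 4, 2, false)` flushes only offsets 2 through 5, so offsets 0 and 1 are lost. `RewindSketch.run(6, 4, 2, true)` flushes offsets 0 through 5 exactly once, because the buffer was already empty when the records were re-read.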
Does this solution apply anywhere else?
If yes, where?
Test Strategy
Testing done:
Release Plan