confluentinc / kafka-connect-hdfs

Kafka Connect HDFS connector

RCCA-16236: AVRO buffer loss fix on IOException #678

Closed: vbalani002 closed this pull request 11 months ago

vbalani002 commented 11 months ago

Problem

A change was introduced into the Avro writer that makes it reset its internal buffer of records when it receives an IOException. Previously, an IOException did not clear the existing buffer. Because the connector does not re-read the affected topic data into the buffer after such an IOException, this can cause data loss: once the Avro writer recovers, the connector flushes the now-incomplete buffer and commits offsets past the records that were dropped.
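The failure mode above can be reproduced with a small, self-contained simulation. This is a hypothetical sketch, not the connector's code: `BufferingWriter`, `write`, `flush`, and `runNaive` are invented names, and a plain `IOException` stands in for the Avro writer's error. The writer clears its buffer on failure, the task loop keeps consuming without rewinding, and the offset that would be committed moves past records that were never persisted.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical simulation of the buffer-loss bug; all names are illustrative. */
public class BufferLossDemo {

    /** Buffers records and, like the changed Avro writer, clears the buffer on IOException. */
    static class BufferingWriter {
        private final List<Long> buffer = new ArrayList<>();
        final List<Long> persisted = new ArrayList<>();
        private boolean failNextWrite;

        void failNextWrite() {
            failNextWrite = true;
        }

        void write(long offset) throws IOException {
            buffer.add(offset);
            if (failNextWrite) {
                failNextWrite = false;
                buffer.clear(); // every buffered record is dropped, not just this one
                throw new IOException("simulated HDFS write failure");
            }
        }

        void flush() {
            persisted.addAll(buffer);
            buffer.clear();
        }
    }

    /** Naive task loop: swallows the error, keeps consuming, and never rewinds. */
    static long runNaive(BufferingWriter writer, int recordCount, int failAt) {
        long nextOffset = 0;
        for (long offset = 0; offset < recordCount; offset++) {
            if (offset == failAt) {
                writer.failNextWrite();
            }
            try {
                writer.write(offset);
            } catch (IOException e) {
                // The consumer position is NOT rewound, so the dropped records are never re-read.
            }
            nextOffset = offset + 1;
        }
        writer.flush();
        return nextOffset; // the offset that would be committed
    }

    public static void main(String[] args) {
        BufferingWriter writer = new BufferingWriter();
        long committed = runNaive(writer, 10, 5);
        System.out.println("committed offset: " + committed);
        System.out.println("persisted offsets: " + writer.persisted);
    }
}
```

With 10 records and a failure injected at offset 5, `main` reports a committed offset of 10 while only offsets 6 to 9 were persisted; offsets 0 to 5 are silently lost.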

Solution

This issue affects only the Avro writer. When it occurs, the connector needs to rewind the offsets of the affected topic partition and rebuild the buffer. This PR handles that contingency by catching the AVROIOException, rewinding the consumer offsets, and resetting any existing buffers so that duplicate data is not pushed.
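The recovery strategy described above can be sketched in the same simulated setting. This is an illustrative sketch under assumptions, not the PR's implementation: the writer, `reset`, and `runWithRewind` are hypothetical, and a plain `IOException` stands in for AVROIOException. In a real sink task, a rewind is usually requested through Kafka Connect's `SinkTaskContext.offset(...)`; whether this PR uses that call is not stated here. The `clearOnFailure` flag shows why the buffer reset matters. If the writer happens to keep its buffer after a failure, rewinding without a reset would write offsets 0 to 4 twice.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of "catch, reset buffer, rewind"; all names are illustrative. */
public class RewindOnFailureDemo {

    /** clearOnFailure=true models the changed Avro writer; false models the earlier behavior. */
    static class BufferingWriter {
        private final List<Long> buffer = new ArrayList<>();
        final List<Long> persisted = new ArrayList<>();
        private final boolean clearOnFailure;
        private boolean failNextWrite;

        BufferingWriter(boolean clearOnFailure) {
            this.clearOnFailure = clearOnFailure;
        }

        void failNextWrite() {
            failNextWrite = true;
        }

        void write(long offset) throws IOException {
            if (failNextWrite) {
                failNextWrite = false;
                if (clearOnFailure) {
                    buffer.clear();
                }
                throw new IOException("simulated HDFS write failure");
            }
            buffer.add(offset);
        }

        /** Drops anything still buffered so re-consumed records are not written twice. */
        void reset() {
            buffer.clear();
        }

        void flush() {
            persisted.addAll(buffer);
            buffer.clear();
        }
    }

    /** Task loop with the fix; returns the offset that would be committed after the final flush. */
    static long runWithRewind(BufferingWriter writer, int recordCount, int failAt) {
        final long lastCommitted = 0; // nothing has been committed yet in this run
        long position = 0;            // current consumer position
        boolean failureInjected = false;
        while (position < recordCount) {
            if (position == failAt && !failureInjected) {
                writer.failNextWrite();
                failureInjected = true;
            }
            try {
                writer.write(position);
                position++;
            } catch (IOException e) {
                writer.reset();           // discard any partial buffer to avoid duplicates
                position = lastCommitted; // rewind so the uncommitted records are re-read
            }
        }
        writer.flush();
        return position;
    }

    public static void main(String[] args) {
        for (boolean clearOnFailure : new boolean[] {true, false}) {
            BufferingWriter writer = new BufferingWriter(clearOnFailure);
            long committed = runWithRewind(writer, 10, 5);
            System.out.println("clearOnFailure=" + clearOnFailure
                    + " committed=" + committed + " persisted=" + writer.persisted);
        }
    }
}
```

In both modes every offset from 0 to 9 is persisted exactly once before offset 10 is committed. The reset-then-rewind order keeps the result independent of whether the writer cleared its own buffer.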

Does this solution apply anywhere else?
If yes, where?

Test Strategy

Testing done:

Release Plan

cla-assistant[bot] commented 11 months ago

CLA assistant check
All committers have signed the CLA.

cla-assistant[bot] commented 11 months ago

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ vbalani002
❌ snehashisp