airbytehq / airbyte


[source-mysql] : don't emit final state if there is an underlying stream failure #34881

Closed akashkulk closed 8 months ago

akashkulk commented 9 months ago

Relevant information

Context: https://airbytehq-team.slack.com/archives/C043JHEEYKG/p1707152897893279

We're seeing failures in source-mysql where:

  1. A stream fails
  2. State iterator emits final state message (indicating that the snapshot is complete)
  3. Airbyte proceeds to perform incremental syncs.

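This causes data loss: step 2 should emit an intermediate state message rather than a final one. Because the final state marks the snapshot as complete, the rows not yet read when the stream failed are never synced; we've lost all the data since the first failure.

The fix is to change the logic: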

  1. If a stream snapshot has failed, emit the intermediate message OR fail
  2. Throw an exception to prevent the sync from progressing further
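
The two steps above can be sketched as follows. This is a minimal, language-agnostic illustration written in Python, not the connector's actual Java code: `Record`, `State`, `snapshot_with_state`, `flaky_rows`, and the `checkpoint_every` parameter are all hypothetical names, and the real SourceStateIterator API may look quite different. The point is only the control flow: on an underlying stream failure, checkpoint progress as an intermediate state and then raise, so the final state is never emitted.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Union


@dataclass(frozen=True)
class Record:
    data: dict


@dataclass(frozen=True)
class State:
    cursor: int          # last primary-key value that was emitted
    snapshot_done: bool  # True only for the final state message


def snapshot_with_state(
    rows: Iterable[dict], checkpoint_every: int = 2
) -> Iterator[Union[Record, State]]:
    """Yield records and state; emit the final state only on clean completion."""
    last_pk = 0
    emitted = 0
    try:
        for row in rows:
            yield Record(row)
            last_pk = row["id"]
            emitted += 1
            if emitted % checkpoint_every == 0:
                yield State(cursor=last_pk, snapshot_done=False)
    except Exception as err:
        # Step 1: the underlying stream failed, so checkpoint what was read
        # as an INTERMEDIATE state (the snapshot is not complete).
        yield State(cursor=last_pk, snapshot_done=False)
        # Step 2: raise so the sync stops instead of moving on to incremental.
        raise RuntimeError("snapshot failed; final state not emitted") from err
    # Reached only when every row was read without error.
    yield State(cursor=last_pk, snapshot_done=True)


def flaky_rows() -> Iterator[dict]:
    """Hypothetical source that fails partway through the snapshot."""
    yield {"id": 1}
    yield {"id": 2}
    yield {"id": 3}
    raise ConnectionError("connection lost")
```

With this shape, a consumer iterating over `snapshot_with_state(flaky_rows())` sees only states with `snapshot_done=False` before the exception propagates, so the next sync can resume the snapshot from the last cursor rather than switching to incremental mode.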

This would be in line with how source-mongo & source-postgres deal with these failures. Furthermore, since SourceStateIterator will be used as the base class for emitting state counts for Postgres & Mongo, this behavior should be standardized.

akashkulk commented 9 months ago

This was a regression introduced in version 3.3.1.