airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

Bulk load CDK: make tests comply with protocol by waiting for state ack #48610

Closed edgao closed 1 day ago

edgao commented 2 days ago

Closes https://github.com/airbytehq/airbyte-internal-issues/issues/10413. This is heavily based on https://github.com/airbytehq/airbyte/pull/48583, with some refactors:

This PR also fixes a bug in the RecordDiffer: it wasn't sorting records correctly, so tests could fail nondeterministically.
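To illustrate why sort order matters here, below is a minimal sketch of a deterministic record comparison. It is not the actual RecordDiffer code: `sort_key`, `diff_records`, and the record shape (plain dicts with a primary key and an optional cursor field) are all hypothetical, and it assumes the values in a given key column share one comparable type.

```python
from typing import Any, Optional


def sort_key(record: dict, primary_key: list, cursor: Optional[str]) -> tuple:
    """Build a total ordering key from the primary key fields plus an optional cursor."""

    def norm(value: Any) -> tuple:
        # Sort missing values first instead of raising TypeError on None comparisons.
        return (value is not None, value if value is not None else 0)

    fields = list(primary_key) + ([cursor] if cursor else [])
    return tuple(norm(record.get(field)) for field in fields)


def diff_records(expected: list, actual: list, primary_key: list,
                 cursor: Optional[str] = None) -> list:
    """Return human-readable mismatches; an empty list means the record sets match."""

    def key(record: dict) -> tuple:
        return sort_key(record, primary_key, cursor)

    # Sort BOTH sides with the same key so the pairwise comparison does not
    # depend on the order in which the destination happened to return rows.
    exp = sorted(expected, key=key)
    act = sorted(actual, key=key)

    problems = []
    if len(exp) != len(act):
        problems.append(f"expected {len(exp)} records, got {len(act)}")
    for e, a in zip(exp, act):
        if e != a:
            problems.append(f"mismatch: expected {e}, got {a}")
    return problems
```

With a comparison like this, the same two record sets give the same result whatever order the rows arrive in, which is the property a flaky differ lacks.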

Legacy CDK connectors always flushed all pending data, even on a stream INCOMPLETE. The new CDK is stricter about protocol compliance: it makes no guarantees about pending work, and only guarantees that records preceding an acked state message are persisted.

So tests from the old CDK don't directly work on bulk CDK connectors. This PR updates these tests to force a state ack (by pushing a ton of records to force a flush). This makes the tests take longer to run, which we should improve at some point in the future (https://github.com/airbytehq/airbyte-internal-issues/issues/10911).
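The toy model below sketches the "push records until the state is acked" approach. It is not the bulk CDK or its test harness: `ToyDestination`, `flush_threshold`, `push_until_acked`, and the drop-on-INCOMPLETE behavior are hypothetical stand-ins for "pending work carries no guarantee".

```python
class ToyDestination:
    """Buffers input and persists it only when enough records have accumulated."""

    def __init__(self, flush_threshold: int):
        self.flush_threshold = flush_threshold
        self.pending = []        # ("record", payload) or ("state", state_id), in arrival order
        self.persisted = []      # records that have actually been written
        self.acked_states = []   # state ids acked after their preceding records were written

    def accept_record(self, record: dict) -> None:
        self.pending.append(("record", record))
        buffered = sum(1 for kind, _ in self.pending if kind == "record")
        if buffered >= self.flush_threshold:
            self._flush()

    def accept_state(self, state_id: str) -> None:
        # A state message alone does not trigger a flush in this model.
        self.pending.append(("state", state_id))

    def _flush(self) -> None:
        # Replaying in arrival order means a state is acked only after every
        # record that preceded it has been persisted.
        for kind, value in self.pending:
            if kind == "record":
                self.persisted.append(value)
            else:
                self.acked_states.append(value)
        self.pending.clear()

    def end_stream_incomplete(self) -> None:
        # Assumption for the sketch: unflushed work is simply dropped.
        self.pending.clear()


def push_until_acked(dest: ToyDestination, state_id: str, max_filler: int = 10_000) -> int:
    """Push filler records until `state_id` is acked; return how many were needed."""
    pushed = 0
    while state_id not in dest.acked_states:
        if pushed >= max_filler:
            raise AssertionError(f"state {state_id!r} was never acked")
        dest.accept_record({"id": f"filler-{pushed}"})
        pushed += 1
    return pushed
```

A test following this pattern would send its real records, send a state message, call `push_until_acked`, and only then end the stream as INCOMPLETE and assert on the real records. In the toy model the filler records are persisted too, but a test should only rely on the records that came before the acked state.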
