airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

Refresh/resync single table within a connection (per-stream syncs) #23067

Open JonoB opened 1 year ago

JonoB commented 1 year ago

Tell us about the problem you're trying to solve

Resync/refresh a single table within a stream

Describe the solution you’d like

Sometimes a table gets out of sync, or the source schema changes. It would be great to be able to resync/refresh just that single table, without affecting any other tables in the stream. The optimal solution would be a button next to each table in the "Replication" view that performs this action.

Describe the alternative you’ve considered or used

Currently, you have to refresh all tables within the stream, which is very impractical when you have a lot of tables and/or very large tables. I've also found a workaround for myself by logging into the database and manually updating the state in the states table. It works, but it is very cumbersome! I have also resorted to putting large tables in their own stream so that I can refresh just that table if needed. Again, very cumbersome!

Additional context

None

Are you willing to submit a PR?

Unfortunately I don't have enough experience with Java or Python

nataliekwong commented 1 year ago

For reference, this is offered as a feature in alternative vendors (link)

nataliekwong commented 10 months ago

Sharing a related issue here: https://github.com/airbytehq/airbyte/issues/27354

nataliekwong commented 9 months ago

Adding this issue as a related request: https://github.com/airbytehq/airbyte-internal-issues/issues/2300. Today, we could theoretically kick off a sync after a single-stream reset, but this would then sync all the tables enabled for the connection. We don't want that.

The ideal behavior after a reset of a single stream is that a sync kicks off immediately once the reset succeeds. That sync should cover JUST the table/stream that was reset, not the other tables.

The same should apply if a reset is triggered for a partial set of enabled tables. If a user approves non-breaking schema changes through the Replication tab, clicking "Save changes" with Reset checked triggers a sync of all tables in the connection. It should only trigger a sync for the tables affected by the schema changes that were detected and approved.
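
To make the requested behavior concrete, here is a minimal sketch in Python. It is illustrative only: `Connection`, `reset_streams`, and `follow_up_sync_streams` are hypothetical names, not Airbyte's data model or API. It assumes state is tracked per stream, so a reset clears state for the selected streams only and the follow-up sync is scoped to exactly those streams.

```python
from dataclasses import dataclass, field


@dataclass
class Connection:
    """Hypothetical stand-in for a connection; not Airbyte's actual data model."""
    enabled_streams: set[str]
    # Assumed per-stream sync state, e.g. an incremental cursor per table.
    state: dict[str, dict] = field(default_factory=dict)


def reset_streams(conn: Connection, streams: set[str]) -> set[str]:
    """Clear saved state for the requested streams only; return the streams reset."""
    targets = streams & conn.enabled_streams  # ignore streams that are not enabled
    for name in targets:
        conn.state.pop(name, None)
    return targets


def follow_up_sync_streams(conn: Connection, reset: set[str]) -> list[str]:
    """Streams the sync after a reset should cover: only the ones just reset."""
    return sorted(reset & conn.enabled_streams)


conn = Connection(
    enabled_streams={"users", "orders", "events"},
    state={"users": {"cursor": 10}, "orders": {"cursor": 5}},
)
reset = reset_streams(conn, {"orders"})
to_sync = follow_up_sync_streams(conn, reset)
```

In this sketch the state for `users` is left untouched and `to_sync` contains only `orders`, which is the scoping described above. The same functions cover a partial set of tables, such as those affected by an approved schema change.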

cc @davinchia @benmoriceau

caileyfitzgerald commented 2 weeks ago

Any update on this?

nataliekwong commented 1 week ago

Hi there, we do not have this currently on our short-term roadmap.

You can Refresh a single stream to pull all historical records just for that stream, but it will still initiate a normal sync for other enabled streams in the connection.