airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com
Other
15.46k stars 3.99k forks source link

🐛 Source shopify: v0.1.28 breaking config change #9994

Closed dsnam closed 2 years ago

dsnam commented 2 years ago
## Environment - **Airbyte version**: 0.35.15-alpha - **OS Version / Instance**: ubuntu on aws ec2 - **Deployment**: docker - **Source Connector and version**: config created on shopify 0.1.22, upgraded to 0.1.30 but id'd breaking change in 0.1.28 - **Severity**: High - **Step where error happened**: Sync ## Current Behavior Sync for a previously valid connection fails immediately with `2022-02-02 03:43:47 source > Config validation error: 'credentials' is a required property` I believe you can see the breaking change here: https://github.com/airbytehq/airbyte/pull/9591/files I downgraded to 0.1.27 and was able to sync again, so it does at least leave the now-incorrect config intact. ## Expected Behavior Either migrate existing config or warn me very loudly with some confirm update dialog when I will need to manually intervene. ## Steps to Reproduce 1. have a shopify connection with source version < 0.1.28 2. upgrade to 0.1.28 or later 3. try to sync, should fail with above error due to shape/field name change in how the auth info is stored ## Are you willing to submit a PR? no
dsnam commented 2 years ago

Wanted to also note that there was a breaking change to the name of the refund stream in 0.1.26 that made existing sync catalogs invalid, so we are going to have to downgrade to lower than that to keep our sync running until we can adjust it. Looks like that was acknowledged in the PR discussion but I can't tell if anything was done about it.

Link to that PR: https://github.com/airbytehq/airbyte/pull/8597

alafanechere commented 2 years ago

@misteryeo do we have a flow-on how to manage the config breaking changes? @bazarnov do you think a fix can mitigate this problem?

bazarnov commented 2 years ago

Wanted to also note that there was a breaking change to the name of the refund stream in 0.1.26 that made existing sync catalogs invalid, so we are going to have to downgrade to lower than that to keep our sync running until we can adjust it. Looks like that was acknowledged in the PR discussion but I can't tell if anything was done about it.

Link to that PR: airbytehq/airbyte#8597

This change was necessary due to base-normalisation issues on certain destinations. There is no opportunity to revert those, we should keep this as it is now.

bazarnov commented 2 years ago

@misteryeo do we have a flow-on how to manage the config breaking changes? @bazarnov do you think a fix can mitigate this problem?

The specification changes were implemented to be able to use the OAuth2.0 authentication method for Airbyte-Cloud, up to standard factory implementation for the server side. This changes are permanent and should not be changed in the nearest future. Yes, we can revert the changes, but it will also touch some other parts of the OAuth2.0 implementation on the server. I believe the best way here is to update the connector version, make the sync all over again (1 time action) and continue with the future updates without the issues. @dsnam Is it suitable for you?

dsnam commented 2 years ago

I'm making noise about this so that situations like this are handled better, not so that you revert the commits. IMO Airbyte should do one or several of the following:

  1. Migrate config automatically. I understand that this ranges from trivial to impossible and would be a huge ask in some cases, so not counting on it.
  2. Add some warning that displays when trying to change connector versions so that users are immediately notified in-app about config incompatibilities rather than only finding out when a sync runs and fails
  3. At a bare minimum, mention breaking changes in the changelog. I had to dig through the PRs for this connector for versions between ours and the newest to figure out what version would be safe to update to, was not a great experience.

As far as handling our situation, are you suggesting remaking the connection entirely? Unless there's some way to transfer state won't that result in a full initial sync? My plan was to try to update the source creds config and connection sync catalog using the API. We actually manage when syncs happen externally and trigger them via the API rather than using Airbyte's scheduling, so this would let us keep the same connection id.

alafanechere commented 2 years ago

As far as handling our situation, are you suggesting remaking the connection entirely? Unless there's some way to transfer state won't that result in a full initial sync? My plan was to try to update the source creds config and connection sync catalog using the API. We actually manage when syncs happen externally and trigger them via the API rather than using Airbyte's scheduling, so this would let us keep the same connection id.

You can indeed give a try to an update through the API. About the state transfer, this is feasible but requires manual edit of your new connection to copy the state of your legacy one. This is can be achieved with some SQL on Airbyte's database.

misteryeo commented 2 years ago

Thanks @dsnam, just jumping in here to apologize for the subpar experience and really appreciate your feedback and suggestions. We'll be digging into this so we can prevent this in the future.

dsnam commented 2 years ago

@alafanechere updating it over the API only partially worked. there is a field in the sync catalog config called aliasName that I was hoping might work to preserve the old raw destination table name for refunds (orders_refunds) , but it appears that this field is not used. For reference, here is the description in the api docs:

aliasName: string Alias name to the stream to be used in the destination

When I get the connection after running my update the alias is reset to order_refunds. This means fetching the refunds data succeeds now but it is put into a _airbyte_raw_order_refunds table instead of the _airbyte_raw_orders_refunds table that our transforms were written to expect.

Is it intended that that aliasName field does nothing? It would be nice to not also have to go update our dbt transforms when something like this happens, especially since this would force me to either cope with refunds data existing in two tables or reset the connection and pull quite a lot of data all over again just to get it all in the new one. We happen to be pulling a small enough subset of the shopify streams that the name collision issue that motivated the stream name change doesn't affect us at all anyway, so if there's a way to keep the same destination table name that would be nice.

misteryeo commented 2 years ago

FYI @dsnam, we're going to address the high level issue of this through https://github.com/airbytehq/airbyte-internal-issues/issues/398.

@bazarnov lets revert the changes and handle the other aspects of the OAuth 2.0 implementation on the server that will be impacted. The thinking here is that even if @dsnam is able to resolve his specific issue, this backwards incompatible change will likely reach other users at some point so we should handle it internally. Thank you!

cc: @sherifnada for viz

dsnam commented 2 years ago

@bazarnov @misteryeo That PR fixes the auth config shape issue but the refunds stream name change is the issue that is trickier to migrate around due to downstream things like custom dbt transforms depending on the table names that are produced from the stream name. I know that one's not desirable to revert so is there any chance the aliasName field I mentioned above could be made to work like the api docs describe it? That would still result in old connections being broken initially but you could point users at the option of either starting over or modifying the config to alias to the old table name. We'd be completely fine with the fact that changing that is only accessible over the API if that'd make it any easier.