airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com
Other
16.07k stars 4.11k forks source link

Source MySQL - upgrade from 0.6.12 to 1.0.1 creates "Numeric value 'true' is not recognized in the SCD normalization process" errors #17510

Closed marcosmarxm closed 2 years ago

marcosmarxm commented 2 years ago

This Github issue is synchronized with Zendesk:

Ticket ID: #2519 Priority: high Group: Community Assistance Engineer Organization: Airbyte Assignee: Sajarin

Original ticket description:

I’ve just upgraded the mysql source from 0.6.12 to 1.0.1 (GA i believe) and running into production syncing issues now

Numeric value 'true' is not recognized in the SCD normalization process. Created issue here: https://discuss.airbyte.io/t/mysql-1-0-1-ga-source-normalization-failure/2773

However, the most pressing issue is that we can’t revert back to 0.6.12 without still experiencing the same issue across all connectors
[via Slack] Open original slack thread in #public-airbyte-help channel: https://airbytehq-team.slack.com/archives/C021JANJ6TY/p1664768105368939

sajarin commented 2 years ago

@grishick can we move this up the backlog queue?

marcosmarxm commented 2 years ago

Comment made from Zendesk by Marcos Marx on 2022-10-03 at 16:51:

[via Slack] Hey @user_id, I created an issue on our Github with this ticket: https://github.com/airbytehq/airbyte/issues/17510

The MySQL connector is not in GA, it’s just about to be in beta.

marcosmarxm commented 2 years ago

Comment made from Zendesk by Marcos Marx on 2022-10-03 at 20:07:

[via Slack] i think this one is actually an issue with the destination - snowflake

grishick commented 2 years ago

I suspect that the underlying problem is that MySQL stores boolean as TINYINT(1) and somewhere between 0.6.1 and 1.0.1 we changed how MySQL source returns booleans. The simplest way to fix this is to run a reset. cc @edgao could we handle this gracefully in destinations without a reset? I.e., can we run alter table on scd and destination tables?

edgao commented 2 years ago

is the theory that:

(that sounds reasonable to me, maybe worth tagging someone from db-sources to confirm)

quick research:

If alter table works, then this is easy: user should refresh schema and choose not to reset; their next sync will automatically run the ALTER TABLE and everything will be fine.

If alter table does not works, then it's a little more complicated. Assuming they still have their _raw tables:

  1. User needs to refresh their schema + choose not to reset
  2. User should manually delete the SCD and final tables
  3. Next sync will still be incremental (since we didn't reset the connection)
  4. Next sync's normalization run will effectively do a soft reset (https://github.com/airbytehq/airbyte-internal-issues/issues/945) and recreate all the normalized tables
grishick commented 2 years ago

I also just double checked that current version of MySQL Source treats TINYINT(1) columns as boolean

marcosmarxm commented 2 years ago

Comment made from Zendesk by Marcos Marx on 2022-10-05 at 19:48:

[via Slack] Hi! do you have any update regarding this issue, i am also getting Cannot drop column '_AIRBYTE_UNIQUE_KEY

edgao commented 2 years ago

for continuity of discussion: asked user here (https://github.com/airbytehq/airbyte/pull/14424#issuecomment-1266351233) if they upgraded platform version at some point, to try and rule out a normalization bug

marcosmarxm commented 2 years ago

Zendesk ticket #2744 has been linked to this issue.

marcosmarxm commented 2 years ago

Comment made from Zendesk by Nataly Merezhuk on 2022-10-12 at 13:51:

First a few questions:

1. What is the field type for the value causing the error, is it Boolean or TINYINT?

2. What version of the MySQL connector were you using before this that did not cause errors?

And some information:

1. There is a GitHub issue for this here, it is not resolved yet:
https://github.com/airbytehq/airbyte/issues/17510

2. We changed how the MySQL source connector returns booleans some time between 0.6.1 and 1.0.1.

3. A potential workaround would be to roll back the version to 0.6.15.

4. Another two workarounds are described here:
https://github.com/airbytehq/airbyte/issues/17510#issuecomment-1266107416

This is actively being worked on :) let me know if any of those 3 workarounds pan out for you!
marcosmarxm commented 2 years ago

Comment made from Zendesk by Marcos Marx on 2022-10-13 at 01:41:

  1. The field in MySQL source is tinyint(1) unsigned and in the staging view it’s
cast(MY_FIELD as 
    bigint
) as MY_FIELD
  1. i think the MySQL connector version was 0.6.12
[Discourse post]
danieldiamond commented 2 years ago

I edited the above comment: and in the RAW AIRBYTE table the JSON is "MY_FIELD": false

marcosmarxm commented 2 years ago

Comment made from Zendesk by Nataly Merezhuk on 2022-10-13 at 14:25:

Got it, thanks!

Could you try rolling back to a pre 1.x.x version, using one of the work arounds listed in the link below, or doing a total reset? The reset obviously would be a last resort option.

https://github.com/airbytehq/airbyte/issues/17510#issuecomment-1266107416
marcosmarxm commented 2 years ago

Comment made from Zendesk by Marcos Marx on 2022-10-13 at 22:39:

danieldiamond:

This is a fresh new deployed airbyte instance, this is the latest source and destination versions and the first sync for this table.

Is the suggestion to migrate to pre 1.X OR reset the connector? Because this is a brand new connector, never been synced. Resetting doesn’t resolve this issue

[Discourse post]

marcosmarxm commented 2 years ago

Comment made from Zendesk by Nataly Merezhuk on 2022-10-14 at 13:39:

Oh, I see I wasn't aware that there was already a discussion happening on GitHub! I see that one of our engineers is looking for more info for you here:
https://github.com/airbytehq/airbyte/pull/14424

Sorry for the confusion - in this case please try rolling back to 0.6.15 or below, that should be the temporary workaround. Let me know if that helps! Otherwise I'd say to keep an eye on that GitHub thread!
subodh1810 commented 2 years ago

~see https://github.com/airbytehq/airbyte/pull/14424#issuecomment-1293917212~ Raised PR https://github.com/airbytehq/airbyte/pull/18619, see PR description for more info

marcosmarxm commented 2 years ago

Comment made from Zendesk by Marcos Marx on 2022-10-30 at 21:48:

marcosmarxm commented 2 years ago

Comment made from Zendesk by Nataly Merezhuk on 2022-11-03 at 12:34:

Hey Daniel, it looks like the issue has been solved! Let me know if you need any further info!
marcosmarxm commented 2 years ago

Comment made from Zendesk by Marcos Marx on 2022-11-03 at 13:02:

Yes @natalyjazzviolin, subodh crushed it. He was extremely responsive, investigated the issue thoroughly and provided a timely and solid fix. Big fan, airbyte team lucky to have him.

[Discourse post]
marcosmarxm commented 2 years ago

Comment made from Zendesk by Nataly Merezhuk on 2022-11-03 at 13:33:

Wonderful to hear!