airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com
Other
16.19k stars 4.14k forks source link

Source MySQL: java.sql.SQLException: MONTH #18609

Open DRTcom opened 2 years ago

DRTcom commented 2 years ago

Environment

Current Behavior

Airbyte throws an error during a sync and terminates the sync when it encounters a field with invalid dates with an invalid month. Example invalid date is '2022-00-03'.

Note this error appears to be related to issue #16574 and #16159

Expected Behavior

Expecting to have invalid dates be nulled out while properly syncing valid dates.

Logs

a4aa250e_dd2f_4970_b578_82ae3935d63a_logs_3639_txt.txt

Steps to Reproduce

  1. Initial sync
  2. Airbyte encounters bad data
  3. Airbyte terminates sync with message: "Failure Origin: source, Message: Something went wrong in the connector. See the logs for more details." and no records are synced.

Also tried adding these params to JDBC string but still received the same error:

zeroDateTimeBehavior=CONVERT_TO_NULL zeroDateTimeBehavior=ROUND zeroDateTimeBehavior=convertToNull noDatetimeStringSync=true

Are you willing to submit a PR?

Yes but I would need some guidance

sh4sh commented 2 years ago

Thanks for reporting this, I've added it to the team's backlog. I did find an open issue around parsing YEAR, it would be great if you could add some thoughts to the ticket: https://github.com/airbytehq/airbyte/issues/8722

I don't think the fix for YEAR will work here because I'm not seeing any equivalent yearIsDateType=true mysql config, but they may still be related.

crigueros-stratio commented 1 year ago

alguien a encontrado alguna solucion a esto?

bleonard commented 1 year ago

Add some test(s) with bad month. or use existing ones. These values should be set to NULL. Remember to do for both CDC and non-CDC.

bleonard commented 1 year ago

Let's also make a test and handle the 32nd of january (bad day number).... or a non existent leap year (feb 29, 2001)

ayushsinghal90 commented 1 year ago

@DRTcom I was not able to replicate it via test cases,

Made changes here code ref and here code ref with CREATE TABLE id_and_name (id INTEGER, name VARCHAR(200), date date); INSERT INTO id_and_name (id, name, date) VALUES (1, 'picard', '2022-01-01'), (2, 'crusher', '2022-01-01'), (3, 'vash', '2022-00-01');

Test failed at insertion with ERROR: date/time field value out of range: "2022-00-01" Hint: Perhaps you need a different "datestyle" setting.

I tried adding faulty entry INSERT INTO id_and_name (id, name, date) VALUES (4, 'picard', '2022-00-01'); via psql command in docker there as well it failed with the same error.

can you help me with how to replicate (new here trying to contribute 🙂)

bleonard commented 1 year ago

Hi @ayushsinghal90, did you try with something non-zero? 2023-02-31 for example.

rodireich commented 1 year ago

This is likely a MariaDB related issue. While mysql (as well as mariadb) allow a date of all components zero 0000-00-00 which signifies "wrong date" - that value is actually converted to null no problem ←←← this is due to configuring zeroDateTimeBehavior=convertToNull in #6093 .

Only MariaDB allow some date components be zero e.g 2023-00-12, Which is where the problem is.

rodireich commented 1 year ago

We're using mysql's connector/j lib to connect with mariadb servers. This may be some edge case that requires native mariadb connector. I see that mariadb connector has its own implementation of zeroDateTimeBehavior. I will try to check with native mariadb java connector

rodireich commented 1 year ago

@prateekmukhedkar this is most likely an issue related to mariadb - not a mysql data accuracy per se. I moved to services lane

erica-airbyte commented 1 year ago

A cloud customer is hitting this issues. (case)

prateekmukhedkar commented 1 year ago

@rodireich should I move this back to Data Accuracy pipeline?

cbuckle1 commented 1 year ago

Any update on this? We've run into this issue against our mySQL sources.

michaelsonnle commented 1 year ago

We have the same issue now. Connector MySQL v3.1.6.

This appeared after a reset of the source table.

vitorfraga commented 7 months ago

Hello! Any update here? I have this problem too.