airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

[destination-mysql] - Table Creation Control and Data Type Mapping from Redshift to MySQL #38031

Open karthikreddi opened 5 months ago

karthikreddi commented 5 months ago

Connector Name

destination-mysql

Connector Version

1.0.0

What step the error happened?

During the sync

Relevant information

Issues:

  1. Can Airbyte allow users to create destination tables in MySQL and sync data, rather than having the Airbyte MySQL connector manage table structure creation or updates?

  2. When transferring data from Redshift to MySQL via Airbyte, how can we ensure that a datetime field in Redshift is mapped correctly as a datetime field in MySQL? Additionally, how are other data types in Redshift mapped to relevant data types in MySQL during this data transfer process?

Can anyone help with resolving these issues or point me to any existing functionality that addresses them? Below are the table structures.

Relevant Table Structures

Redshift Table Structure:

CREATE TABLE reports.sales_order_report (
    salesteammemberid character varying(36) ENCODE lzo,
    sellerworkspaceid character varying(36) ENCODE lzo,
    state character varying(255) ENCODE lzo,
    orderdate timestamp without time zone ENCODE az64,
    nooforders integer ENCODE az64,
    totalvalue numeric(18, 0) ENCODE az64,
    updated_at timestamp without time zone ENCODE az64,
    created_at timestamp without time zone ENCODE az64
) DISTSTYLE AUTO;

Airbyte Destination MySQL Table Structure

CREATE TABLE sales_order_report (
    state text,
    orderdate text,
    created_at text,
    nooforders bigint DEFAULT NULL,
    totalvalue decimal(38,9) DEFAULT NULL,
    updated_at text,
    salesteammemberid text,
    sellerworkspaceid text,
    _airbyte_raw_id varchar(36) NOT NULL,
    _airbyte_extracted_at timestamp(6) NOT NULL,
    _airbyte_meta json NOT NULL,
    KEY dedup_idx (sellerworkspaceid(50), _airbyte_extracted_at, salesteammemberid(50), state(50), orderdate(50), updated_at(50)),
    KEY extracted_at_idx (_airbyte_extracted_at),
    KEY raw_id_idx (_airbyte_raw_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

Required Destination MySQL Structure: I require the destination table to be in the following format:

CREATE TABLE sales_order_report (
    state varchar(255) DEFAULT NULL,
    orderdate text,
    created_at text,
    nooforders bigint DEFAULT NULL,
    totalvalue float(18,0) DEFAULT NULL,
    updated_at text,
    salesteammemberid varchar(36) DEFAULT NULL,
    sellerworkspaceid varchar(36) DEFAULT NULL,
    _airbyte_raw_id varchar(36) NOT NULL,
    _airbyte_extracted_at timestamp(6) NOT NULL,
    _airbyte_meta json NOT NULL,
    KEY dedup_idx (sellerworkspaceid(36), _airbyte_extracted_at, salesteammemberid(36), state(255), orderdate, updated_at),
    KEY extracted_at_idx (_airbyte_extracted_at),
    KEY raw_id_idx (_airbyte_raw_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

Contribute

marcosmarxm commented 5 months ago
  1. Can Airbyte allow users to create destination tables in MySQL and sync data, rather than having the Airbyte MySQL connector manage table structure creation or updates?

No, all tables are handled by the connector.

  2. When transferring data from Redshift to MySQL via Airbyte, how can we ensure that a datetime field in Redshift is mapped correctly as a datetime field in MySQL? Additionally, how are other data types in Redshift mapped to relevant data types in MySQL during this data transfer process?

The data types are determined by the source connector during the discover process. You don't have the option to change them today.
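For context, the discover step describes each stream using JSON Schema, and a Redshift timestamp without time zone column is typically advertised as a string with a date-time annotation. The fragment below is illustrative rather than an exact dump from the connector:

```json
{
  "orderdate": {
    "type": "string",
    "format": "date-time",
    "airbyte_type": "timestamp_without_timezone"
  }
}
```

The destination connector then picks a physical column type for that schema; here destination-mysql stores such string fields as TEXT, which is consistent with the orderdate, created_at, and updated_at columns shown in the generated table above.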

karthikreddi commented 5 months ago

Hello @marcosmarxm,

Understood. We need the destination MySQL table to use varchar or string columns instead of the text datatype. Is there a way to accomplish this?

Whenever we manually update the table structure, Airbyte automatically performs a soft refresh during subsequent runs. Is there a workaround available for this situation at the moment?
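One workaround sometimes used in situations like this (a sketch, not an Airbyte feature, and it assumes the text values Airbyte writes parse as valid MySQL datetimes) is to leave the Airbyte-managed table untouched and expose typed columns through a separate view, so Airbyte never detects a schema change on the table it owns:

```sql
-- Hypothetical view over the Airbyte-managed table; column names match
-- the sales_order_report structure from this issue. The view is a
-- separate object, so syncs and refreshes on the base table are unaffected.
CREATE OR REPLACE VIEW sales_order_report_typed AS
SELECT
    CAST(state AS CHAR(255))             AS state,
    CAST(orderdate AS DATETIME)          AS orderdate,
    CAST(created_at AS DATETIME)         AS created_at,
    nooforders,
    totalvalue,
    CAST(updated_at AS DATETIME)         AS updated_at,
    CAST(salesteammemberid AS CHAR(36))  AS salesteammemberid,
    CAST(sellerworkspaceid AS CHAR(36))  AS sellerworkspaceid
FROM sales_order_report;
```

Downstream consumers would then query sales_order_report_typed instead of the raw table. Rows whose text values do not parse as datetimes may yield NULL or errors depending on the server's SQL mode, so this is only a stopgap, not a substitute for proper type mapping in the connector.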

mlavoie-sm360 commented 3 months ago

The data type is done by the source in the discover process. You don't have the option to change them today.

So would that mean that the discover process for the redshift source is broken, if it determines that a timestamp without time zone should be converted to text/varchar at the destination?