airbytehq / airbyte


[source-mysql] to databricks meet OutOfMemoryError frequently #46932

Open · amelia-ay opened this issue 1 month ago

amelia-ay commented 1 month ago

Connector Name

source-mysql

Connector Version

3.7.3

What step the error happened?

During the sync

Relevant information

related to https://github.com/airbytehq/airbyte/issues/45871

We changed MAX_CHUNK_SIZE of source-mysql (similar to psdb.cloud), and that resolved it.
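For readers coming from the linked issue, here is a hedged sketch of what capping the chunk size amounts to in practice: reading the table in keyset-paginated chunks whose row count is bounded, so no single query result has to be buffered in the connector's heap at once. The class, table, connection details, and the MAX_CHUNK_SIZE value below are illustrative assumptions, not the actual source-mysql connector code.

```java
// Hedged illustration of a bounded-chunk read from MySQL; not Airbyte code.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class ChunkedMySqlRead {

    // Assumed cap on rows fetched per chunk; the real MAX_CHUNK_SIZE lives in
    // the connector source and its value/location may differ.
    private static final int MAX_CHUNK_SIZE = 1_000_000;

    public static void main(String[] args) throws SQLException {
        // useCursorFetch=true lets MySQL Connector/J honor setFetchSize and
        // stream rows instead of materializing the full result set in memory.
        String url = "jdbc:mysql://localhost:3306/mydb?useCursorFetch=true"; // hypothetical
        try (Connection conn = DriverManager.getConnection(url, "user", "password")) {
            long lastSeenId = 0L; // cursor for keyset pagination
            while (true) {
                // Bound every query to at most MAX_CHUNK_SIZE rows so a single
                // chunk can never exhaust the connector pod's heap.
                String sql = "SELECT id, payload FROM my_table WHERE id > ? ORDER BY id LIMIT ?";
                int rowsInChunk = 0;
                try (PreparedStatement stmt = conn.prepareStatement(sql)) {
                    stmt.setLong(1, lastSeenId);
                    stmt.setInt(2, MAX_CHUNK_SIZE);
                    stmt.setFetchSize(10_000); // fetch incrementally within the chunk
                    try (ResultSet rs = stmt.executeQuery()) {
                        while (rs.next()) {
                            lastSeenId = rs.getLong("id");
                            rowsInChunk++;
                            // emit the record downstream here
                        }
                    }
                }
                if (rowsInChunk < MAX_CHUNK_SIZE) {
                    break; // last (partial) chunk reached
                }
            }
        }
    }
}
```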

Relevant log output

No response


marcosmarxm commented 3 weeks ago

Hello @amelia-ay

amelia-ay commented 3 weeks ago


Hi @marcosmarxm,
So far, we have added a limit on the max chunk size on the MySQL source side and added connection closing for Databricks. More recently, we have increased how often the staging table is merged into the final table, to avoid a possible blockage from one final big merge (see the sketch below). How to release the cache of the Databricks pod in a timely manner remains a headache.
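As a rough illustration of that third mitigation (merging staging into the final table more often, and closing the connection between merges), here is a hedged Java/JDBC sketch. The table names, merge interval, MERGE statement, and JDBC URL are assumptions for illustration only; the actual destination-databricks connector generates its own SQL and manages its own connections.

```java
// Hedged sketch of periodic staging-to-final merges; not destination-databricks code.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public class PeriodicStagingMerge {

    // Assumed policy: merge staging into the final table every N staged batches
    // instead of once at the end, so no single merge covers the whole sync.
    private static final int BATCHES_PER_MERGE = 10;

    // Delta Lake MERGE; assumes "id" is the primary key of both tables.
    private static final String MERGE_SQL =
        "MERGE INTO final_table AS t "
        + "USING staging_table AS s "
        + "ON t.id = s.id "
        + "WHEN MATCHED THEN UPDATE SET * "
        + "WHEN NOT MATCHED THEN INSERT *";

    public static void main(String[] args) throws SQLException {
        // Hypothetical Databricks JDBC URL; credentials omitted.
        String url = "jdbc:databricks://<workspace-host>:443;httpPath=<http-path>";
        int stagedBatches = 0;

        for (int batch = 0; batch < 100; batch++) { // stand-in for the sync's batch loop
            // ... write one batch of records into staging_table here ...
            stagedBatches++;

            if (stagedBatches >= BATCHES_PER_MERGE) {
                // Open a short-lived connection, merge, truncate staging, and close
                // everything immediately so resources are not held between merges.
                try (Connection conn = DriverManager.getConnection(url);
                     Statement stmt = conn.createStatement()) {
                    stmt.execute(MERGE_SQL);
                    stmt.execute("TRUNCATE TABLE staging_table");
                }
                stagedBatches = 0;
            }
        }
    }
}
```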

marcosmarxm commented 3 weeks ago

What value of chunk size are you using today?

theyueli commented 1 week ago

This is actually a destination issue. Tagging @edgao

edgao commented 1 week ago

cc @davinchia , unassigning myself

amelia-ay commented 1 week ago

What value of chunk size are you using today?

QUERY_TARGET_SIZE_GB = 3_145_728; MAX_CHUNK_SIZE = 1_000_000;
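For reference, those two values written out as Java constants, together with the capping behavior a "max chunk size" implies. How QUERY_TARGET_SIZE_GB is consumed inside the connector (and its effective unit) is not stated in this thread, so it is left uninterpreted here; treat the snippet purely as an illustration.

```java
// Hedged illustration of the reported tuning values; not Airbyte code.
public final class ReportedLimits {

    static final long QUERY_TARGET_SIZE_GB = 3_145_728L; // value as reported above
    static final long MAX_CHUNK_SIZE = 1_000_000L;       // value as reported above

    /** Whatever row count a query plans to read, it is clamped to the hard cap. */
    static long clampToChunkCap(long plannedRows) {
        return Math.min(plannedRows, MAX_CHUNK_SIZE);
    }

    public static void main(String[] args) {
        System.out.println(clampToChunkCap(5_000_000L)); // prints 1000000
    }

    private ReportedLimits() {}
}
```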