airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

Source Snowflake - JDBC timeout #21768

Open vsayer opened 1 year ago

vsayer commented 1 year ago

Environment

Current Behavior

The initial sync does not complete. It runs for ~7h30m and then fails with a timeout error.

at io.airbyte.integrations.source.snowflake.SnowflakeSource.main(SnowflakeSource.java:47)
Caused by: net.snowflake.client.jdbc.SnowflakeSQLLoggedException: JDBC driver internal error: Timeout waiting for the download of #chunk164(Total chunks: 1147) retry=1.

This appears to happen because the initial sync volume is very large.
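For anyone triaging a similar log, here is a minimal sketch that pulls the chunk index, total chunk count, and retry counter out of the error message, to show how far the result download got before timing out. The regular expression is an assumption inferred from the single message quoted above (other driver versions may word it differently), and `parse_chunk_timeout` is a hypothetical helper, not part of Airbyte or the Snowflake driver.

```python
import re

# Pattern inferred from the one error message quoted in this issue (assumption).
CHUNK_TIMEOUT = re.compile(
    r"Timeout waiting for the download of #chunk(?P<chunk>\d+)"
    r"\(Total chunks: (?P<total>\d+)\) retry=(?P<retry>\d+)"
)


def parse_chunk_timeout(line):
    """Return (chunk, total, retry) as ints, or None if the line does not match."""
    match = CHUNK_TIMEOUT.search(line)
    if match is None:
        return None
    return int(match["chunk"]), int(match["total"]), int(match["retry"])


line = (
    "Caused by: net.snowflake.client.jdbc.SnowflakeSQLLoggedException: "
    "JDBC driver internal error: Timeout waiting for the download of "
    "#chunk164(Total chunks: 1147) retry=1."
)
chunk, total, retry = parse_chunk_timeout(line)
print(f"failed at chunk {chunk} of {total} ({chunk / total:.1%}), retry={retry}")
```

On the message above this reports chunk 164 of 1147, i.e. roughly 14% of the result chunks, assuming the chunk number tracks download progress.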

To rule out the local EC2 instance it was running on, the same sync was tried on Airbyte Cloud. The same issue occurred, which appears to confirm that it is an issue with the Snowflake source.

Expected Behavior

The initial sync should complete.

Logs

The full log file was already shared on the Airbyte Slack channel airbyte-help. It's 102MB, so I do not want to share it here due to its size.

Steps to Reproduce

  1. Set up a Snowflake source.
  2. Set up an S3 destination.
  3. Set up a sync and ensure the table size is very large.
  4. Run the initial sync; it should fail with the timeout error shown above.

Are you willing to submit a PR?

Sure! If I can be pointed in the right direction!

andreu7d commented 1 year ago

We're facing the same issue, connecting from Snowflake to Postgres. If anyone is willing to point us in the right direction, we might be able to take a look at it.

frankiebromage1 commented 1 week ago

@andreu7d @vsayer how did you get round this issue in the end? I'm having the same issue with Snowflake to Milvus.