airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com
Other
15.39k stars 3.97k forks source link

[source-postgres] connection to Snowflake sync hangs #41942

Closed hongbo-miao closed 1 month ago

hongbo-miao commented 1 month ago

Version

What step the error happened?

During the sync

Relevant information

My other Postgres -> Snowflake works well.

I am trying to set up a "Postgres -> Snowflake" connection, here are some differences with other "Postgres -> Snowflake" connection:

When I sync, the destination Snowflake connector always hangs there (only one time succeed, please read next section).

Relevant log output

Case 1

I have one schema, some of tables have very a little data, the first time sync succeed, the rest of sync all hangs for many hours, have to cancel manually.

image

Some things I observed, seems this pull request helped a little bit, because I can see this in the "Log 1.1":

Skipping writing to storage since there are no bytes to write

All other sync that hung does not include this sentence.

Case 2

Another Postgres database, having same schema to case 1, only 1 table has two rows of data, it never succeed syncing even once. Always hangs hours until I manually cancelled.

I can see the data immediately got copied to in Snowflake airbyte_internal tables, but never moves to final Snowflake schema tables.

image

I tried

and then re-sync, but still hangs.

None of Case 2 logs include

Skipping writing to storage since there are no bytes to write

hongbo-miao commented 1 month ago

More observations, for Case 1, I found if hangs there, and suddenly some data got written during processing. It helps finish processing immediately.

So now our workaround solution is we add _airbyte_heartbeat in Postgres publication as well and sync this table to final Snowflake.

create table if not exists public._airbyte_heartbeat (
     id serial primary key,
     timestamp timestamptz not null default now()
);
grant insert, update on table public._airbyte_heartbeat to airbyte_user;
create publication airbyte_public_publication for table
    public._airbyte_heartbeat,
    public.aaa,
    public.bbb;

(Well, another issue shows for this _airbyte_heartbeat table, I posted at https://github.com/airbytehq/airbyte/discussions/41966)

However, I still think this is workaround solution, as we have some tables in a totally different Postgres

But it works well all time:

image

evantahler commented 1 month ago

Can you please try the latest postgres version, which is 3.6.10

hongbo-miao commented 1 month ago

Thanks @evantahler as we start to sync _airbyte_heartbeat table as well, which helps solve this issue as workaround way. And actually I think it is quite good, as we can always know the heartbeat is working properly without checking source Postgres. And this Postgres starts to have regular data writing in, so hard to reproduce again. I will close for now. If having same issue again, will follow up in future. And we upgraded to 3.6.10 as well. Thank you so much! 😃