dimitri / pgcopydb

Copy a Postgres database to a target Postgres server (pg_dump | pg_restore on steroids)

Fix deadlock during pipeline sync #880

Closed arajkumar closed 2 months ago

arajkumar commented 2 months ago

Before the fix, the pgsql_sync_pipeline implementation called PQpipelineSync followed by a single PQconsumeInput, then read the results using PQgetResult until it received PGRES_PIPELINE_SYNC. However, PQgetResult blocks when the necessary data has not yet been consumed, per [1]:

Note that PQgetResult will block only if a command is active and the necessary response data has not yet been read by PQconsumeInput.

The default read buffer for a libpq connection is 16K [2]. When a result exceeds 16K, PQgetResult blocks forever unless we consume more input using PQconsumeInput.
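For illustration, here is a minimal sketch of that pre-fix pattern (the function name is hypothetical; this is not the actual pgcopydb code). The single PQconsumeInput leaves any result larger than the buffered data stuck inside PQgetResult:

```c
#include <stdbool.h>
#include <libpq-fe.h>

/* Hypothetical sketch of the pre-fix flow: one consume, then blocking reads. */
static bool
sync_pipeline_before_fix(PGconn *conn)
{
	if (PQpipelineSync(conn) == 0)
	{
		return false;
	}

	/* Only one consume: if a result exceeds what this call buffered,
	 * PQgetResult below waits for data that is never consumed. */
	if (PQconsumeInput(conn) == 0)
	{
		return false;
	}

	for (;;)
	{
		PGresult *res = PQgetResult(conn);	/* can block forever here */

		if (res == NULL)
		{
			continue;	/* NULL separates results of consecutive queries */
		}

		ExecStatusType status = PQresultStatus(res);

		PQclear(res);

		if (status == PGRES_PIPELINE_SYNC)
		{
			return true;
		}
	}
}
```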

This commit attempts to fix the problem by reintroducing the wait for socket read readiness, while also using PQisBusy to decide whether to consume input via PQconsumeInput.

The new implementation loops until it receives PGRES_PIPELINE_SYNC: it waits for the connection's socket to become read-ready and consumes input whenever PQisBusy(conn) == 1; when the command is not busy, i.e. PQisBusy(conn) == 0, it reads the results, as in the sketch below. The changes are inspired by the libpq pipeline integration test [3].
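A rough sketch of that loop (hypothetical function name, modeled on the libpq pipeline test referenced in [3] rather than the exact patch): wait for the socket to become readable, consume input, then drain results with PQgetResult only while the command is not busy:

```c
#include <stdbool.h>
#include <sys/select.h>
#include <libpq-fe.h>

/* Hypothetical sketch of the fixed flow: select() on the socket,
 * PQconsumeInput, then non-blocking result draining via PQisBusy. */
static bool
sync_pipeline_fixed(PGconn *conn)
{
	if (PQpipelineSync(conn) == 0)
	{
		return false;
	}

	int sock = PQsocket(conn);

	if (sock < 0)
	{
		return false;
	}

	for (;;)
	{
		fd_set readfds;

		FD_ZERO(&readfds);
		FD_SET(sock, &readfds);

		/* Wait until the server has sent more data. */
		if (select(sock + 1, &readfds, NULL, NULL, NULL) < 0)
		{
			return false;
		}

		if (PQconsumeInput(conn) == 0)
		{
			return false;
		}

		/* PQgetResult does not block while PQisBusy() returns 0. */
		while (PQisBusy(conn) == 0)
		{
			PGresult *res = PQgetResult(conn);

			if (res == NULL)
			{
				continue;	/* NULL separates consecutive query results */
			}

			ExecStatusType status = PQresultStatus(res);

			PQclear(res);

			if (status == PGRES_PIPELINE_SYNC)
			{
				return true;	/* pipeline sync point reached */
			}
		}
	}
}
```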

[1] https://www.postgresql.org/docs/current/libpq-async.html#LIBPQ-PQGETRESULT
[2] https://github.com/postgres/postgres/blob/a68159ff2b32f290b1136e2940470d50b8491301/src/interfaces/libpq/fe-connect.c#L4616
[3] https://github.com/postgres/postgres/blob/a68159ff2b32f290b1136e2940470d50b8491301/src/test/modules/libpq_pipeline/libpq_pipeline.c#L1967-L2023

Signed-off-by: Arunprasad Rajkumar ar.arunprasad@gmail.com

dimitri commented 2 months ago

Thanks @arajkumar !