Tune PG connection pool to serve initial syncs

electric-sql / electric

Sync little subsets of your Postgres data into local apps and services.

https://electric-sql.com

Apache License 2.0

6.12k stars 143 forks source link

Tune PG connection pool to serve initial syncs #1499

Open balegas opened 1 month ago

balegas commented 1 month ago

This picture shows the max number of concurrent initial syncs per shape size. The takeaway is that when shape is large enough, we can't handle the next request because there are no more connections available in the Pg connection pool, because they are all busy.

We're going to tune the initial pool size to be better fit with the number of requests the server can handle.

balegas commented 1 month ago

@icehaunter can you leave here the same picture as above with the new default pool size configuration? Let's measure the improvement to decide on further action

icehaunter commented 1 month ago

This is (incomplete) run after #1503

balegas commented 1 month ago

It still falls over relatively soon, is it now falling over because of the snapshot timeout?

icehaunter commented 1 month ago

No, actually, it seems like it's another pool-related config option. The actual error is

Postgrex.Protocol (#PID<0.2568.0>) disconnected: ** (DBConnection.ConnectionError) client #PID<0.2664.0> timed out because it queued and checked out the connection for longer than 15000ms

I'm looking into how to tweak this

alco commented 1 month ago

@icehaunter That's not pool-related, it's a query running for longer than the default query timeout.

magnetised commented 1 month ago

@icehaunter I encountered something similar. I think the following code fixed it:

def repeatable_read_transaction(fun) do
    transaction(
      fn ->
        query!("SET TRANSACTION ISOLATION LEVEL REPEATABLE READ READ ONLY", [])
        query!("SET LOCAL statement_timeout = 0")
        fun.()
      end,
      timeout: :infinity,
      ownership_timeout: :infinity
    )
  end

so adding ownership_timeout: :infinity to the transaction opts and then, just in case, SET LOCAL statement_timeout = 0 within the transaction.

TBH I found the ref to ownership_timeout in some random support thread but failed to trace its use anywhere in the ecto, postgrex or db_connection sources, so it may be that I hit a heisenbug and then attributed the solution to the above change. Worth a try though. The 15_000 ms timeout is pool related I think, as pg defaults to an infinite statement timeout A value of zero (the default) disables the timeout.

alco commented 1 month ago

@magnetised

ownership_timeout is specific to the DBConnection.Ownership pool which one can use in place of the default DBConnection.ConnectionPool.

timeout is an option you pass to Postgrex.query(). It is set to 15000 by default. The reason the error message in the log originates in DBConnection is because Postgrex forwards the timeout option to DBConnection which then starts a timer that determines how long a given query can be holding a connection for.