eclipse-vertx / vertx-sql-client

High performance reactive SQL Client written in Java
Apache License 2.0
892 stars 201 forks source link

postgres pgpool holds on to the previously active primary server connection #981

Open zipakgow opened 3 years ago

zipakgow commented 3 years ago

Questions

we have F5 VIP for our postgres db, with gtm_monitor set which in realtime points to either the primary or secondary or DR server based on whichever is active. using the PgPool seems to hold on to the connections to the previously active server and does not discard it. postgres seems to be specifically returning an error code: 25006 exception message returned to app server (when connections from previous active servers are used): cannot execute INSERT in a read-only transaction error code: 25006

can there be a method to discard connections from the pool when such error occurs? or this be handled internally?

Version

Which version(s) did you encounter this bug ? vertx-pg-client: 3.9.7

Do you have a reproducer?

No

Steps to reproduce

Note: application connects to F5 VIP created for postgres

  1. primary server is ACTIVE, secondary server is on STANDBY
  2. APP starts sending requests to db with connection pool size atleast 1
  3. EFM is used on postgres to do the switch over for DB maintainence. This sets secondary server as ACTIVE and primary server to STANDBY.
  4. App continues to send requests, and all the requests fail with error code: 25006 and message: cannot execute INSERT in a read-only transaction
vietj commented 3 years ago

can you determine why Postgres returns 25006 ?

zipakgow commented 3 years ago

It’s returning 25006 as primary is now only in read mode. connection pool isn’t purging old connections (created for primary server), even though at the db secondary is now set as Active.

Primary server: now is in standby (read only mode) Secondary server: is the new active server (read write mode)

25006 -> https://www.postgresql.org/docs/11/errcodes-appendix.html

vietj commented 3 years ago

I'm assuming that creating a new socket connection to the same socket address will resolve to the new server ?

zipakgow commented 3 years ago

yep, thats right.

vietj commented 3 years ago

do you know if there is way to reproduce it with test container ?

vietj commented 3 years ago

do you also know if other traditional solutions (ie jdbc + pool) do something specific for this ?

vietj commented 3 years ago

do you also know if in such case when a database become a primary then Postgres will emit a notification that could be used by the client to evict the connection from the pool before failing ?

zipakgow commented 3 years ago

do you know if there is way to reproduce it with test container ?

we can, but it would need a VIP (F5- it has option to monitor for active server) to be setup

do you also know if other traditional solutions (ie jdbc + pool) do something specific for this ?

we are very new to this setup, so i posted the question on postgres slack channel, with limited help.

do you also know if in such case when a database become a primary then Postgres will emit a notification that could be used by the client to evict the connection from the pool before failing ?

per the EDM EFM switchover documentation, the primary server does restart during this switchover,. but no other notification is sent when it happens.

vietj commented 3 years ago

do you know whether we could fake this somehow ? e.g pass the database in read only mode with a specific command, etc...

vietj commented 3 years ago

it seems possible (https://dba.stackexchange.com/questions/271685/postgresql-9-3-set-database-or-all-to-read-only-mode), I'm thinking we could use this to check we get an error and then that the connection is removed from the pool.

zipakgow commented 3 years ago

sure, as a workaround for now, we are trying to purge the pool and re-initialize it.

ianprime0509 commented 2 years ago

I checked what HikariCP (another popular connection pool implementation) is doing, and it seems they have a few strategies for evicting connections from the pool:

  1. When acquiring a connection from the pool, if a connection is found which hasn't been used in a while, it runs a connection test (either conn.isValid() with JDBC 4 or a connection test query), and if it fails, it will close the connection and try acquiring another one from the pool repeatedly up until a timeout is reached: https://github.com/brettwooldridge/HikariCP/blob/3a90fb97b9293841ff6f6d2b16b17545a48c513c/src/main/java/com/zaxxer/hikari/pool/HikariPool.java#L168-L171
  2. When an SQLException is thrown, certain types of exception result in the connection being evicted from the pool: https://github.com/brettwooldridge/HikariCP/blob/b5f5700e2dfdb23be9c7d01722df26eff134b1ef/src/main/java/com/zaxxer/hikari/pool/ProxyConnection.java#L154-L190 It looks like this particular error may not currently be part of this eviction logic: https://github.com/brettwooldridge/HikariCP/issues/1960

Would a similar approach be possible in this library as well? If there were a setup which could reliably reproduce this sort of issue (e.g. as suggested here: https://github.com/eclipse-vertx/vertx-sql-client/issues/981#issuecomment-862178252), it might be easier to understand how existing connection pools handle this particular situation and whether a similar strategy could apply here.

cdekok commented 4 months ago

What would be the best way to handle this? I am hitting the same issue when a fail over happens is the only way currently to create a new instance of the pool? The nicest way would be to remove the connection on 25006 errors but I don't see an available interface to handle this currently.