Open zipakgow opened 3 years ago
can you determine why Postgres returns 25006 ?
It’s returning 25006 as primary is now only in read mode. connection pool isn’t purging old connections (created for primary server), even though at the db secondary is now set as Active.
Primary server: now is in standby (read only mode) Secondary server: is the new active server (read write mode)
25006 -> https://www.postgresql.org/docs/11/errcodes-appendix.html
I'm assuming that creating a new socket connection to the same socket address will resolve to the new server ?
yep, thats right.
do you know if there is way to reproduce it with test container ?
do you also know if other traditional solutions (ie jdbc + pool) do something specific for this ?
do you also know if in such case when a database become a primary then Postgres will emit a notification that could be used by the client to evict the connection from the pool before failing ?
do you know if there is way to reproduce it with test container ?
we can, but it would need a VIP (F5- it has option to monitor for active server) to be setup
do you also know if other traditional solutions (ie jdbc + pool) do something specific for this ?
we are very new to this setup, so i posted the question on postgres slack channel, with limited help.
do you also know if in such case when a database become a primary then Postgres will emit a notification that could be used by the client to evict the connection from the pool before failing ?
per the EDM EFM switchover documentation, the primary server does restart during this switchover,. but no other notification is sent when it happens.
do you know whether we could fake this somehow ? e.g pass the database in read only mode with a specific command, etc...
it seems possible (https://dba.stackexchange.com/questions/271685/postgresql-9-3-set-database-or-all-to-read-only-mode), I'm thinking we could use this to check we get an error and then that the connection is removed from the pool.
sure, as a workaround for now, we are trying to purge the pool and re-initialize it.
I checked what HikariCP (another popular connection pool implementation) is doing, and it seems they have a few strategies for evicting connections from the pool:
conn.isValid()
with JDBC 4 or a connection test query), and if it fails, it will close the connection and try acquiring another one from the pool repeatedly up until a timeout is reached: https://github.com/brettwooldridge/HikariCP/blob/3a90fb97b9293841ff6f6d2b16b17545a48c513c/src/main/java/com/zaxxer/hikari/pool/HikariPool.java#L168-L171SQLException
is thrown, certain types of exception result in the connection being evicted from the pool: https://github.com/brettwooldridge/HikariCP/blob/b5f5700e2dfdb23be9c7d01722df26eff134b1ef/src/main/java/com/zaxxer/hikari/pool/ProxyConnection.java#L154-L190 It looks like this particular error may not currently be part of this eviction logic: https://github.com/brettwooldridge/HikariCP/issues/1960Would a similar approach be possible in this library as well? If there were a setup which could reliably reproduce this sort of issue (e.g. as suggested here: https://github.com/eclipse-vertx/vertx-sql-client/issues/981#issuecomment-862178252), it might be easier to understand how existing connection pools handle this particular situation and whether a similar strategy could apply here.
What would be the best way to handle this? I am hitting the same issue when a fail over happens is the only way currently to create a new instance of the pool?
The nicest way would be to remove the connection on 25006
errors but I don't see an available interface to handle this currently.
Questions
we have F5 VIP for our postgres db, with gtm_monitor set which in realtime points to either the primary or secondary or DR server based on whichever is active. using the PgPool seems to hold on to the connections to the previously active server and does not discard it. postgres seems to be specifically returning an error code: 25006 exception message returned to app server (when connections from previous active servers are used): cannot execute INSERT in a read-only transaction error code: 25006
can there be a method to discard connections from the pool when such error occurs? or this be handled internally?
Version
Which version(s) did you encounter this bug ? vertx-pg-client: 3.9.7
Do you have a reproducer?
No
Steps to reproduce
Note: application connects to F5 VIP created for postgres