Closed larskanis closed 8 months ago
Hello! Very excited to see this PR! I had previously been fixing this issue via a patch to Rails to do a full disconnect and reconnect (instead of a `reset`), but this is so much nicer, as there were some concerns around not doing a `reset`.
My test setup is a Rails app with Sidekiq workers attaching to a Postgres 14 primary with a read replica. They're behind a single hostname via dnsmasq. I start the failover by setting the primary to read-only via `ALTER SYSTEM SET default_transaction_read_only = on;` and then promote the secondary via `SELECT pg_promote();`.
I then switch the hostname entries in dnsmasq and disconnect all DB connections via `SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname = current_database() AND pid <> pg_backend_pid();`.
The previous behavior had Rails and Sidekiq workers immediately reconnecting to the old primary and then throwing errors around being readonly.
With `pg` set to this branch, we see that Rails and Sidekiq no longer reconnect to the old DB and instead pick up the DNS change. This is verified with `select * from pg_stat_activity;`.
Seems very successful! 🎉 🙌
libpq resolves the host address during `PQreset`, but ruby-pg doesn't. This is because we explicitly set the `hostaddr` connection parameter when the connection is established the first time. This prevents a new DNS resolution when running `PQresetStart`.

This patch adds DNS resolution to `conn.reset`.
Since we cannot change the connection parameters after connection start, the underlying `PGconn` pointer is exchanged in `reset_start2`. This is done by a `PQfinish()` + `PQconnectStart()` sequence. That way the `hostaddr` parameter is updated and a new connection is established with it.

There is a `/etc/hosts` and `sudo` based test in the specs.

The behavior of libpq is slightly different from that of ruby-pg. It can be verified by the following code:

This gives the following output, showing that the IP address is updated:

Whereas libpq resolves similarly with `async_api=false`, but raises the error not in `conn.reset` but in the subsequent `conn.exec`.

Fixes #558
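To illustrate the handle exchange described for `conn.reset`, here is a minimal Ruby sketch. It is not the ruby-pg implementation: `FakeHandle`, `SketchConnection`, the injected `resolver`, and the hostname and IP addresses are all hypothetical stand-ins, and no real libpq calls or DNS lookups are made. It only shows the idea that a reset discards the old handle and opens a new one with a freshly resolved `hostaddr`.

```ruby
# Hypothetical stand-in for a libpq connection handle (PGconn).
# It only records the parameters it was opened with.
class FakeHandle
  attr_reader :host, :hostaddr

  def initialize(host:, hostaddr:)
    @host = host
    @hostaddr = hostaddr
    @finished = false
  end

  # Plays the role of PQfinish() in this sketch.
  def finish
    @finished = true
  end

  def finished?
    @finished
  end
end

# Hypothetical connection wrapper; not PG::Connection.
class SketchConnection
  attr_reader :handle

  # resolver: any callable that maps a hostname to an IP string.
  def initialize(host, resolver:)
    @host = host
    @resolver = resolver
    @handle = open_handle
  end

  # Connection parameters cannot change on an existing handle, so the
  # handle itself is exchanged: finish the old one, open a new one.
  def reset
    @handle.finish
    @handle = open_handle
    self
  end

  def hostaddr
    @handle.hostaddr
  end

  private

  # Resolves the hostname again on every call, so each new handle
  # gets an up-to-date hostaddr.
  def open_handle
    FakeHandle.new(host: @host, hostaddr: @resolver.call(@host))
  end
end

# Simulated DNS table standing in for /etc/hosts or dnsmasq.
dns = { "db.example.test" => "10.0.0.1" }
conn = SketchConnection.new("db.example.test", resolver: ->(name) { dns.fetch(name) })
first_handle = conn.handle

dns["db.example.test"] = "10.0.0.2" # simulated failover: the name now points elsewhere
conn.reset
puts conn.hostaddr # prints 10.0.0.2
```

Injecting the resolver as a callable keeps the sketch runnable without touching `/etc/hosts` or requiring `sudo`, which the real spec in this PR relies on.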