ged / ruby-pg

A PostgreSQL client library for Ruby
Other
796 stars 180 forks source link

Renew address resolution in conn.reset #559

Closed larskanis closed 8 months ago

larskanis commented 8 months ago

libpq resolves the host address while PQreset, but ruby-pg doesn't. This is because we explicit set the hostaddr connection parameter when the connection is established the first time. This prevents a newly DNS resolution when running PQresetStart.

This patch adds DNS resolution to conn.reset Since we can not change the connection parameters after connection start, the underlying PGconn pointer is exchanged in reset_start2. This is done by a PQfinish() + PQconnectStart() sequence. That way the hostaddr parameter is updated and a new connection is established with it.

There is a /etc/hosts and sudo based test in the specs. The behavior of libpq is slightly different to that of ruby-pg. It can be verified by the following code:

require "pg"

puts "pg version: #{PG::VERSION}"
# PG::Connection.async_api = false  # Enable to see libpq behaviour

system "sudo sed -i 's/.* abcd/::1 abcd/g' /etc/hosts"
conn = PG.connect host: "abcd", password: "l"
conn.exec("select 1")
p conn.conninfo_hash.slice(:host, :hostaddr, :port)

system "sudo sed -i 's/.* abcd/127.0.0.1 abcd/g' /etc/hosts"
conn.reset
conn.exec("select 1")
p conn.conninfo_hash.slice(:host, :hostaddr, :port)

system "sudo sed -i 's/.* abcd/::2 abcd/g' /etc/hosts"
conn.reset
conn.exec("select 1")
p conn.conninfo_hash.slice(:host, :hostaddr, :port)

This gives the following output showing, that the IP address is updated:

pg version: 1.5.5
{:host=>"abcd", :hostaddr=>"::1", :port=>"5432"}
{:host=>"abcd", :hostaddr=>"127.0.0.1", :port=>"5432"}
ruby-pg/lib/pg/connection.rb:573:in `reset_start2': connection to server at "::2", port 5432 failed: Network is unreachable (PG::ConnectionBad)
    Is the server running on that host and accepting TCP/IP connections?

Whereas libpq resolves similarly with async_api=false, but doesn't raise the error in conn.reset but in the subsequent conn.exec.

pg version: 1.5.5
{:host=>"abcd", :hostaddr=>nil, :port=>"5432"}
{:host=>"abcd", :hostaddr=>nil, :port=>"5432"}
test-reset-dns.rb:18:in `sync_exec': no connection to the server (PG::UnableToSend)

Fixes #558

bnferguson commented 8 months ago

Hello! Very excited to see this PR! I had previously been fixing this issue via a patch to Rails to do a full disconnect and reconnect (instead of a reset) but this is so much nicer as there were some concerns around not doing a reset.

My test setup is a Rails app with Sidekiq workers attaching to a postgres 14 primary with a read replica. They're behind a single hostname via dnsmasq. I start the failover by setting the primary to read only via ALTER SYSTEM SET default_transaction_read_only = on; and then promote the secondary via SELECT pg_promote();.

I then switch the hostname entries in dnsmasq and disconnect all db connections via SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname = current_database() AND pid <> pg_backend_pid();

The previous behavior had Rails and Sidekiq workers immediately reconnecting to the old primary and then throwing errors around being readonly.

With pg set to this branch we see that Rails and Sidekiq no longer reconnect to the old db and get the DNS change. This is verified with select * from pg_stat_activity;

Seems very successful! 🎉 🙌