Closed adibsaad closed 2 years ago
@adibsaad thanks for reporting this! that's not expected. I'll see if I can figure out what's happening here.
Briefly poking at it, I expected there to be a problem with the Notifier holding open file descriptors, but it also appears that execution threads are as well. This is unexpected because the execution thread database connections are wholly managed by Rails' execution wrapper, which afaik are supposed to manage connection lifecycle.
Here is what I'm seeing running lsof -p [PID]
with good_job
executable (I'd expect similar when running async too):
ruby 50923 bensheldon 12u IPv6 0x935f30b3386e4bab 0t0 TCP localhost:53047->localhost:postgresql (CLOSED)
ruby 50923 bensheldon 13u IPv6 0x935f30b333d2596b 0t0 TCP localhost:53048->localhost:postgresql (CLOSED)
ruby 50923 bensheldon 14u IPv6 0x935f30b333d2294b 0t0 TCP localhost:53049->localhost:postgresql (CLOSED)
ruby 50923 bensheldon 15u IPv6 0x935f30b333d2302b 0t0 TCP localhost:53051->localhost:postgresql (CLOSED)
ruby 50923 bensheldon 16u IPv6 0x935f30b333d2370b 0t0 TCP localhost:53052->localhost:postgresql (CLOSED)
ruby 50923 bensheldon 17u IPv6 0x935f30b333d23deb 0t0 TCP localhost:53053->localhost:postgresql (CLOSED)
ruby 50923 bensheldon 18u IPv6 0x935f30b3386c604b 0t0 TCP localhost:53054->localhost:postgresql (CLOSED)
ruby 50923 bensheldon 19u IPv6 0x935f30b3386c3deb 0t0 TCP localhost:53055->localhost:postgresql (CLOSED)
ruby 50923 bensheldon 20u IPv6 0x935f30b3386c294b 0t0 TCP localhost:53056->localhost:postgresql (CLOSED)
ruby 50923 bensheldon 21u IPv6 0x935f30b3386c302b 0t0 TCP localhost:53057->localhost:postgresql (CLOSED)
ruby 50923 bensheldon 22u IPv6 0x935f30b333d2b02b 0t0 TCP localhost:53059->localhost:postgresql (CLOSED)
ruby 50923 bensheldon 23u IPv6 0x935f30b333d2b70b 0t0 TCP localhost:53060->localhost:postgresql (CLOSED)
ruby 50923 bensheldon 24u IPv6 0x935f30b333d2bdeb 0t0 TCP localhost:53061->localhost:postgresql (CLOSED)
ruby 50923 bensheldon 25u IPv6 0x935f30b333d2cbab 0t0 TCP localhost:53066->localhost:postgresql (CLOSED)
ruby 50923 bensheldon 26u IPv6 0x935f30b333d2d28b 0t0 TCP localhost:53067->localhost:postgresql (CLOSED)
ruby 50923 bensheldon 27u IPv6 0x935f30b333d2e04b 0t0 TCP localhost:53069->localhost:postgresql (CLOSED)
ruby 50923 bensheldon 28u IPv6 0x935f30b333d2a94b 0t0 TCP localhost:53070->localhost:postgresql (CLOSED)
ruby 50923 bensheldon 29u IPv6 0x935f30b3386dcbab 0t0 TCP localhost:53071->localhost:postgresql (CLOSED)
ruby 50923 bensheldon 30u IPv6 0x935f30b3386dd96b 0t0 TCP localhost:53072->localhost:postgresql (CLOSED)
ruby 50923 bensheldon 31u IPv6 0x935f30b3386da94b 0t0 TCP localhost:53073->localhost:postgresql (CLOSED)
ruby 50923 bensheldon 32u IPv6 0x935f30b3386d84cb 0t0 TCP localhost:53074->localhost:postgresql (CLOSED)
ruby 50923 bensheldon 33u IPv6 0x935f30b3386c128b 0t0 TCP localhost:53079->localhost:postgresql (CLOSED)
I'm a bit stumped because when looking at a normal rails s
and trying to hit the webserver, I'm seeing the same Postgres error and similar stacktrace but it's not leaving behind a file descriptor.
Doing some more research and experimentation. It looks like the file descriptors are cleaned up when the socket objects are garbage collected:
❯ bin/rails c
Loading development environment (Rails 6.1.4.6)
[1] pry(main)> ObjectSpace.each_object(PG::Connection).count
=> 0
[2] pry(main)> ObjectSpace.each_object(BasicSocket).count
=> 0
[3] pry(main)> 10.times { ActiveRecord::Base.connection rescue nil }
=> 10
[4] pry(main)> ObjectSpace.each_object(PG::Connection).count
=> 10
[5] pry(main)> ObjectSpace.each_object(BasicSocket).count
=> 10
[6] pry(main)> GC.start
=> nil
[7] pry(main)> ObjectSpace.each_object(PG::Connection).count
=> 0
[8] pry(main)> ObjectSpace.each_object(BasicSocket).count
=> 0
I can't find an obvious way to clean up the file descriptors without manually triggering GC. When debugging this, I was frequently interrupted by the GC running (even in the middle of a binding.pry
). There's a very hacky way to release the file descriptors, but it relies on deprecated methods and seems very OS/implementation specific:
[1] pry(main)> ActiveRecord::Base.connection
ActiveRecord::ConnectionNotEstablished: connection to server at "::1", port 5432 failed: Connection refused
Is the server running on that host and accepting TCP/IP connections?
from /Users/ben@codeforamerica.org/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/activerecord-6.1.4.6/lib/active_record/connection_adapters/postgresql_adapter.rb:83:in `rescue in new_client'
Caused by PG::ConnectionBad: connection to server at "::1", port 5432 failed: Connection refused
Is the server running on that host and accepting TCP/IP connections?
from /Users/ben@codeforamerica.org/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/pg-1.3.1/lib/pg/connection.rb:637:in `async_connect_or_reset'
[2] pry(main)> conn = ObjectSpace.each_object(PG::Connection).first
=> #<PG::Connection:0x000000013791bdc8>
# The `#socket` method is deprecated
[3] pry(main)> fd = conn.socket
=> 13
[4] pry(main)> io = IO.for_fd(fd)
=> #<IO:fd 13>
[5] pry(main)> io.close
=> nil
[6] pry(main)> io
=> #<IO:(closed)>
@adibsaad are you doing anything custom with the Ruby Garbage Collector? e.g. GC.disable
or changing any of the GC constants?
Nope, not doing any GC magic. Forgot to mention, I'm on OSX Monterey.
It looks like the root cause here is pg
. When pg fails to connect, it does not clean up the resulting connection. I will open a ticket upstream.
Fix was released in pg v1.3.4.
Problem
My guess is it's not closing the fail descriptor after failing to reconnect to the db.
Stack trace