ged / ruby-pg

A PostgreSQL client library for Ruby

Sidekiq processes regularly freeze (stop processing jobs) when trying to connect to PostgreSQL #584

Open sigmaray opened 2 months ago

sigmaray commented 2 months ago

Hi

In our project, Sidekiq freezes regularly, and we have to restart the Sidekiq processes to make them work again. I wrote a healthcheck mechanism to investigate this problem and got some interesting logs: https://gist.github.com/sigmaray/b9b4e974b1da614625652eb166848392 (healthcheck + Sidekiq stack traces)
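The healthcheck in the gist is not reproduced here, so the following is only a minimal sketch of one way such a check could work, not the mechanism used in this project: the worker records a heartbeat whenever it makes progress (for example after each job), and a probe treats the process as frozen once the last heartbeat is older than a threshold. `Heartbeat`, `beat!` and `stale?` are hypothetical names, and the clock is injectable so the logic can be tested without waiting.

```ruby
# Hypothetical heartbeat tracker for detecting a frozen worker process.
# Uses a monotonic clock so wall-clock adjustments don't cause false alarms.
class Heartbeat
  def initialize(clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @clock = clock
    @mutex = Mutex.new
    @last_beat = @clock.call
  end

  # Call this whenever the process makes progress (e.g. after a job finishes).
  def beat!
    @mutex.synchronize { @last_beat = @clock.call }
  end

  # Seconds elapsed since the last recorded heartbeat.
  def age
    @clock.call - @mutex.synchronize { @last_beat }
  end

  # True when no progress has been recorded for longer than max_age seconds.
  def stale?(max_age)
    age > max_age
  end
end

# Usage with a fake clock:
now = 0.0
hb = Heartbeat.new(clock: -> { now })
now = 30.0
puts hb.stale?(60) # => false
now = 90.0
puts hb.stale?(60) # => true
hb.beat!
puts hb.stale?(60) # => false
```

A real deployment would expose `stale?` to something outside the process (a file, an HTTP endpoint, a liveness probe), because a frozen process cannot be trusted to report on itself.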

I don't have expertise in analyzing Sidekiq stack traces, so my assumption could be wrong, but it seems that the Sidekiq processes get stuck when trying to connect to PostgreSQL.
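On reading such traces: the WARN-level backtraces in the gist look like Sidekiq's thread dump, which Sidekiq writes to its log for every thread when the process receives the TTIN signal (`kill -TTIN <pid>`). A thread that is really stuck keeps the same top frames across several dumps taken some seconds apart. The stdlib-only sketch below applies that idea inside one Ruby process; `thread_snapshot` and `unchanged_threads` are hypothetical helpers, not Sidekiq APIs.

```ruby
# Hypothetical helper: snapshot the top `depth` frames of every live thread,
# keyed by thread name (or object id for unnamed threads).
def thread_snapshot(depth: 5)
  Thread.list.each_with_object({}) do |thread, acc|
    next if thread == Thread.current # skip the thread taking the snapshot
    label = thread.name || "thread-#{thread.object_id}"
    acc[label] = (thread.backtrace || []).first(depth)
  end
end

# Threads whose top frames did not change between two snapshots are
# candidates for being stuck (or simply idle, so compare several dumps).
def unchanged_threads(before, after)
  before.select { |label, frames| !frames.empty? && after[label] == frames }.keys
end

# Demo: a thread blocked forever on an empty queue looks "frozen".
queue = Queue.new
worker = Thread.new { queue.pop }
worker.name = "blocked-worker"
sleep 0.1 # let the worker reach queue.pop
first = thread_snapshot
sleep 0.1
second = thread_snapshot
puts unchanged_threads(first, second).inspect
queue << :done
worker.join
```

Idle Sidekiq processor threads waiting for work also keep identical frames, so what matters is a worker thread whose top frame sits in connection or query code in every dump.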

We are using PostgreSQL from Yandex Cloud, and all connections to PostgreSQL seem to go through the Odyssey pooler.

We have the following configuration in database.yml:

default: &default
  ...
  pool: 30 # equals to :concurrency in sidekiq.yml
  timeout: 10000 # 10s in milliseconds

Setting the timeout in database.yml doesn't prevent the Sidekiq freezes. We also increased the connection limit in the managed PostgreSQL, and that didn't help either.
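A likely reason the `timeout` key has no effect: as far as I can tell from the Rails 7.0 PostgreSQL adapter source, the adapter renames `username`/`database` to libpq's `user`/`dbname` and then forwards only keys that libpq recognizes (taken from `PG::Connection.conndefaults_hash`) to `PG::Connection.connect`. `timeout` is not a libpq parameter, so it is dropped; libpq's own setting is `connect_timeout`, in seconds, and it bounds only connection establishment. The stdlib-only sketch below mimics that filtering with a hard-coded subset of libpq keywords (an assumption; the real list comes from the pg gem at runtime). `LIBPQ_KEYS` and `libpq_params` are hypothetical names.

```ruby
# Hard-coded subset of libpq connection keywords (assumption: the real adapter
# builds this list from PG::Connection.conndefaults_hash at runtime).
LIBPQ_KEYS = %i[
  host port dbname user password sslmode client_encoding
  connect_timeout keepalives keepalives_idle keepalives_interval
  keepalives_count tcp_user_timeout
].freeze

# Sketch of how database.yml settings become libpq connection parameters.
# Note that :encoding is not a libpq key either; ActiveRecord applies it
# separately after connecting (via set_client_encoding).
def libpq_params(config)
  params = config.transform_keys(&:to_sym).compact
  params[:user] = params.delete(:username) if params.key?(:username)
  params[:dbname] = params.delete(:database) if params.key?(:database)
  params.slice(*LIBPQ_KEYS)
end

config = {
  "adapter" => "postgresql", "database" => "app", "username" => "app",
  "host" => "db.example", "pool" => 30, "timeout" => 10_000,
  "connect_timeout" => 10 # seconds, libpq's own setting
}
p libpq_params(config)
# keeps host, dbname, user and connect_timeout; drops adapter, pool and timeout
```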

I found a similar issue, but it is very old and marked as fixed: https://github.com/ged/ruby-pg/issues/245

Does anyone know why this happens and how to avoid Sidekiq freezes?

Thanks

# ruby -v
ruby 3.1.3p185 (2022-11-24 revision 1a6b16756e) [x86_64-linux]
# Gemfile.lock
rails (7.0.3.1)
pg (1.5.6)
 production > ActiveRecord::Base.connection.execute('SELECT version()').as_json
[
    [0] {
        "version" => "PostgreSQL 12.18 (Ubuntu 12.censored) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0, 64-bit"
    }
]

sigmaray commented 2 months ago

I forgot to mention: if I understood correctly, the Sidekiq processes freeze on this line:

19:39:59 jobs-2.1        | 2024-08-26T16:39:59.228Z pid=14 tid=4ge WARN:
[...]/activerecord-7.0.3.1/lib/active_record/connection_adapters/postgresql_adapter.rb:862:in `set_client_encoding'

It seems that the set_client_encoding method doesn't respect the timeout specified in database.yml.

My guess could be wrong; I don't have experience reading Sidekiq traces.
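The observation that `set_client_encoding` ignores the configured timeout is consistent with how libpq timeouts work, as far as I can tell: even `connect_timeout` (the parameter libpq actually understands) bounds only connection establishment, while `SET client_encoding` is an ordinary statement sent once the connection is up. If the pooler accepts the connection but never answers, the client can wait indefinitely unless something else imposes a deadline. The sketch below uses plain Ruby sockets, not pg's internals, to show the difference; `read_with_deadline` is a hypothetical helper and the socket pair merely stands in for "client and pooler".

```ruby
require "socket"

# Hypothetical helper: wait at most `seconds` for data, then give up.
# Returns the bytes read, or nil if the peer sent nothing in time.
def read_with_deadline(io, seconds, maxlen = 4096)
  ready = IO.select([io], nil, nil, seconds)
  return nil unless ready # select timed out
  data = io.read_nonblock(maxlen, exception: false)
  data == :wait_readable ? nil : data
end

# A connected socket pair stands in for "client <-> pooler".
client, pooler = UNIXSocket.pair

# The "pooler" never replies: a plain client.read(1) here would block forever,
# but the deadline turns the silence into nil after 0.2 s.
p read_with_deadline(client, 0.2) # => nil

pooler.write("ok")
p read_with_deadline(client, 0.2) # => "ok"

client.close
pooler.close
```

TCP keepalives and `tcp_user_timeout` play a similar role at the socket level for a peer that has silently gone away, without needing changes to the application code.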

larskanis commented 2 months ago

There was a previous issue regarding encoding through a connection proxy: #368

It has been resolved since pg-1.5.4. But looking at the ActiveRecord sources, it looks like you're actively triggering this command by setting :encoding in your database config. Can you disable this config option?

drdrsh commented 2 months ago

I ran into a similar issue where timeouts/interrupts are sometimes not honored by ruby-pg. I was never able to reproduce the issue in a controlled environment, but it was happening in production.

The fix was to set keepalives and tcp_user_timeout on the sockets.

postgres://username:password@host:port/database?connect_timeout=2&keepalives=1&keepalives_idle=1&keepalives_count=1&tcp_user_timeout=1000
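For reference, the URL above uses standard libpq connection parameters: `connect_timeout` (seconds to wait while connecting), `keepalives` (1 enables TCP keepalives), `keepalives_idle` (seconds of inactivity before the first probe), `keepalives_count` (unanswered probes before the connection is considered dead) and `tcp_user_timeout` (milliseconds that transmitted data may remain unacknowledged before the connection is forcibly closed). `keepalives_interval` is not set there, so the OS default applies. The sketch below builds the same URL from an annotated hash with Ruby's standard `URI` and `ERB::Util` libraries; the host and credentials are placeholders, and `TCP_SAFETY_PARAMS` and `postgres_url` are hypothetical names.

```ruby
require "uri"
require "erb"

# libpq parameters from the URL above, with units noted per key.
TCP_SAFETY_PARAMS = {
  connect_timeout: 2,    # seconds allowed for establishing the connection
  keepalives: 1,         # enable TCP keepalive probes
  keepalives_idle: 1,    # seconds idle before the first probe
  keepalives_count: 1,   # unanswered probes before the connection is dropped
  tcp_user_timeout: 1000 # milliseconds unacknowledged data may wait
}.freeze

# Build a postgres:// URL; credentials are percent-encoded for the userinfo part.
def postgres_url(user:, password:, host:, port:, database:, params: TCP_SAFETY_PARAMS)
  userinfo = "#{ERB::Util.url_encode(user)}:#{ERB::Util.url_encode(password)}"
  URI::Generic.build(
    scheme: "postgres", userinfo: userinfo,
    host: host, port: port, path: "/#{database}",
    query: URI.encode_www_form(params)
  ).to_s
end

puts postgres_url(user: "username", password: "password",
                  host: "db.example", port: 5432, database: "database")
# prints postgres://username:password@db.example:5432/database?connect_timeout=2&keepalives=1&keepalives_idle=1&keepalives_count=1&tcp_user_timeout=1000
```

Since the Rails 7.0 PostgreSQL adapter appears to forward any key libpq recognizes, the same keys could presumably also go into database.yml instead of a URL; that is worth verifying for your Rails version.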

I don't know if this is related in any way, but with these settings we were able to eliminate the frozen-thread occurrences.