Dynflow / dynflow

DYNamic workFLOW orchestration engine
http://dynflow.github.io
MIT License
121 stars 44 forks source link

Sidekiq dying on idle state #440

Open guillermomolina opened 9 months ago

guillermomolina commented 9 months ago

I am using Dynflow with Foreman in a Docker swarm environment. After many hours of running, when foreman is idle, I get a DatabaseDisconnectError exception. After that, Sidekiq seems to be running, new jobs are executed, but they won't finish. I have to restart both the orchestrator and the worker to go back to normal.

These are the logs:

foreman_orchestrator.1.iv56474z9xfg@ges-docker3    | #<Thread:0x0000561d2f4eb040 /usr/share/gems/gems/logging-2.3.1/lib/logging/diagnostic_context.rb:471 run> terminated with exception (report_on_exception is true):
foreman_orchestrator.1.iv56474z9xfg@ges-docker3    | /usr/share/gems/gems/sequel-5.71.0/lib/sequel/adapters/postgres.rb:171:in `exec': PG::ConnectionBad: PQsocket() can't get socket descriptor (Sequel::DatabaseDisconnectError)
foreman_orchestrator.1.iv56474z9xfg@ges-docker3    |    from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/adapters/postgres.rb:171:in `block in execute_query'
foreman_orchestrator.1.iv56474z9xfg@ges-docker3    |    from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/database/logging.rb:38:in `log_connection_yield'
foreman_orchestrator.1.iv56474z9xfg@ges-docker3    |    from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/adapters/postgres.rb:171:in `execute_query'
foreman_orchestrator.1.iv56474z9xfg@ges-docker3    |    from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/adapters/postgres.rb:159:in `block in execute'
foreman_orchestrator.1.iv56474z9xfg@ges-docker3    |    from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/adapters/postgres.rb:136:in `check_disconnect_errors'
foreman_orchestrator.1.iv56474z9xfg@ges-docker3    |    from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/adapters/postgres.rb:159:in `execute'
foreman_orchestrator.1.iv56474z9xfg@ges-docker3    |    from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/adapters/postgres.rb:518:in `ensure in block (2 levels) in listen'
foreman_orchestrator.1.iv56474z9xfg@ges-docker3    |    from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/adapters/postgres.rb:518:in `block (2 levels) in listen'
foreman_orchestrator.1.iv56474z9xfg@ges-docker3    |    from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/connection_pool/threaded.rb:92:in `hold'
foreman_orchestrator.1.iv56474z9xfg@ges-docker3    |    from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/database/connecting.rb:293:in `synchronize'
foreman_orchestrator.1.iv56474z9xfg@ges-docker3    |    from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/adapters/postgres.rb:488:in `block in listen'
foreman_orchestrator.1.iv56474z9xfg@ges-docker3    |    from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/adapters/postgres.rb:555:in `check_database_errors'
foreman_orchestrator.1.iv56474z9xfg@ges-docker3    |    from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/adapters/postgres.rb:487:in `listen'
foreman_orchestrator.1.iv56474z9xfg@ges-docker3    |    from /usr/share/gems/gems/dynflow-1.8.0/lib/dynflow/connectors/database.rb:25:in `block in start'
foreman_orchestrator.1.iv56474z9xfg@ges-docker3    |    from /usr/share/gems/gems/logging-2.3.1/lib/logging/diagnostic_context.rb:474:in `block in create_with_logging_context'
foreman_orchestrator.1.iv56474z9xfg@ges-docker3    | /usr/share/gems/gems/sequel-5.71.0/lib/sequel/adapters/postgres.rb:171:in `exec': PQsocket() can't get socket descriptor (PG::ConnectionBad)
foreman_orchestrator.1.iv56474z9xfg@ges-docker3    |    from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/adapters/postgres.rb:171:in `block in execute_query'
foreman_orchestrator.1.iv56474z9xfg@ges-docker3    |    from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/database/logging.rb:38:in `log_connection_yield'
foreman_orchestrator.1.iv56474z9xfg@ges-docker3    |    from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/adapters/postgres.rb:171:in `execute_query'
foreman_orchestrator.1.iv56474z9xfg@ges-docker3    |    from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/adapters/postgres.rb:159:in `block in execute'
foreman_orchestrator.1.iv56474z9xfg@ges-docker3    |    from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/adapters/postgres.rb:136:in `check_disconnect_errors'
foreman_orchestrator.1.iv56474z9xfg@ges-docker3    |    from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/adapters/postgres.rb:159:in `execute'
foreman_orchestrator.1.iv56474z9xfg@ges-docker3    |    from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/adapters/postgres.rb:518:in `ensure in block (2 levels) in listen'
foreman_orchestrator.1.iv56474z9xfg@ges-docker3    |    from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/adapters/postgres.rb:518:in `block (2 levels) in listen'
foreman_orchestrator.1.iv56474z9xfg@ges-docker3    |    from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/connection_pool/threaded.rb:92:in `hold'
foreman_orchestrator.1.iv56474z9xfg@ges-docker3    |    from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/database/connecting.rb:293:in `synchronize'
foreman_orchestrator.1.iv56474z9xfg@ges-docker3    |    from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/adapters/postgres.rb:488:in `block in listen'
foreman_orchestrator.1.iv56474z9xfg@ges-docker3    |    from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/adapters/postgres.rb:555:in `check_database_errors'
foreman_orchestrator.1.iv56474z9xfg@ges-docker3    |    from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/adapters/postgres.rb:487:in `listen'
foreman_orchestrator.1.iv56474z9xfg@ges-docker3    |    from /usr/share/gems/gems/dynflow-1.8.0/lib/dynflow/connectors/database.rb:25:in `block in start'
foreman_orchestrator.1.iv56474z9xfg@ges-docker3    |    from /usr/share/gems/gems/logging-2.3.1/lib/logging/diagnostic_context.rb:474:in `block in create_with_logging_context'
foreman_orchestrator.1.iv56474z9xfg@ges-docker3    | /usr/share/gems/gems/sequel-5.71.0/lib/sequel/adapters/postgres.rb:508:in `wait_for_notify': PQconsumeInput() server closed the connection unexpectedly (PG::ConnectionBad)
foreman_orchestrator.1.iv56474z9xfg@ges-docker3    |    This probably means the server terminated abnormally
foreman_orchestrator.1.iv56474z9xfg@ges-docker3    |    before or while processing the request.
foreman_orchestrator.1.iv56474z9xfg@ges-docker3    |    from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/adapters/postgres.rb:508:in `block (3 levels) in listen'
foreman_orchestrator.1.iv56474z9xfg@ges-docker3    |    from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/adapters/postgres.rb:505:in `catch'
foreman_orchestrator.1.iv56474z9xfg@ges-docker3    |    from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/adapters/postgres.rb:505:in `block (2 levels) in listen'
foreman_orchestrator.1.iv56474z9xfg@ges-docker3    |    from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/connection_pool/threaded.rb:92:in `hold'
foreman_orchestrator.1.iv56474z9xfg@ges-docker3    |    from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/database/connecting.rb:293:in `synchronize'
foreman_orchestrator.1.iv56474z9xfg@ges-docker3    |    from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/adapters/postgres.rb:488:in `block in listen'
foreman_orchestrator.1.iv56474z9xfg@ges-docker3    |    from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/adapters/postgres.rb:555:in `check_database_errors'
foreman_orchestrator.1.iv56474z9xfg@ges-docker3    |    from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/adapters/postgres.rb:487:in `listen'
foreman_orchestrator.1.iv56474z9xfg@ges-docker3    |    from /usr/share/gems/gems/dynflow-1.8.0/lib/dynflow/connectors/database.rb:25:in `block in start'
foreman_orchestrator.1.iv56474z9xfg@ges-docker3    |    from /usr/share/gems/gems/logging-2.3.1/lib/logging/diagnostic_context.rb:474:in `block in create_with_logging_context'
foreman_worker.1.8xvblxhzdy7n@ges-docker4    | #<Thread:0x000055bab38aec60 /usr/share/gems/gems/logging-2.3.1/lib/logging/diagnostic_context.rb:471 run> terminated with exception (report_on_exception is true):
foreman_worker.1.8xvblxhzdy7n@ges-docker4    | /usr/share/gems/gems/sequel-5.71.0/lib/sequel/adapters/postgres.rb:171:in `exec': PG::ConnectionBad: PQsocket() can't get socket descriptor (Sequel::DatabaseDisconnectError)
foreman_worker.1.8xvblxhzdy7n@ges-docker4    |  from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/adapters/postgres.rb:171:in `block in execute_query'
foreman_worker.1.8xvblxhzdy7n@ges-docker4    |  from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/database/logging.rb:38:in `log_connection_yield'
foreman_worker.1.8xvblxhzdy7n@ges-docker4    |  from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/adapters/postgres.rb:171:in `execute_query'
foreman_worker.1.8xvblxhzdy7n@ges-docker4    |  from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/adapters/postgres.rb:159:in `block in execute'
foreman_worker.1.8xvblxhzdy7n@ges-docker4    |  from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/adapters/postgres.rb:136:in `check_disconnect_errors'
foreman_worker.1.8xvblxhzdy7n@ges-docker4    |  from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/adapters/postgres.rb:159:in `execute'
foreman_worker.1.8xvblxhzdy7n@ges-docker4    |  from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/adapters/postgres.rb:518:in `ensure in block (2 levels) in listen'
foreman_worker.1.8xvblxhzdy7n@ges-docker4    |  from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/adapters/postgres.rb:518:in `block (2 levels) in listen'
foreman_worker.1.8xvblxhzdy7n@ges-docker4    |  from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/connection_pool/threaded.rb:92:in `hold'
foreman_worker.1.8xvblxhzdy7n@ges-docker4    |  from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/database/connecting.rb:293:in `synchronize'
foreman_worker.1.8xvblxhzdy7n@ges-docker4    |  from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/adapters/postgres.rb:488:in `block in listen'
foreman_worker.1.8xvblxhzdy7n@ges-docker4    |  from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/adapters/postgres.rb:555:in `check_database_errors'
foreman_worker.1.8xvblxhzdy7n@ges-docker4    |  from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/adapters/postgres.rb:487:in `listen'
foreman_worker.1.8xvblxhzdy7n@ges-docker4    |  from /usr/share/gems/gems/dynflow-1.8.0/lib/dynflow/connectors/database.rb:25:in `block in start'
foreman_worker.1.8xvblxhzdy7n@ges-docker4    |  from /usr/share/gems/gems/logging-2.3.1/lib/logging/diagnostic_context.rb:474:in `block in create_with_logging_context'
foreman_worker.1.8xvblxhzdy7n@ges-docker4    | /usr/share/gems/gems/sequel-5.71.0/lib/sequel/adapters/postgres.rb:171:in `exec': PQsocket() can't get socket descriptor (PG::ConnectionBad)
foreman_worker.1.8xvblxhzdy7n@ges-docker4    |  from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/adapters/postgres.rb:171:in `block in execute_query'
foreman_worker.1.8xvblxhzdy7n@ges-docker4    |  from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/database/logging.rb:38:in `log_connection_yield'
foreman_worker.1.8xvblxhzdy7n@ges-docker4    |  from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/adapters/postgres.rb:171:in `execute_query'
foreman_worker.1.8xvblxhzdy7n@ges-docker4    |  from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/adapters/postgres.rb:159:in `block in execute'
foreman_worker.1.8xvblxhzdy7n@ges-docker4    |  from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/adapters/postgres.rb:136:in `check_disconnect_errors'
foreman_worker.1.8xvblxhzdy7n@ges-docker4    |  from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/adapters/postgres.rb:159:in `execute'
foreman_worker.1.8xvblxhzdy7n@ges-docker4    |  from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/adapters/postgres.rb:518:in `ensure in block (2 levels) in listen'
foreman_worker.1.8xvblxhzdy7n@ges-docker4    |  from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/adapters/postgres.rb:518:in `block (2 levels) in listen'
foreman_worker.1.8xvblxhzdy7n@ges-docker4    |  from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/connection_pool/threaded.rb:92:in `hold'
foreman_worker.1.8xvblxhzdy7n@ges-docker4    |  from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/database/connecting.rb:293:in `synchronize'
foreman_worker.1.8xvblxhzdy7n@ges-docker4    |  from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/adapters/postgres.rb:488:in `block in listen'
foreman_worker.1.8xvblxhzdy7n@ges-docker4    |  from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/adapters/postgres.rb:555:in `check_database_errors'
foreman_worker.1.8xvblxhzdy7n@ges-docker4    |  from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/adapters/postgres.rb:487:in `listen'
foreman_worker.1.8xvblxhzdy7n@ges-docker4    |  from /usr/share/gems/gems/dynflow-1.8.0/lib/dynflow/connectors/database.rb:25:in `block in start'
foreman_worker.1.8xvblxhzdy7n@ges-docker4    |  from /usr/share/gems/gems/logging-2.3.1/lib/logging/diagnostic_context.rb:474:in `block in create_with_logging_context'
foreman_worker.1.8xvblxhzdy7n@ges-docker4    | /usr/share/gems/gems/sequel-5.71.0/lib/sequel/adapters/postgres.rb:508:in `wait_for_notify': PQconsumeInput() server closed the connection unexpectedly (PG::ConnectionBad)
foreman_worker.1.8xvblxhzdy7n@ges-docker4    |  This probably means the server terminated abnormally
foreman_worker.1.8xvblxhzdy7n@ges-docker4    |  before or while processing the request.
foreman_worker.1.8xvblxhzdy7n@ges-docker4    |  from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/adapters/postgres.rb:508:in `block (3 levels) in listen'
foreman_worker.1.8xvblxhzdy7n@ges-docker4    |  from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/adapters/postgres.rb:505:in `catch'
foreman_worker.1.8xvblxhzdy7n@ges-docker4    |  from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/adapters/postgres.rb:505:in `block (2 levels) in listen'
foreman_worker.1.8xvblxhzdy7n@ges-docker4    |  from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/connection_pool/threaded.rb:92:in `hold'
foreman_worker.1.8xvblxhzdy7n@ges-docker4    |  from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/database/connecting.rb:293:in `synchronize'
foreman_worker.1.8xvblxhzdy7n@ges-docker4    |  from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/adapters/postgres.rb:488:in `block in listen'
foreman_worker.1.8xvblxhzdy7n@ges-docker4    |  from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/adapters/postgres.rb:555:in `check_database_errors'
foreman_worker.1.8xvblxhzdy7n@ges-docker4    |  from /usr/share/gems/gems/sequel-5.71.0/lib/sequel/adapters/postgres.rb:487:in `listen'
foreman_worker.1.8xvblxhzdy7n@ges-docker4    |  from /usr/share/gems/gems/dynflow-1.8.0/lib/dynflow/connectors/database.rb:25:in `block in start'
foreman_worker.1.8xvblxhzdy7n@ges-docker4    |  from /usr/share/gems/gems/logging-2.3.1/lib/logging/diagnostic_context.rb:474:in `block in create_with_logging_context'

The Dockerfile used:

FROM almalinux:8

# Configure CA
COPY files/base-ca.cer /etc/pki/ca-trust/source/anchors/base-ca.cer
RUN /bin/update-ca-trust

COPY files/almalinux.repo /etc/dnf.repos.d/almalinux.repo

ARG FOREMAN_VERSION
ENV FOREMAN_VERSION=${FOREMAN_VERSION}

RUN echo "tsflags=nodocs" >> /etc/dnf.conf && \
    dnf -y install https://yum.theforeman.org/releases/${FOREMAN_VERSION}/el8/x86_64/foreman-release.rpm && \
    dnf -y module enable foreman:el8 && \
    dnf clean all

RUN yum -y install \
        foreman-dynflow-sidekiq && \
    dnf clean all

ENV FOREMAN_FQDN=foreman.gestio.sys
ENV FOREMAN_DOMAIN=gestio.sys

ENV RAILS_ENV=production
ENV MALLOC_ARENA_MAX=2
ENV RAILS_SERVE_STATIC_FILES=true
ENV RAILS_LOG_TO_STDOUT=true

WORKDIR /usr/share/foreman
USER foreman

ENV PATH=${HOME}/bin:${PATH}

# Start the orchestrator.
CMD sidekiq -r ./extras/dynflow-sidekiq.rb -c 1 -q dynflow_orchestrator

# Start the worker.
# CMD sidekiq -r ./extras/dynflow-sidekiq.rb -c 15 -q default,1 -q remote_execution,1

The relevant part of docker-compose:

  orchestrator: &foreman_sidekiq
    image: hub.integracio.sys/base/foreman-sidekiq:3.8
    command: sidekiq -e production -r ./extras/dynflow-sidekiq.rb -c 1 -q dynflow_orchestrator
    environment:
      - TZ=Europe/Madrid
      - RAILS_ENV=production
      - DYNFLOW_SIDEKIQ_SCRIPT=/usr/share/foreman/extras/dynflow-sidekiq.rb
      - DYNFLOW_REDIS_URL=redis://redis-tasks:6379/0
      - REDIS_PROVIDER=DYNFLOW_REDIS_URL
    configs:
      - source: foreman_encryption_key.rb
        target: /etc/foreman/encryption_key.rb
      - source: foreman_database.yml
        target: /etc/foreman/database.yml
    hostname: orchestrator
    networks:
      - backend
    depends_on:
      - postgres
      - redis-tasks

  worker:
    <<: *foreman_sidekiq
    command: sidekiq -e production -r ./extras/dynflow-sidekiq.rb -c 15 -q default,1 -q remote_execution,1
    hostname: worker

So, I am not compiling Foreman or Sidekiq myself, I am using the official packages.

I have search for the same issue on internet, and I've found that may be the preconnect option of Sequel::Database::new can be used:

https://github.com/heroku/heroku-buildpack-pgbouncer/issues/29#issuecomment-113240132

I don't know ruby but I've tried this in lib/dynflow/persistence_adapters/sequel.rb:

      def initialize_db(db_path)
        logger = Logger.new($stderr) if ENV['DYNFLOW_SQL_LOG']
        ::Sequel.connect db_path, logger: logger, preconnect: true
      end

And modifying the Dockerfile to reflect the change

# Fix "PG::ConnectionBad: PQsocket() can't get socket descriptor (Sequel::DatabaseDisconnectError)"
COPY files/sequel.rb /usr/share/gems/gems/dynflow-1.8.0/lib/dynflow/persistence_adapters/sequel.rb

This doesn't work. Can you help me please?