automaticmode / active_workflow

Polyglot workflows without leaving the comfort of your technology stack.
https://www.activeworkflow.org
Other
830 stars 67 forks source link

Application freezes when run with docker-compose since v0.9.9 #9

Closed suhrm1 closed 3 years ago

suhrm1 commented 4 years ago

Hey, I realised that instances of ActiveWorkflow that I installed after the v0.9.9 update are not working properly. My environment is d Docker-based, I built the v0.9.9 image and created the deployment from scratch with the new v0.9.9 docker-compose.yml an .env files. AW starts alright but if run with SINGLE_DYNO = 1 seems to run into 2+ minute freezes very frequently. See logs (1) and(2) below for output. If I uncomment the SINGLE_DYNO line from the docker-compose.yml file and restart, there are no such freezes and no abnormalities in the logs as far as I can tell but message propagation does not work anymore at all.

Instances set up with version 0.9.7 don't seem to encounter these problems Is there something off with the basic settings for Docker deployment in v0.9.9?

Best, Markus

Logs

(1) startup produces a bunch of log messages, see the 2+ min delay at the end. I cut out ~50 lines of the same message with the same timestamp each where there are "..."

active_workflow_1  | 17:29:57 worker.1    | 2020-08-30T10:29:57-0700: [Worker(delayed_job.0 host:82cd6c4e3049 pid:31)] Starting job worker
active_workflow_1  | 17:29:57 web.1       | [29] * Listening on tcp://0.0.0.0:3000
active_workflow_1  | 17:29:57 web.1       | [29] ! WARNING: Detected 2 Thread(s) started in app boot:
active_workflow_1  | 17:29:57 web.1       | [29] ! #<Thread:0x000055f81fbdb808@/app/vendor/bundle/ruby/2.6.0/gems/concurrent-ruby-1.1.6/lib/concurrent-ruby/concurrent/atomic/ruby_thread_local_var.rb:38 sleep_forever> - /app/vendor/bundle/ruby/2.6.0/gems/concurrent-ruby-1.1.6/lib/concurrent-ruby/concurrent/atomic/ruby_thread_local_var.rb:40:in `pop'
active_workflow_1  | 17:29:57 web.1       | [29] ! #<Thread:0x000055f821ef34f8@/app/vendor/bundle/ruby/2.6.0/gems/activerecord-6.0.3.2/lib/active_record/connection_adapters/abstract/connection_pool.rb:334 sleep> - /app/vendor/bundle/ruby/2.6.0/gems/activerecord-6.0.3.2/lib/active_record/connection_adapters/abstract/connection_pool.rb:337:in `sleep'
active_workflow_1  | 17:29:57 web.1       | [29] Use Ctrl-C to stop
active_workflow_1  | 17:29:57 web.1       | [29] - Worker 0 (pid: 64) booted, phase: 0
active_workflow_1  | 17:32:18 scheduler.1 | Starting scheduler
active_workflow_1  | 17:32:18 scheduler.1 | Queuing message propagation
active_workflow_1  | 17:32:18 scheduler.1 | Queuing message propagation
active_workflow_1  | 17:32:18 scheduler.1 | Queuing message propagation
active_workflow_1  | 17:32:18 scheduler.1 | Queuing message propagation
active_workflow_1  | 17:32:18 scheduler.1 | Queuing message propagation
active_workflow_1  | 17:32:18 scheduler.1 | Queuing message propagation
active_workflow_1  | 17:32:18 scheduler.1 | Queuing message propagation
active_workflow_1  | 17:32:18 scheduler.1 | Queuing schedule for every_1m
active_workflow_1  | 17:32:18 scheduler.1 | Queuing schedule for every_2m
active_workflow_1  | 17:32:18 scheduler.1 | Queuing schedule for every_5m
active_workflow_1  | 17:32:18 scheduler.1 | Queuing schedule for every_10m
active_workflow_1  | 17:32:18 scheduler.1 | Queuing schedule for every_30m
active_workflow_1  | 17:32:18 scheduler.1 | Queuing message propagation
active_workflow_1  | 17:32:18 scheduler.1 | Queuing message propagation
...
active_workflow_1  | 17:32:18 scheduler.1 | Queuing message propagation
active_workflow_1  | 17:32:18 scheduler.1 | Queuing message propagation
active_workflow_1  | 17:32:18 scheduler.1 | Queuing schedule for every_1m
active_workflow_1  | 17:32:18 scheduler.1 | Queuing message propagation
active_workflow_1  | 17:32:18 scheduler.1 | Queuing message propagation
...
active_workflow_1  | 17:32:18 scheduler.1 | Queuing message propagation
active_workflow_1  | 17:32:18 scheduler.1 | Queuing schedule for every_1m
active_workflow_1  | 17:32:18 scheduler.1 | Queuing schedule for every_2m
active_workflow_1  | 17:32:18 scheduler.1 | Queuing message propagation
...
active_workflow_1  | 17:32:18 scheduler.1 | Queuing message propagation
active_workflow_1  | 17:32:18 scheduler.1 | Queuing message propagation
active_workflow_1  | 17:34:41 scheduler.1 | Queuing message propagation

After the 2+ min freeze, AW webinterface resumes normal operation and logs blurt out all of the following at once:

logs2.txt

vidas commented 4 years ago

Hi Markus, thanks for the notice!

This is a tough one to solve quickly, still working on it. As a temporary solution, I have released a point version v0.9.9.1 that runs docker-compose without SINGLE_DYNO=1 (three separate containers).

After the latest refactoring scheduler was extracted out into a separate component and all parts of ActiveWorkflow (web, scheduler, and worker/-s) now run independently. SINGLE_DYNO mode was added specifically as a workaround to let everything run in a single container. Apparently it doesn't work correctly and we will resolve it as soon as possible.

Thank you! Vidas

suhrm1 commented 4 years ago

Hi Vidas, the 0.9.9.1 fix does not seem to work. Checking out the repository from scratch an starting with docker-compose up first complains about formatting of the docker-compose.yml (environment values' lines must not begin with "-"). After resolving that, the database does not get initialized correctly, I guess from the logs:

active_workflow_scheduler_1  | { 47216107507720 rufus-scheduler intercepted an error:
active_workflow_scheduler_1  |   47216107507720   job:
active_workflow_scheduler_1  |   47216107507720     Rufus::Scheduler::EveryJob "0.5s" {:tag=>"scheduler"}
active_workflow_scheduler_1  |   47216107507720   error:
active_workflow_scheduler_1  |   47216107507720     47216107507720
active_workflow_scheduler_1  |   47216107507720     ActiveRecord::StatementInvalid
active_workflow_scheduler_1  |   47216107507720     PG::UndefinedTable: ERROR:  relation "delayed_jobs" does not exist
active_workflow_scheduler_1  | LINE 8:  WHERE a.attrelid = '"delayed_jobs"'::regclass
active_workflow_scheduler_1  |                             ^
active_workflow_scheduler_1  |   47216107507720       /app/vendor/bundle/ruby/2.6.0/gems/activerecord-6.0.3.2/lib/active_record/connection_adapters/postgresql/database_statements.rb:65:in `exec'
(...)
postgres_1                   | ERROR:  relation "delayed_jobs" does not exist at character 454
postgres_1                   | STATEMENT:  SELECT a.attname, format_type(a.atttypid, a.atttypmod),
postgres_1                   |         pg_get_expr(d.adbin, d.adrelid), a.attnotnull, a.atttypid, a.atttypmod,
postgres_1                   |         c.collname, col_description(a.attrelid, a.attnum) AS comment
postgres_1                   |    FROM pg_attribute a
postgres_1                   |    LEFT JOIN pg_attrdef d ON a.attrelid = d.adrelid AND a.attnum = d.adnum
postgres_1                   |    LEFT JOIN pg_type t ON a.atttypid = t.oid
postgres_1                   |    LEFT JOIN pg_collation c ON a.attcollation = c.oid AND a.attcollation <> t.typcollation

No need to rush though, my main system still works and there's no urgent reason to upgrade that instance at the moment.

Best, Markus

vidas commented 4 years ago

Fixed the fix: v0.9.9.2 fixes broken docker-compose syntax and database initialization issues (that is where errors came from). Rushing things is never an answer...

vidas commented 3 years ago

Release v0.9.10 fixes the single-dyno mode (simply by getting rid of foreman and replacing it with a simple shell script). This should not affect anyone running ActiveWorkflow in multi-container mode (which is recommended for production anyway, but was not properly communicated...)

Default docker-compose.yml started using prebuilt docker image (from the docker hub), so if you have any source code modifications you may need to edit it (or use the one in docker/docker-compose.local.yml that still builds images locally).

Thanks!

suhrm1 commented 3 years ago

Finally got around to testing v0.9.10 and it works. I consider this issue resolved, thank you!