bensheldon / good_job

Multithreaded, Postgres-based, Active Job backend for Ruby on Rails.
https://goodjob-demo.herokuapp.com/
MIT License
2.66k stars 199 forks source link

Error logs from failed jobs used all storage space #495

Closed tedt10e closed 2 years ago

tedt10e commented 2 years ago

Hello Ben,

I recently had an issue with the production server as there is no more storage space due to the log generated by the failed jobs.

It happened when there is an issue connecting to the mail server but it keeps trying to send mail continuously (in milliseconds) and generates error logs.

I've included the sample from the logs below., but only copy the first 3 and remove the stack trace from the first 2 for readability.

As you can see from the below, it first attempts at 25-Jan-2022 04:49:51.640 then 25-Jan-2022 04:49:51.781 then 25-Jan-2022 04:49:51.943 and so on ... which immediately filled up the whole storage of the server.

I'm still using GJ 1.9.6 (I'm a bit late behind to upgrade πŸ˜€) with JRuby 9.2.19, Rails 6.1.4, Tomcat 9, Postgres 13.

I'm not sure it is something that we need to configure for GJ not to repeatedly attempt to deliver the job. At this moment, I do not set any setting for GJ except config.good_job = { execution_mode: :async, max_threads: 5, poll_interval: 30 } in production.rb

Can you please help me advise what is the best way to handle this behavior? πŸ™

25-Jan-2022 04:49:51.640 INFO [pool-4-thread-1] org.apache.catalina.core.ApplicationContext.log E, [2022-01-25T04:49:51.638627 #1] ERROR -- : [ActiveJob] [ActionMailer::MailDeliveryJob] [a3cfc255-6c17-4b36-a7d4-e4105551656f] Error performing ActionMailer::MailDeliveryJob (Job ID: a3cfc255-6c17-4b36-a7d4-e4105551656f) from GoodJob(bvp_production_mailers) in 130.97ms: SocketError (initialize: name or service not known):
org/jruby/ext/socket/RubyTCPSocket.java:141:in `initialize'
org/jruby/RubyIO.java:1156:in `open'
uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/net/smtp.rb:539:in `tcp_socket'
uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/net/smtp.rb:549:in `block in do_start'
org/jruby/ext/timeout/Timeout.java:114:in `timeout'
org/jruby/ext/timeout/Timeout.java:90:in `timeout'
uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/net/smtp.rb:548:in `do_start'
uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/net/smtp.rb:518:in `start'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/mail-2.7.1/lib/mail/network/delivery_methods/smtp.rb:109:in `start_smtp_session'
[... many stack trace]

25-Jan-2022 04:49:51.781 INFO [pool-4-thread-1] org.apache.catalina.core.ApplicationContext.log E, [2022-01-25T04:49:51.780596 #1] ERROR -- : [ActiveJob] [ActionMailer::MailDeliveryJob] [a3cfc255-6c17-4b36-a7d4-e4105551656f] Error performing ActionMailer::MailDeliveryJob (Job ID: a3cfc255-6c17-4b36-a7d4-e4105551656f) from GoodJob(bvp_production_mailers) in 53.93ms: SocketError (initialize: name or service not known):
org/jruby/ext/socket/RubyTCPSocket.java:141:in `initialize'
org/jruby/RubyIO.java:1156:in `open'
uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/net/smtp.rb:539:in `tcp_socket'
uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/net/smtp.rb:549:in `block in do_start'
org/jruby/ext/timeout/Timeout.java:114:in `timeout'
org/jruby/ext/timeout/Timeout.java:90:in `timeout'
uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/net/smtp.rb:548:in `do_start'
uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/net/smtp.rb:518:in `start'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/mail-2.7.1/lib/mail/network/delivery_methods/smtp.rb:109:in `start_smtp_session'
[... many stack trace]

25-Jan-2022 04:49:51.943 INFO [pool-4-thread-1] org.apache.catalina.core.ApplicationContext.log E, [2022-01-25T04:49:51.942259 #1] ERROR -- : [ActiveJob] [ActionMailer::MailDeliveryJob] [a3cfc255-6c17-4b36-a7d4-e4105551656f] Error performing ActionMailer::MailDeliveryJob (Job ID: a3cfc255-6c17-4b36-a7d4-e4105551656f) from GoodJob(bvp_production_mailers) in 61.2ms: SocketError (initialize: name or service not known):
org/jruby/ext/socket/RubyTCPSocket.java:141:in `initialize'
org/jruby/RubyIO.java:1156:in `open'
uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/net/smtp.rb:539:in `tcp_socket'
uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/net/smtp.rb:549:in `block in do_start'
org/jruby/ext/timeout/Timeout.java:114:in `timeout'
org/jruby/ext/timeout/Timeout.java:90:in `timeout'
uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/net/smtp.rb:548:in `do_start'
uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/net/smtp.rb:518:in `start'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/mail-2.7.1/lib/mail/network/delivery_methods/smtp.rb:109:in `start_smtp_session'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/mail-2.7.1/lib/mail/network/delivery_methods/smtp.rb:100:in `deliver!'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/mail-2.7.1/lib/mail/message.rb:2159:in `do_delivery'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/mail-2.7.1/lib/mail/message.rb:260:in `block in deliver'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/actionmailer-6.1.4/lib/action_mailer/base.rb:559:in `block in deliver_mail'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/activesupport-6.1.4/lib/active_support/notifications.rb:203:in `block in instrument'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/activesupport-6.1.4/lib/active_support/notifications/instrumenter.rb:24:in `instrument'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/activesupport-6.1.4/lib/active_support/notifications.rb:203:in `instrument'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/actionmailer-6.1.4/lib/action_mailer/base.rb:557:in `deliver_mail'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/mail-2.7.1/lib/mail/message.rb:260:in `deliver'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/actionmailer-6.1.4/lib/action_mailer/message_delivery.rb:119:in `block in deliver_now'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/actionmailer-6.1.4/lib/action_mailer/rescuable.rb:17:in `handle_exceptions'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/actionmailer-6.1.4/lib/action_mailer/message_delivery.rb:118:in `deliver_now'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/actionmailer-6.1.4/lib/action_mailer/mail_delivery_job.rb:23:in `perform'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/activejob-6.1.4/lib/active_job/execution.rb:48:in `block in perform_now'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/activesupport-6.1.4/lib/active_support/callbacks.rb:117:in `block in run_callbacks'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/i18n-1.8.10/lib/i18n.rb:314:in `with_locale'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/activejob-6.1.4/lib/active_job/translation.rb:9:in `block in Translation'
org/jruby/RubyBasicObject.java:2694:in `instance_exec'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/activesupport-6.1.4/lib/active_support/callbacks.rb:126:in `block in run_callbacks'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/activesupport-6.1.4/lib/active_support/core_ext/time/zones.rb:66:in `use_zone'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/activejob-6.1.4/lib/active_job/timezones.rb:9:in `block in Timezones'
org/jruby/RubyBasicObject.java:2694:in `instance_exec'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/activesupport-6.1.4/lib/active_support/callbacks.rb:126:in `block in run_callbacks'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/activejob-6.1.4/lib/active_job/instrumentation.rb:21:in `block in instrument'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/activesupport-6.1.4/lib/active_support/notifications.rb:203:in `block in instrument'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/activesupport-6.1.4/lib/active_support/notifications/instrumenter.rb:24:in `instrument'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/activesupport-6.1.4/lib/active_support/notifications.rb:203:in `instrument'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/activejob-6.1.4/lib/active_job/instrumentation.rb:31:in `instrument'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/activejob-6.1.4/lib/active_job/instrumentation.rb:14:in `block in Instrumentation'
org/jruby/RubyBasicObject.java:2694:in `instance_exec'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/activesupport-6.1.4/lib/active_support/callbacks.rb:126:in `block in run_callbacks'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/activejob-6.1.4/lib/active_job/logging.rb:22:in `block in tag_logger'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/activesupport-6.1.4/lib/active_support/tagged_logging.rb:99:in `block in tagged'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/activesupport-6.1.4/lib/active_support/tagged_logging.rb:37:in `tagged'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/activesupport-6.1.4/lib/active_support/tagged_logging.rb:99:in `tagged'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/activejob-6.1.4/lib/active_job/logging.rb:22:in `tag_logger'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/activejob-6.1.4/lib/active_job/logging.rb:15:in `block in Logging'
org/jruby/RubyBasicObject.java:2694:in `instance_exec'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/activesupport-6.1.4/lib/active_support/callbacks.rb:126:in `block in run_callbacks'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/activesupport-6.1.4/lib/active_support/callbacks.rb:137:in `run_callbacks'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/activejob-6.1.4/lib/active_job/execution.rb:47:in `perform_now'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/activejob-6.1.4/lib/active_job/execution.rb:25:in `block in execute'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/activesupport-6.1.4/lib/active_support/callbacks.rb:117:in `block in run_callbacks'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/activejob-6.1.4/lib/active_job/railtie.rb:47:in `block in Railtie'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/activesupport-6.1.4/lib/active_support/execution_wrapper.rb:88:in `wrap'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/activesupport-6.1.4/lib/active_support/reloader.rb:72:in `block in wrap'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/activesupport-6.1.4/lib/active_support/execution_wrapper.rb:84:in `wrap'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/activesupport-6.1.4/lib/active_support/reloader.rb:71:in `wrap'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/activejob-6.1.4/lib/active_job/railtie.rb:46:in `block in Railtie'
org/jruby/RubyBasicObject.java:2694:in `instance_exec'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/activesupport-6.1.4/lib/active_support/callbacks.rb:126:in `block in run_callbacks'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/activesupport-6.1.4/lib/active_support/callbacks.rb:137:in `run_callbacks'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/activejob-6.1.4/lib/active_job/execution.rb:23:in `execute'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/good_job-1.9.6/lib/good_job/job.rb:260:in `block in execute'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/activesupport-6.1.4/lib/active_support/notifications.rb:203:in `block in instrument'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/activesupport-6.1.4/lib/active_support/notifications/instrumenter.rb:24:in `instrument'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/activesupport-6.1.4/lib/active_support/notifications.rb:203:in `instrument'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/good_job-1.9.6/lib/good_job/job.rb:259:in `execute'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/good_job-1.9.6/lib/good_job/job.rb:227:in `perform'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/good_job-1.9.6/lib/good_job/job.rb:164:in `block in perform_with_advisory_lock'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/good_job-1.9.6/lib/good_job/lockable.rb:137:in `with_advisory_lock'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/activerecord-6.1.4/lib/active_record/relation/delegation.rb:67:in `block in with_advisory_lock'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/activerecord-6.1.4/lib/active_record/relation.rb:406:in `block in scoping'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/activerecord-6.1.4/lib/active_record/relation.rb:804:in `_scoping'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/activerecord-6.1.4/lib/active_record/relation.rb:406:in `scoping'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/activerecord-6.1.4/lib/active_record/relation/delegation.rb:67:in `with_advisory_lock'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/good_job-1.9.6/lib/good_job/job.rb:159:in `perform_with_advisory_lock'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/activerecord-6.1.4/lib/active_record/relation/delegation.rb:67:in `block in perform_with_advisory_lock'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/activerecord-6.1.4/lib/active_record/relation.rb:406:in `block in scoping'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/activerecord-6.1.4/lib/active_record/relation.rb:804:in `_scoping'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/activerecord-6.1.4/lib/active_record/relation.rb:406:in `scoping'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/activerecord-6.1.4/lib/active_record/relation/delegation.rb:67:in `perform_with_advisory_lock'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/good_job-1.9.6/lib/good_job/job_performer.rb:29:in `next'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/good_job-1.9.6/lib/good_job/scheduler.rb:231:in `block in create_task'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/activesupport-6.1.4/lib/active_support/execution_wrapper.rb:88:in `wrap'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/good_job-1.9.6/lib/good_job/scheduler.rb:230:in `block in create_task'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/concurrent-ruby-1.1.9/lib/concurrent-ruby/concurrent/executor/safe_task_executor.rb:24:in `block in execute'
com/concurrent_ruby/ext/SynchronizationLibrary.java:232:in `synchronize'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/concurrent-ruby-1.1.9/lib/concurrent-ruby/concurrent/executor/safe_task_executor.rb:19:in `execute'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/concurrent-ruby-1.1.9/lib/concurrent-ruby/concurrent/ivar.rb:169:in `safe_execute'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/concurrent-ruby-1.1.9/lib/concurrent-ruby/concurrent/scheduled_task.rb:285:in `process_task'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/concurrent-ruby-1.1.9/lib/concurrent-ruby/concurrent/executor/timer_set.rb:98:in `block in ns_post_task'
/usr/local/tomcat/webapps/ROOT/WEB-INF/gems/gems/concurrent-ruby-1.1.9/lib/concurrent-ruby/concurrent/executor/java_executor_service.rb:79:in `run'
tedt10e commented 2 years ago

I just spent time reading through the README and realized those are all mentioned in README. 😁

I've just set ActionMailer::MailDeliveryJob.retry_on StandardError, wait: :exponentially_longer, attempts: Float::INFINITY in initializer/good_job.rb and it attempting nicely as below.

25-Jan-2022 09:38:13.545 25-Jan-2022 09:38:16.686 25-Jan-2022 09:38:34.792 25-Jan-2022 09:39:57.880 25-Jan-2022 09:44:15.991 25-Jan-2022 09:54:43.094 25-Jan-2022 10:16:21.095

I guess this default behavior (to retry continuously) does not come from GJ. (Is it the default behavior of ActiveJob/MaileDeliveryJob?)

Is it something GoodJob can set as a default behavior to set wait: :exponentially_longer?

Or probably it is better to mention somewhere near the top of the README to be aware of this default behavior.

For many of us, we usually use the default settings.

I'm afraid someone else might face the same issue for their production server if they are unaware of this default behavior.

Thanks ❀️

bensheldon commented 2 years ago

That stinks that your disk filled up. I can try to explain the current state of affairs, which is a somewhat cascading problem.

My goal for GoodJob 3.0 is to change the defaults

So the way I'm imagining your scenario to play out in the GoodJob 3.0 world is:

tedt10e commented 2 years ago

Hey Ben,

Thank you so muchπŸ™ for your time and the explanation.

Setting the default in GJ3 sounds excellent.

In the meantime, it's time for me to upgrade GJ from 1.9.6 to the latest version πŸ˜€