collectiveidea / delayed_job

Database based asynchronous priority queue system -- Extracted from Shopify
http://groups.google.com/group/delayed_job
MIT License

Issue with jobs running at random times (early, ahead of schedule)? #1206

Open icemft76 opened 4 months ago

icemft76 commented 4 months ago

Hello, we've been using delayed_job for years (mostly to send emails) with millions of jobs without a problem.

We recently updated to stay on LTS for Rails, and it might be a coincidence, but we're now seeing a small number of intermittent errors where a scheduled job just decides to run early. I know, it makes no sense!

Examples:

- Email_message scheduled to process_at Apr 18, 2024 at 10:00 AM EDT; the job ran_at Feb 19, 2024 at 09:00 AM EST
- Email_message scheduled to process_at Mar 01, 2024 at 07:30 AM EST; the job ran_at Feb 19, 2024 at 07:00 AM EST

Mainly I'm trying to see if anyone has run into 'eager' job execution, and if they could point to anything that might cause it. I can't find a pattern in timezone, day of month, or timestamp (partial hours), but I do see batches of jobs queue up randomly on the same day (different users, whose jobs have different created_at and updated_at times).

When it occurs I've confirmed that nobody has manually queued up the jobs through the UI by accident, and that there was no restart of the server/process/workers on the same day (I had thought maybe something in our production CI deploy might be to blame). Maybe it's the leap year? :)

Update details

In the last update we went from Rails 6.0.4 to 6.1.7.6; we also updated delayed_job from 4.1.9 to 4.1.11.

Current major packages

- Rails - rails (~> 6.1.7.6)
- Active Record - activerecord (6.1.7.6)
- delayed_job (4.1.11)
- delayed_job_active_record (4.1.7)
- delayed_job_web (1.4.4)

I'm logging all future scheduled jobs for now, so we can see if the run_at times are mismatched with the email process_at value (they should be the same). Nothing so far; they all look in sync.
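
For anyone wanting to run the same kind of check, here is a minimal plain-Ruby sketch of the comparison described above. It is not taken from our app or from delayed_job: the `ScheduledJob` struct, the `schedule_findings` method, and the 60-second tolerance are all hypothetical names and values chosen for illustration. It flags a job whose run_at has drifted from the email's process_at, and separately a job that actually ran before its run_at.

```ruby
require "time"

# Hypothetical record shape for illustration only; these are not delayed_job classes.
ScheduledJob = Struct.new(:id, :process_at, :run_at, :ran_at, keyword_init: true)

# Returns a list of human-readable findings for one job (empty if all looks fine).
# tolerance is in seconds; 60 is an arbitrary assumption.
def schedule_findings(job, tolerance: 60)
  findings = []

  # Check 1: the job's run_at should match the email's process_at.
  if (job.run_at - job.process_at).abs > tolerance
    findings << "job #{job.id}: run_at #{job.run_at.utc.iso8601} " \
                "differs from process_at #{job.process_at.utc.iso8601}"
  end

  # Check 2: the job should not have executed before its run_at.
  if job.ran_at && job.ran_at < job.run_at - tolerance
    early_days = ((job.run_at - job.ran_at) / 86_400).round(1)
    findings << "job #{job.id}: ran #{early_days} days before run_at"
  end

  findings
end

# Usage with the first example from this issue (times carry explicit UTC offsets).
job = ScheduledJob.new(
  id: 1,
  process_at: Time.parse("2024-04-18 10:00:00 -0400"),
  run_at:     Time.parse("2024-04-18 10:00:00 -0400"),
  ran_at:     Time.parse("2024-02-19 09:00:00 -0500")
)
puts schedule_findings(job)
```

Comparing in UTC with explicit offsets keeps the EST/EDT switch from hiding or inventing a mismatch.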

We could look at logging updates to the run_at on the jobs themselves next, to see if we can catch anything. I'm at a loss to know where to dig.
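
As a rough idea of what that run_at logging could record, here is a small plain-Ruby sketch. The method name `run_at_audit_line` and the `source:` parameter are hypothetical, not part of delayed_job. In a Rails app it might be called from a before_update callback on the job model using Active Record's dirty-tracking values, but that wiring is an assumption and is left out here. Note that updates issued directly in SQL (for example via `update_all`) skip Active Record callbacks, so a callback-based log would not see those.

```ruby
require "time"

# Hypothetical helper: builds one audit line describing a change to a job's run_at.
# Returns nil when the value did not change.
# source is whatever the caller knows about where the change came from.
def run_at_audit_line(job_id, old_run_at, new_run_at, source: "unknown")
  return nil if old_run_at == new_run_at

  # Moves to an earlier time are the suspicious ones here, so make them stand out.
  direction = new_run_at < old_run_at ? "EARLIER" : "later"
  delta_hours = ((new_run_at - old_run_at).abs / 3600.0).round(2)

  "delayed_job #{job_id}: run_at moved #{direction} by #{delta_hours}h " \
    "(#{old_run_at.utc.iso8601} -> #{new_run_at.utc.iso8601}) source=#{source}"
end

# Usage: a job originally scheduled for Mar 1 whose run_at is pulled back to Feb 19.
puts run_at_audit_line(
  42,
  Time.utc(2024, 3, 1, 12, 30),
  Time.utc(2024, 2, 19, 12, 0),
  source: "example"
)
```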