RecurringJob `jobs` performance nasties

abrom commented 7 years ago

Not sure if I'd class this as an issue or maybe instead something just to be documented, but here seems like a good a place as any to kick off a discussion. I had some issues last night where there were ~ 200k jobs in the queue and the recurring jobs started to die rather badly from some PG timeouts due to the fuzzy nature of the query in the jobs class method added in the RecurringJob module.

def jobs
  ::Delayed::Job.where("(handler LIKE ?) OR (handler LIKE ?)", "--- !ruby/object:#{name} %", "--- !ruby/object:#{name}\n%")
end

Adding an index on the handler column is in reality not an option as the column is just too big. Not designed to be indexed, especially not in a way that would allow for a practical use of fuzzy queries (LIKE) etc.

My solution was to add a handler_type column which contained the job name for ease of filter.

  def up
    add_column :delayed_jobs, :handler_type, :string

    # Go through and populate any existing jobs with the new 'handler_type' column value.
    Delayed::Job.reset_column_information
    Delayed::Job.find_each do |job|
      job.update_attribute(:handler_type, job.name)
    end

    execute 'CREATE INDEX delayed_job_handler_type_idx ON delayed_jobs (handler_type text_pattern_ops);'
  end

  def down
    execute 'DROP INDEX delayed_job_handler_type_idx;'
    remove_column :delayed_jobs, :handler_type
  end

Then added the following to an initialiser to handle new enqueued jobs as well as patching RecurringJob:

# Add before enqueue callback to persist the handler type
Delayed::Worker.plugins << Class.new(Delayed::Plugin) do
  callbacks do |lifecycle|
    lifecycle.before(:enqueue) do |job|
      job[:handler_type] = job.name
    end
  end
end

# Modify the recurring job `jobs` class method to use the new handler type instead of
# fuzzy searching (with some nasty performance side effects)
module Delayed
  module RecurringJob
    module ClassMethods
      def jobs
        ::Delayed::Job.where(handler_type: name)
      end
    end
  end
end

Solved my performance issues :) However it becomes more than just a plugin with this sort of change..

Feel free to close, or I'd be happy to put together some sort of formal description of the problem (and possible solution) to be included in the README?

abrom commented 7 years ago

Would probably switch back to use the built in create_index helper and drop the text_pattern_ops operator class to make it DB agnostic (doesn't need it given it's doing an exact match anyway)

afn commented 5 years ago

@abrom Sorry for the extremely long-latency response!

I definitely agree that the LIKE query is suboptimal, and I like the idea of adding a column to the delayed_jobs table to uniquely identify the jobs. This behavior would have to be optional --- if the column exists, we should use it, and otherwise fall back on the LIKE query.

I'm going to close this issue for now, but if you're interested in contributing a PR, I'd be very grateful. I think this should comprise:

A Rails generator that creates a migration to add a column (say, recurring_job_descriptor) and corresponding index to the delayed_jobs table
Dynamically detect the presence of this column at runtime
If the column is present, then scheduling new jobs should set the value of the column, and the jobs method should use the value of the column rather than the current LIKE query
The LIKE query matches on job class name and, optionally, a job_matching_param. Adding this optimization shouldn't change the behavior, so we should store, and query on, this pair of values in recurring_job_descriptor.

Ideally, the migration would also populate recurring_job_descriptor for existing delayed jobs (to avoid ending up with duplicates on the first run after the migration is run), but this is tricky; it would require not just scanning the entire delayed_jobs table but deserializing each handler to determine if it's a RecurringJob and generate its recurring_job_descriptor. So I would probably just include a warning in the README instructing users to manually unschedule any existing scheduled jobs before deploying this change.

abrom commented 5 years ago

No worries @afn

I suspect the better solution might be to have the optional handler name column implemented in delayed job, then DJR would just check for it's presence within the jobs method and switch as appropriate.

The only problem I see with that is that the DJ project appears to have gone stale :( No responses to issues since March.

I'll create an issue there and see if I get any bites. I think it would be useful to have this information in general. I also use the name/type column for analytic/queue management purposes (ie I care if X or Y job have been overdue for a period, but not Z).

amitree / delayed_job_recurring

RecurringJob `jobs` performance nasties #15