Open ollym opened 1 year ago
A quick monkey patch for anyone interested and having the same issues:
```ruby
# db/migrate/....rb
add_column :good_jobs, :job_class, :string
add_column :good_jobs, :executions, :integer, default: 0
```

```ruby
# config/initializers/good_job.rb
Rails.application.reloader.to_prepare do
  Rails.application.configure do
    # ... good job configuration here
  end

  module GoodJob::JobExtensions
    extend ActiveSupport::Concern

    prepended do
      scope :job_class, ->(job_class) { where(job_class: job_class) }

      # First execution will run in the future
      scope :scheduled, -> { where(finished_at: nil).where('COALESCE(scheduled_at, created_at) > ?', DateTime.current).where(arel_table[:executions].lt(2)) }

      # Execution errored, will run in the future
      scope :retried, -> { where(finished_at: nil).where('COALESCE(scheduled_at, created_at) > ?', DateTime.current).where(arel_table[:executions].gt(1)) }
    end

    def job_class
      read_attribute(:job_class)
    end

    def serialized_params=(serialized_params)
      self.job_class = serialized_params['job_class']
      self.executions = serialized_params.fetch('executions', 0)
      super
    end
  end
  GoodJob::Job.prepend(GoodJob::JobExtensions)

  module GoodJob::ExecutionExtensions
    extend ActiveSupport::Concern

    prepended do
      scope :job_class, ->(job_class) { where(job_class: job_class) }
    end

    def serialized_params=(serialized_params)
      self.job_class = serialized_params['job_class']
      self.executions = serialized_params.fetch('executions', 0)
      super
    end
  end
  GoodJob::Execution.prepend(GoodJob::ExecutionExtensions)

  module GoodJob::BaseFilterExtensions
    def job_classes
      filtered_query(params.slice(:queue_name)).unscope(:select)
        .group(:job_class).count
        .sort_by { |name, _count| name.to_s }
        .to_h
    end
  end
  GoodJob::BaseFilter.prepend(GoodJob::BaseFilterExtensions)
end
```
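The core trick in the patch above is `Module#prepend`: the overridden writer mirrors values into the real columns before handing off to the original `serialized_params=` via `super`. A minimal plain-Ruby sketch of that pattern (the class and module names below are illustrative, not GoodJob's):

```ruby
# FakeJob stands in for GoodJob::Job; none of these names are GoodJob's.
class FakeJob
  attr_accessor :job_class, :executions, :serialized_params
end

module ParamsMirror
  # Runs before FakeJob's own writer because the module is prepended.
  def serialized_params=(params)
    self.job_class  = params['job_class']
    self.executions = params.fetch('executions', 0)
    super # falls through to the original attr_accessor writer
  end
end

FakeJob.prepend(ParamsMirror)

job = FakeJob.new
job.serialized_params = { 'job_class' => 'WebhookJob' }
job.job_class  # => "WebhookJob"
job.executions # => 0
```

Because the interception happens in the writer, the mirrored columns stay in sync on every create and update without touching GoodJob's persistence code.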
Ooh, that's interesting!
> Is there any reason why you don't?
The reason upfront: the Dashboard is a work-in-progress and very YAGNI. So I appreciate you sharing: yes, you need it.
That's interesting that grabbing the values out of the JSONB column is that much slower than individual columns. I wouldn't have expected that. My intuition has been to only pull values out of `serialized_params` when they need to be indexed (although I could index them within jsonb, but that seems too extra), so that's good to know that I may need to adjust my intuitions.
This work is definitely doable and should be done. It'll be a little tricky to make sure it can be done without requiring the database migration to be run until the next major release, but that's a common problem I've been handling when these things come up. Here's an example:
It might take me a minute to get to it. Is this something you'd be able to submit a PR for?
Yeah, we migrated our webhook backend to the good_job queue, so the arguments can be large (full request body), and that combined with 50k scheduled jobs meant things were grinding to a halt. We tried adding a btree index on `serialized_params->>'job_class'`, but weirdly it had no effect.
It makes sense: I assume that even to get a single value out of the JSON object, it still has to parse the full body. We must have been hitting a memory limit or something.
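The intuition above can be made concrete in plain Ruby: extracting a single key still pays for deserializing the entire document, so the cost tracks the payload size rather than the size of the value you want. A rough, self-contained illustration (not GoodJob code; the payload sizes are made up):

```ruby
require "json"
require "benchmark"

# A job row with a large webhook body vs. one with a tiny payload.
big   = { "job_class" => "WebhookJob", "arguments" => ["x" * 500_000] }.to_json
small = { "job_class" => "WebhookJob" }.to_json

# Extracting the same single key; cost scales with the full document size.
t_big   = Benchmark.realtime { 50.times { JSON.parse(big)["job_class"] } }
t_small = Benchmark.realtime { 50.times { JSON.parse(small)["job_class"] } }

puts format("big payload: %.4fs, small payload: %.4fs", t_big, t_small)
```

Postgres JSONB is stored pre-parsed, so the analogy is loose, but large values are still TOASTed and must be fetched and decompressed per row, which is why a dedicated column wins here.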
I'll see what we can do about a PR
For others with the same issue: thanks to https://github.com/bensheldon/good_job/pull/894 (shipped in good_job v3.15), the patch is now significantly simpler. The following works:
```ruby
# db/migrate/....rb
add_column :good_jobs, :job_class, :string
add_column :good_jobs, :executions, :integer, default: 0
```

```ruby
# config/initializers/good_job.rb
Rails.application.reloader.to_prepare do
  module GoodJob::BaseExecutionExtensions
    extend ActiveSupport::Concern

    class_methods do
      def params_job_class
        arel_table[:job_class]
      end

      def params_execution_count
        arel_table[:executions]
      end
    end

    def job_class
      read_attribute(:job_class)
    end

    def serialized_params=(serialized_params)
      self.job_class = serialized_params['job_class']
      self.executions = serialized_params.fetch('executions', 0)
      super
    end
  end
  GoodJob::BaseExecution.prepend(GoodJob::BaseExecutionExtensions)
end
```
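What makes the newer patch so much smaller is that GoodJob's queries now go through class-level hooks (`params_job_class` / `params_execution_count`), so pointing those at the dedicated columns is enough. The indirection can be sketched in plain Ruby (everything below except the hook method name is an illustrative stand-in, not GoodJob's actual implementation):

```ruby
# BareExecution stands in for GoodJob::BaseExecution.
class BareExecution
  # Default hook: derive the value from the serialized params (slow path).
  def self.params_job_class
    "serialized_params->>'job_class'"
  end

  # Query-building code only ever calls the hook, never the raw expression.
  def self.query_fragment
    "WHERE #{params_job_class} = ?"
  end
end

module ColumnHooks
  # Override the hook to point at the dedicated column instead.
  def params_job_class
    "job_class"
  end
end

# Prepending onto the singleton class overrides the class method.
BareExecution.singleton_class.prepend(ColumnHooks)
BareExecution.query_fragment # => "WHERE job_class = ?"
```

Since every consumer calls the hook rather than hard-coding the JSONB expression, one small override swaps the slow path for the indexed column everywhere at once.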
As we've scaled up with good_job, we've started to feel significant pain: particularly when a job's arguments are large, the performance of queries that go through `serialized_params` can slow down dramatically.
I think you should add `job_class` and `executions` as columns and refactor any code that queries their values to not use `serialized_params`. Is there any reason why you don't?