bensheldon / good_job

Multithreaded, Postgres-based, Active Job backend for Ruby on Rails.
https://goodjob-demo.herokuapp.com/
MIT License

Really fast jobs run one at a time #871

Open v2kovac opened 1 year ago

v2kovac commented 1 year ago

This has been working fine for jobs that run fairly slowly: the threads all get populated with a job (10 processes, 25 threads each).

But I queued up 700k jobs that each run for less than a second, and I noticed there was only ever one job running at a time. Total throughput was around 20 jobs a second, which is slower than just running them serially.

Do you know what would cause this? The db load was pretty high; I don't know if that's because of all the jobs enqueued. Do I need a shorter interval for refetching jobs, like less than 5 seconds?

I thought the cache holds something like 10k jobs, so there should always be enough ready to go to populate all the threads.

v2kovac commented 1 year ago

Btw, I'm not sure it has to do with the sheer volume of jobs; I had a colleague queue up around 30k very fast jobs and the same thing happened.

v2kovac commented 1 year ago

We have our query limit set to 1000, btw.

bensheldon commented 1 year ago

Hmmm. Can you explain a bit more about the jobs? Are they doing I/O (Database, Network, File operations) or are they strictly Ruby (rendering templates, doing math)?

I don't think ~1 second is particularly short for a job, so I'd still expect to see some parallelism (assuming there is IO).

Also, how were you queuing the jobs?

FYI, GoodJob does not have a mechanism for additive increase of threads (e.g. when 1 thread finishes a job, it will only next lead to 1 thread working 1 job, not 2 threads working 2 jobs, etc.). With LISTEN/NOTIFY and the job cache, the threads should be woken up directly. If LISTEN/NOTIFY or the cache is disabled, then the poll interval is the mechanism for additive increase of threads (e.g. every N seconds an additional thread will be created to execute jobs).
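For reference, the knobs involved look roughly like this (a sketch with illustrative values, assuming the standard GoodJob configuration keys):

# config/initializers/good_job.rb -- illustrative values, not recommendations
Rails.application.configure do
  config.good_job.max_threads = 25             # execution threads per process
  config.good_job.poll_interval = 5            # seconds between poller wake-ups (the fallback path)
  config.good_job.enable_listen_notify = true  # wake a thread immediately when a job is enqueued
end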

v2kovac commented 1 year ago

Sure, let me add more context. The job in question is just a couple of db reads and db writes, very simple, < 1 second each. I don't know exactly how fast, but probably less than half a second, basically instant.

I used batch.enqueue, but the same thing happens with any method of enqueueing.

I'm a bit confused by your response. The way I would have intuitively thought this works is that every available thread is fed a job as soon as it's free. So regardless of what any one job is doing, I should have roughly 10 * 25 = 250 jobs running at the same time. Are you saying that's not how it works? That a new thread is only populated after the first thread gets overwhelmed, then the second, then the third, etc.?

I don't think there's a problem with LISTEN/NOTIFY or the job cache, since I've run this with longer-running jobs and had 250 threads running at the same time.

bensheldon commented 1 year ago

Let me focus just on the explanation, though it sounds like it is not relevant if you're using LISTEN/NOTIFY and the cache.

If LISTEN/NOTIFY or the cache is disabled... then there is only the Poller.

The Poller will, every N seconds, wake 1 thread from the pool of available threads.

That 1 woken thread queries for 1 job. If the query returns a job, the job is executed, and then the thread stays awake and queries for another job. If no job is returned from the query, the thread is returned to the pool and put to sleep.

I think the misconception may be that there is a query for X jobs and then those X jobs are fed into X threads. It doesn't work that way.

When LISTEN/NOTIFY is enabled... then GoodJob will wake up a thread for every NOTIFY until all threads are awake and executing jobs. Those threads are then returned to the pool/put to sleep when they query for a job and there isn't one.
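In rough pseudocode, each woken thread behaves something like this (a conceptual sketch, not GoodJob's actual code; fetch_and_lock_one_job is a made-up name for the fetch-and-lock query):

loop do
  job = fetch_and_lock_one_job  # query for a single job and advisory-lock it
  break if job.nil?             # nothing returned: go back to the pool and sleep
  job.perform                   # execute it, then immediately query again
end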

v2kovac commented 1 year ago

Gotcha. So yeah, it should work the way I'm expecting if LISTEN/NOTIFY is working properly. I'm going to have to devise a test to see whether that's the case.

bensheldon commented 1 year ago

To think some more about your case, I think it could be some combination of:

v2kovac commented 1 year ago

Sure, the thread count could be changed and that's something we can tweak to get right, but I'd expect the threads to fill up fast regardless.

Let's focus on the fetch-and-lock query, because we are seeing db performance spikes. Is there any solution to that problem, or is there just a built-in fixed cost that we can't overcome? Like I said, we observed similar behavior with just 30k really fast jobs enqueued.

bensheldon commented 1 year ago

You might be hitting the limits of using Postgres as a jobs backend, as well as the limits of GoodJob's current Advisory Lock strategy. A couple of related links:

v2kovac commented 1 year ago

I'm still holding out hope that we've just configured something horribly wrong. To get concrete with these fast-running jobs: with, say, 100k queued up (btw, queuing up this many in bulk also takes around 3 minutes), I'm getting roughly 25 jobs/s finishing when I poll GoodJob::Execution.finished.count. Does that sound off to you?

v2kovac commented 1 year ago

Even more details: 10 processes, 25 max threads each, and I think 10 Puma threads per process configured in Rails. The db pool is set to 35. And finally, maybe it doesn't matter, but in this case I include GoodJob::ActiveJobExtensions::Batches in every job and call the batch just to report who executed the job. And the query limit is 1000.

bensheldon commented 1 year ago

I think it's too many threads, but you should also increase your db pool size. You'll need ~38 connections for all of GoodJob's threads (there are ~2-4 additional threads that can be created that touch the database), but I'd set it to 50 just to be safe that you're not doing anything else in the background (e.g. load_async).
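For example, in config/database.yml (a sketch; the value is illustrative):

production:
  pool: 50  # >= 25 GoodJob execution threads + GoodJob's extra db-touching threads + headroom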

Otherwise that all seems fairly normal to me.

For batch enqueue, I think you should probably batch the inserts 1k or 10k at a time (I'd love to know if you find a more efficient batch size)
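For example, something like this (a rough sketch; MyJob and all_arguments are hypothetical names):

batch = GoodJob::Batch.new
all_arguments.each_slice(1000) do |slice|
  # each enqueue call bulk-inserts one slice instead of one giant insert
  batch.enqueue do
    slice.each { |args| MyJob.perform_later(*args) }
  end
end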

v2kovac commented 1 year ago

I just got a huge throughput boost by enqueuing 1000 at a time for the 100k-size batch... something like 4 to 6x, which doesn't make a whole lot of sense to me even knowing the fetch is slower with more jobs queued up front. This is a really big deal, considering massive dumps of queued jobs happen infrequently.

This also implies that just adding some middleware to enqueue jobs only while jobs.count < threshold would basically give a massive perf boost for free, without worrying about how someone is queuing up their jobs.

But it also feels like this means something is fundamentally wrong. If it's true that you cache 10k jobs at any given time, wouldn't the cost of fetch & lock not matter as much until the 10k is depleted? I need to understand that part of the codebase better; I'm slowly getting through it all.

bensheldon commented 1 year ago

I wanted to respond quickly on 1 item:

But it also feels like this means something is fundamentally wrong. If it's true that you cache 10k jobs at any given time, wouldn't the cost of fetch & lock not matter as much until the 10k is depleted? I need to understand that part of the codebase better; I'm slowly getting through it all.

The "cache" is not caching jobs or dequeueing multiple jobs or "read ahead". It's caching only the scheduled_at of jobs in memory and scheduling a thread wake-up at that time to try to query for a job. The cache is either populated at startup (the query just plucks the scheduled_at) or when listen/notify informs that a future-scheduled job was enqueued. I realize now it's not a good name because other job backends use "cache" in the way you describe.

I wonder if what you're seeing is either:

reczy commented 1 year ago

Hey @v2kovac - No idea if it's in any way related to your issues, but uuidv4 primary keys are bad for performance after a certain point.

I wonder if using some sort of sequential uuid as the primary key would help...

v2kovac commented 1 year ago

@bensheldon I can confirm I do get a bump in both throughput and parallel running jobs right as a bulk enqueue finishes. It does make me wonder about listen/notify as well.

At this point I'm doing a workaround for the long enqueue times, which could be interrupted by deploys: I queue up a callback for the batch, pass all the arguments into the batch object itself, enqueue the jobs 1000 at a time in that callback, and then update on_finish to whatever I actually want the on_finish callback to be. (My batch service assumes all jobs are the same type.) It's not ideal, but it works.

class BatchCallbacks::EnqueueCallbackJob < ApplicationJob
  extend T::Sig
  queue_with_priority 100

  sig { params(batch: GoodJob::Batch, params: T.untyped).void }
  def perform(batch, params)
    batch_record = batch._record
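    # Swap the batch's on_finish from this enqueue-callback job to the real callback stored in its properties.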
    batch_record.update!(on_finish: batch_record.properties[:on_finish_after_enqueue])

    job_arguments_already_enqueued = batch_record.jobs.pluck(:serialized_params).map do |serialized_params|
      ActiveJob::Arguments.deserialize(serialized_params['arguments'])
    end
    all_arguments = batch_record.properties[:arguments]

    (all_arguments - job_arguments_already_enqueued).each_slice(1000) do |args_slice|
      batch_record.to_batch.enqueue do
        args_slice.each do |args|
          batch_record.properties[:job_klass].constantize.perform_later(*args)
        end
      end
    end
  end
end

@reczy I don't think so but good to know.

v2kovac commented 1 year ago

^ I haven't tested this thoroughly with really, really big batches, so it might not be performant because of the large memory usage, especially if you're using the batch within each job. It might be better to just enqueue a job with all those arguments stuffed in, i.e. have one job with a large number of arguments instead of the batch record carrying that burden.