bkeepers / qu

a Ruby library for queuing and processing background jobs.
MIT License

Process payloads in batches #93

Open grantr opened 9 years ago

grantr commented 9 years ago

Some jobs would benefit from being able to process a bunch of payloads at once. Specifically, a job that writes documents to a data store could see a dramatic performance increase by sending 50 or 100 documents per request instead of 1. Since it may not be possible for the producer to enqueue entire batches in a single payload, the queue framework needs to handle popping multiple payloads and collecting them into a batch job.

This could work well with batch pop support in the backend, but doesn't require it. Even if the backend only supports popping one payload at a time, a job might still want a batch of payloads.
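As a rough sketch of that fallback, assuming a hypothetical backend whose `pop` returns one payload or `nil` when the queue is empty, and an optional `pop_batch(size)` for backends that can do better (neither name is taken from Qu's actual backend API):

```ruby
# Hypothetical backend that can only pop one payload at a time.
class SinglePopBackend
  def initialize(payloads)
    @payloads = payloads.dup
  end

  # Returns the next payload, or nil when the queue is empty.
  def pop
    @payloads.shift
  end
end

# Collects up to `size` payloads. Uses the backend's own batch pop when
# it has one (assumed here to be named `pop_batch`); otherwise falls
# back to repeated single pops until the batch is full or the queue is
# empty.
def collect_batch(backend, size)
  return backend.pop_batch(size) if backend.respond_to?(:pop_batch)

  batch = []
  while batch.size < size && (payload = backend.pop)
    batch << payload
  end
  batch
end

backend = SinglePopBackend.new(%w[doc1 doc2 doc3])
first_batch  = collect_batch(backend, 2)
second_batch = collect_batch(backend, 2)
```

The job sees an array either way, so batch processing does not depend on what the queue service supports.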

mauricio commented 9 years ago

well -> https://github.com/bkeepers/qu/pull/82

grantr commented 9 years ago

Ah ha! I will take a look @mauricio

mauricio commented 9 years ago

:+1:

grantr commented 9 years ago

@mauricio Here's what I would do differently:

BatchPayload is an array wrapper that knows how to dispatch multiple payloads. For standard Qu::Job classes it processes each payload one at a time. Payloads for BatchJob classes are bundled up and sent to the job as an array. The job implements each to iterate through the payload list.

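class BulkWrite < Qu::BatchJob
  # Proposed setting: how many payloads to collect per perform call.
  batch_size 50

  def perform
    set_up_batch_write
    # each yields the arguments of every payload in the batch.
    each do |arg1, arg2|
      do_something_with_payload(arg1, arg2)
    end
    complete_batch_write
  end
end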
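To make the dispatch rule concrete, here is a minimal standalone sketch. `SketchJob`, `SketchBatchJob`, `SketchBatchPayload` and `SketchLog` are hypothetical stand-ins invented for illustration; they are not Qu classes and only approximate the proposal above.

```ruby
# Hypothetical log so the example can show what was performed.
module SketchLog
  def self.entries
    @entries ||= []
  end
end

# Stand-in for a standard job: one payload per perform call.
class SketchJob
  def initialize(*args)
    @args = args
  end
end

# Stand-in for a batch job: receives every payload at once and exposes
# them through `each`.
class SketchBatchJob
  # Class-level setting, read back with no argument.
  def self.batch_size(size = nil)
    @batch_size = size if size
    @batch_size
  end

  def initialize(payloads)
    @payloads = payloads
  end

  # Yields the argument list of each payload in the batch.
  def each(&block)
    @payloads.each { |args| block.call(*args) }
  end
end

# Array wrapper that decides how payloads reach the job class.
class SketchBatchPayload
  def initialize(job_class, payloads)
    @job_class = job_class
    @payloads = payloads
  end

  def perform
    if @job_class <= SketchBatchJob
      # Batch jobs get the whole list in a single perform call.
      @job_class.new(@payloads).perform
    else
      # Standard jobs are performed one payload at a time.
      @payloads.each { |args| @job_class.new(*args).perform }
    end
  end
end

class WriteOne < SketchJob
  def perform
    SketchLog.entries << [:single, @args]
  end
end

class BulkWrite < SketchBatchJob
  batch_size 50

  def perform
    docs = []
    each { |id, body| docs << [id, body] }
    SketchLog.entries << [:bulk, docs]
  end
end

payloads = [[1, "a"], [2, "b"]]
SketchBatchPayload.new(WriteOne, payloads).perform   # two perform calls
SketchBatchPayload.new(BulkWrite, payloads).perform  # one perform call
```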

mauricio commented 9 years ago

@grantr that would simplify the implementation a lot.

Another problem is, how do you know if a queue produces batch jobs or not?

This was one of the main complications in my implementation: having to pull payloads and then push them back to the queue when they turned out to be "single" jobs instead of batch jobs. My use case at the time was a single-use queue, so I didn't have to care about this much, but if you're running on top of a general queue, this could give you trouble as clients mix batch and non-batch jobs in the same place.

Why would you implement batch pushes separately?

It seems like a very simple feature to have, given a backend that supports it: you just push many messages at once instead of one at a time.

grantr commented 9 years ago

> how do you know if a queue produces batch jobs or not?

IMO this is the backend's responsibility. The backend can decide whether it will pull payloads in bulk from the queue service. If it so decides, then the BatchPayload it creates may contain payloads for multiple jobs. When it is processing the payloads, it can look at each job to see whether it accepts batches or not. If so, it groups all the like payloads into a single job and performs once. If not, it performs each job individually.

This decouples batch pop from batch process, and keeps the perform logic in the *Payload classes. There's never a need to return jobs to the queue, because the fallback is to perform all payloads in sequence as if they were not batched.
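A rough sketch of that grouping step, assuming hypothetical names throughout (`MixedPayload`, `accepts_batches?`, `dispatch_mixed` and the two job classes are made up for illustration and are not part of Qu):

```ruby
# Hypothetical payload: the job class to run plus its arguments.
MixedPayload = Struct.new(:klass, :args)

# Hypothetical batch-capable job: performs once for a list of argument lists.
class IndexDocs
  def self.accepts_batches?
    true
  end

  def self.calls
    @calls ||= []
  end

  def self.perform(arg_lists)
    calls << arg_lists
  end
end

# Hypothetical standard job: performs once per payload.
class SendEmail
  def self.accepts_batches?
    false
  end

  def self.calls
    @calls ||= []
  end

  def self.perform(*args)
    calls << args
  end
end

# Groups the popped payloads by job class. Batch-capable classes get one
# perform call with all of their payloads; every other class is
# performed payload by payload, so nothing goes back to the queue.
def dispatch_mixed(payloads)
  payloads.group_by(&:klass).each do |klass, group|
    if klass.accepts_batches?
      klass.perform(group.map(&:args))
    else
      group.each { |payload| klass.perform(*payload.args) }
    end
  end
end

popped = [
  MixedPayload.new(IndexDocs, [1, "a"]),
  MixedPayload.new(SendEmail, ["bob@example.com"]),
  MixedPayload.new(IndexDocs, [2, "b"])
]
dispatch_mixed(popped)
```

One consequence of grouping is that payloads for different job classes may run in a different relative order than they were popped, which matters only if jobs depend on cross-class ordering.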

Batch push is separate because it doesn't have anything to do with batch processing (IMHO). Consumers don't need to know if producers have batch push, and producers don't need to know if consumers have batch pop.
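For comparison, a producer-side sketch that never touches the consumer. `push`, `push_batch` and both backend classes are hypothetical names used only for illustration, not Qu's backend API:

```ruby
# Hypothetical backend that accepts one payload per request.
class SinglePushBackend
  attr_reader :stored

  def initialize
    @stored = []
  end

  def push(payload)
    @stored << payload
  end
end

# Hypothetical backend that can also accept many payloads in one request.
class BatchPushBackend < SinglePushBackend
  attr_reader :requests

  def initialize
    super
    @requests = 0
  end

  def push_batch(payloads)
    @requests += 1
    @stored.concat(payloads)
  end
end

# Pushes everything in one request when the backend supports it,
# otherwise one payload at a time. The stored payloads are the same
# either way, so consumers cannot tell which path was used.
def push_all(backend, payloads)
  if backend.respond_to?(:push_batch)
    backend.push_batch(payloads)
  else
    payloads.each { |payload| backend.push(payload) }
  end
end

single = SinglePushBackend.new
batch  = BatchPushBackend.new
push_all(single, %w[doc1 doc2 doc3])
push_all(batch,  %w[doc1 doc2 doc3])
```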