Pull into Sidekiq core?

fatkodima / sidekiq-iteration

Make your long-running sidekiq jobs interruptible and resumable.

https://rubydoc.info/gems/sidekiq-iteration

MIT License


Pull into Sidekiq core? #6

Closed by mperham 4 months ago

mperham commented 5 months ago

Hey @fatkodima, would you be interested in integrating this functionality into Sidekiq core for 7.3 or have me do it? I've had several customers report this gem as very useful for solving their problems with long-running jobs, making deployments quicker and safer, etc. I think it's a good pattern/API to encourage people to use.

fatkodima commented 5 months ago

Hey! Wow, it would be awesome to get this merged into Sidekiq itself!

I will try to do that this weekend (or next weekend) and see how it goes. Let me know if you plan to release 7.3 sooner.

mperham commented 5 months ago

I have a 7.3 milestone targeting a summer release. 7.2.3 will be out very soon.

fatkodima commented 5 months ago

Wanted to ask, what API would you prefer?

Option 1 (my preference):

class MyJob
  include Sidekiq::Job
  include Sidekiq::Iteration
end

or something like option 2:

class MyJob
  include Sidekiq::Job
  sidekiq_options iteration: true, ...
end

And what API would you prefer for throttling (https://github.com/fatkodima/sidekiq-iteration/blob/master/guides/throttling.md)? Currently it is configured via a top-level call in the class body.

mperham commented 5 months ago

I'd probably go with:

class SomeJob
  include Sidekiq::Job
  include Sidekiq::Job::Iterable

  sidekiq_options iteration: { whatever: 123 }
end

Unlike Rails, I dislike top-level class methods like throttle_on, as they can be hard to test and mock. I would prefer that it be an instance method; server middleware provides the instance:

class ThrottleMiddleware
  include Sidekiq::ServerMiddleware

  def call(instance, job, queue)
    if instance.throttle_on?
      # do something
    end
  end
end
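
To illustrate why an instance-level predicate is easier to test, here is a minimal plain-Ruby sketch. It does not load Sidekiq: `ThrottleCheck` is a hypothetical stand-in with the same `call(instance, job, queue)` shape as the middleware above, and `ReportJob` and its `throttle_on?` method are assumed names rather than anything Sidekiq defines.

```ruby
# Hypothetical job; `throttle_on?` is an assumed instance-level predicate.
class ReportJob
  def throttle_on?
    false
  end
end

# Stand-in with the same call signature as a Sidekiq server middleware.
# It does not include Sidekiq::ServerMiddleware so the sketch runs on its own.
class ThrottleCheck
  attr_reader :throttled

  def initialize
    @throttled = []
  end

  def call(instance, job, queue)
    if instance.throttle_on?
      # Record the job id instead of running the job; a real middleware
      # might reschedule the job here.
      @throttled << job["jid"]
      return :throttled
    end
    yield
  end
end

# In a test, a single instance can be stubbed without touching the class.
stubbed_job = ReportJob.new
stubbed_job.define_singleton_method(:throttle_on?) { true }

check = ThrottleCheck.new
result = check.call(stubbed_job, { "jid" => "abc123" }, "default") { :performed }
```

Because the predicate lives on the instance, the stub affects only `stubbed_job`; other `ReportJob` instances still run normally.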

sobrinho commented 5 months ago

As a suggestion, @mperham: I feel the framework should be pulled into Sidekiq, but not the concrete implementations.

ActiveRecord usage can be documented as a suggested pattern, as I reported in #9:

def build_enumerator(cursor:)
  Enumerator.new do |yielder|
    MyModel.in_batches(start: cursor) do |relation|
      yielder.yield(relation, relation.maximum(:id))
    end
  end
end

def each_iteration(relation)
  relation.update_all(...)
end

Or for batches:

def build_enumerator(cursor:)
  Enumerator.new do |yielder|
    MyModel.find_in_batches(start: cursor) do |batch|
      yielder.yield(batch, batch.last.id)
    end
  end
end

def each_iteration(batch)
  batch.each { ... }
end

Or for individual records:

def build_enumerator(cursor:)
  Enumerator.new do |yielder|
    MyModel.find_each(start: cursor) do |record|
      yielder.yield(record, record.id)
    end
  end
end

def each_iteration(record)
  record.update(...)
end

Having CSV, Array, and ActiveRecord support built in may be too much. I'm not sure; I'm just throwing ideas out here.
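
The split between framework and concrete enumerators could look roughly like this. It is a plain-Ruby sketch, not sidekiq-iteration's actual implementation: `run_iterations`, `CountdownJob`, and the `stop_after` argument (a stand-in for a shutdown signal) are all hypothetical. The only contract taken from the examples above is that `build_enumerator(cursor:)` yields item and cursor pairs and `each_iteration` receives the item.

```ruby
# Hypothetical driver: runs a job's enumerator and returns the last cursor
# so that a later run can resume from it.
def run_iterations(job, cursor: nil, stop_after: nil)
  processed = 0
  job.build_enumerator(cursor: cursor).each do |item, next_cursor|
    job.each_iteration(item)
    cursor = next_cursor
    processed += 1
    # Simulate an interruption (e.g. a deploy) after N iterations.
    break if stop_after && processed >= stop_after
  end
  cursor
end

class CountdownJob
  ITEMS = %w[a b c d e].freeze

  attr_reader :seen

  def initialize
    @seen = []
  end

  # The cursor is the index of the last processed item; resume after it.
  def build_enumerator(cursor:)
    start = cursor.nil? ? 0 : cursor + 1
    Enumerator.new do |yielder|
      (start...ITEMS.size).each { |i| yielder.yield(ITEMS[i], i) }
    end
  end

  def each_iteration(item)
    @seen << item
  end
end

countdown = CountdownJob.new
saved = run_iterations(countdown, stop_after: 2) # interrupted after "a", "b"
run_iterations(countdown, cursor: saved)         # resumes with "c", "d", "e"
```

Note that this sketch resumes after the saved cursor. An enumerator that passes the cursor straight to an inclusive `start:` option would instead revisit the record at the cursor, so the two conventions should not be mixed.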

mperham commented 5 months ago

Having optimized support for a few well-known types/libraries is useful, but we should have generic Enumerable support too.
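
As a rough idea of what generic Enumerable support could mean, the sketch below wraps any `Enumerable` in an enumerator that yields each item with its positional index as the cursor. `cursor_enumerator` is a hypothetical helper, not a Sidekiq or sidekiq-iteration method, and index-based cursors assume the collection iterates in the same order on every run.

```ruby
# Hypothetical helper: turn any Enumerable into [item, cursor] pairs,
# where the cursor is the zero-based position of the item.
def cursor_enumerator(enumerable, cursor: nil)
  skip = cursor.nil? ? 0 : cursor + 1
  Enumerator.new do |yielder|
    enumerable.each_with_index do |item, index|
      next if index < skip # already processed in an earlier run
      yielder.yield(item, index)
    end
  end
end

fresh   = cursor_enumerator(%w[x y z]).to_a
resumed = cursor_enumerator(1..5, cursor: 2).to_a
```

Skipping by index still walks the already-processed prefix on resume, which is one reason optimized enumerators for specific types remain worthwhile.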