breamware / sidekiq-batch

Sidekiq Batch Jobs Implementation
MIT License

on_complete called before batched jobs finish running #41

Closed JakeKarasik closed 2 years ago

JakeKarasik commented 4 years ago

I saw some similar issues reported where it seems this should have been resolved: https://github.com/breamware/sidekiq-batch/issues/20 (resolved? in https://github.com/breamware/sidekiq-batch/pull/26) for on_success callbacks, but it still seems to be broken for on_complete. I started with ~25.5k jobs queued up within a batch, and the on_complete method was called while ~7.5k jobs were still queued up to run. Any thoughts?

Does it matter how many threads the batched jobs are running on?

Also, I noticed the BID on the callback job was different from the BID on the batched jobs; is that how it should be?

class DisclosureService
  ...
  def trigger_send
    ...
    batch = Sidekiq::Batch.new
    # batch_id is defined in the code elided above
    batch.on(:complete, DisclosureService, disclosure_batch_id: batch_id)
    batch.jobs do
      accounts_to_disclose.each do |account|
        SendDisclosureJob.perform_async(account.account_id, batch_id)
      end
    end
  end
  ...

  def on_complete(status, options)
    ...
  end
end
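
For context on how I understand the gem's bookkeeping: sidekiq-batch appears to keep a pending counter per batch that goes up for each job pushed inside batch.jobs and down as jobs finish, with the :complete callback firing once the counter drains. Below is my own toy Ruby model of that expected behavior (the class and method names are mine, not the gem's) just to make concrete what "called before batched jobs finish" means:

```ruby
# Toy model (my own sketch, not sidekiq-batch internals) of how a
# batch can decide when to fire its :complete callback: a pending
# counter is incremented when a job is enqueued and decremented when
# it finishes; the callback fires once the counter drains to zero.
class ToyBatch
  attr_reader :pending

  def initialize
    @pending = 0
    @on_complete = nil
  end

  def on_complete(&block)
    @on_complete = block
  end

  # mirrors `batch.jobs do ... end`: jobs pushed inside the block
  # are counted toward this batch
  def jobs
    yield
    self
  end

  def push_job
    @pending += 1
  end

  def finish_job
    @pending -= 1
    @on_complete&.call if @pending.zero?
  end
end

batch = ToyBatch.new
fired = false
batch.on_complete { fired = true }
batch.jobs { 3.times { batch.push_job } }

2.times { batch.finish_job }
# fired is still false here: two of three jobs done

batch.finish_job
# fired is now true: counter drained to zero
```

Under this model the callback should only ever fire after the last job finishes, which is what makes firing with ~7.5k jobs still queued look like a counter problem.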
JakeKarasik commented 4 years ago

The same issue is still occurring. Just wondering if you have any insight, @managr?

github-actions[bot] commented 4 years ago

Stale issue message

mbajur commented 4 years ago

I'm having the exact same issue. My sidekiq-batch code looks as follows:

# integrations/runner_worker.rb
module Integrations
  class RunnerWorker
    include Sidekiq::Worker

    sidekiq_options retry: false

    class Callbacks
      def on_complete(status, options)
        mark_db_job_as_finished # defined elsewhere in the app
      end
    end

    def perform
      batch = Sidekiq::Batch.new
      batch.on(:complete, Callbacks)

      batch.jobs do
        FetcherWorker.perform_async
      end
    end
  end
end

# integrations/fetcher_worker.rb
module Integrations
  class FetcherWorker
    include Sidekiq::Worker

    sidekiq_options queue: :fetcher

    def perform
      service = SomeService.new

      # wisper gem `#on` usage; `batch` here is the sidekiq-batch
      # worker helper that reopens the batch this job is running in
      service.on(:page_attached) do |idx|
        batch.jobs do
          PageHandlerWorker.perform_async
        end
      end

      service.call
    end
  end
end

# integrations/page_handler_worker.rb
module Integrations
  class PageHandlerWorker
    include Sidekiq::Worker

    sidekiq_options queue: :page_handler

    def perform
      batch.jobs do
        PagePersisterWorker.perform_async
      end
    end
  end
end

# integrations/page_persister_worker.rb
module Integrations
  class PagePersisterWorker
    include Sidekiq::Worker

    sidekiq_options queue: :page_persister

    def perform
      do_stuff
    end
  end
end

And now, as an example, I have a job that spawned:

which gives a total of 202 jobs/sub-jobs in the entire batch, but the on_complete callback was triggered way too early. The batch state at that point was as follows (also, pending is -14 for some reason):

{
  "bid": "4x7QqJrWmyhvjA",
  "total": 17,
  "pending": -14,
  "complete": true,
  "failures": 0,
  "created_at": "1596717698.3361342",
  "parent_bid": null,
  "child_count": 0,
  "failure_info": []
}
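
A guess at how pending could go negative, assuming the gem tracks completion with a counter that is incremented when jobs are added via batch.jobs and decremented as they finish: if the callback fires the first time the counter drains, and jobs appended later from inside running workers get decremented without their increments being counted against the same batch, you'd see exactly an early callback plus a negative pending. A standalone toy model of that ordering (all names are mine, not sidekiq-batch internals, and the missing-increment part is pure hypothesis):

```ruby
# Toy counter model (my own sketch, not sidekiq-batch code).
# Hypothesis: the :complete callback fires the first time `pending`
# drains to zero, and decrements from jobs appended later arrive
# without matching increments, driving the counter negative.
class RacyBatch
  attr_reader :pending, :fired_at_pending

  def initialize
    @pending = 0
    @fired_at_pending = nil # records pending at first callback fire
  end

  def push_job
    @pending += 1
  end

  def finish_job
    @pending -= 1
    # callback fires the first time the counter drains
    @fired_at_pending = @pending if @fired_at_pending.nil? && @pending <= 0
  end
end

batch = RacyBatch.new
batch.push_job    # RunnerWorker enqueues FetcherWorker
batch.finish_job  # FetcherWorker finishes first: pending == 0, callback fires

# pages appended later finish too, but (per the hypothesis) their
# increments were never recorded against this batch:
3.times { batch.finish_job }

batch.fired_at_pending # => 0  (callback fired with work still coming)
batch.pending          # => -3 (same shape as the -14 above)
```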

Is that some sort of sidekiq-batch bug, or am I doing something wrong here? My stack is:

Thanks in advance

mbajur commented 4 years ago

@nglx with all due respect, this auto-closing feature doesn't make much sense :) There is obviously a real bug described here, and IMO this issue shouldn't be closed.

ovidiubite commented 3 years ago

@mbajur @JakeKarasik I've worked on a patch to solve this issue for my specific case, but I'm not sure whether the problem I was trying to solve is exactly the same as yours. Also, the fix may look a bit dirty 🙅 . You can check my PR above if this is still of interest to you. Any ideas are welcome! Cheers!

jaceksamol commented 2 years ago

@mbajur @JakeKarasik have you fixed the above issue? I have a similar problem and I'm looking for a way to fix it.

nglx commented 2 years ago

@jaceksamol let me reopen it. I'll try to dig in later today and see whether there was a related PR.