Closed JakeKarasik closed 2 years ago
Same issue still occurring, just wondering if you had any insight @managr ?
Stale issue message
Stale issue message
I'm having the exact same issue. My sidekiq-batch stack code looks as follows:
# integrations/runner_worker.rb
module Integrations
class RunnerWorker
include Sidekiq::Worker
sidekiq_options retry: false
class Callbacks
def on_complete(status, options)
mark_db_job_as_finished
end
end
def perform
batch = Sidekiq::Batch.new
batch.on(:complete, Callbacks)
batch.jobs do
FetcherWorker.perform_async
end
end
end
end
# integrations/fetcher_worker.rb
module Integrations
class FetcherWorker
include Sidekiq::Worker
sidekiq_options queue: :fetcher
def perform
service = SomeService.new
service.on(:page_attached) do |idx| # wisper gem `#on` usage
batch.jobs do
PageHandlerWorker.perform_async
end
end
service.call
end
end
end
# integrations/page_handler_worker.rb
module Integrations
class PageHandlerWorker
include Sidekiq::Worker
sidekiq_options queue: :page_handler
def perform
batch.jobs do
PagePersisterWorker.perform_async
end
end
end
end
# integrations/page_persister_worker.rb
module Integrations
class PagePersisterWorker
include Sidekiq::Worker
sidekiq_options queue: :page_persister
def perform
do_stuff
end
end
end
And now, as an example - i have a job that spawned:
Integrations::RunnerWorker
, Integrations::FetcherWorker
Integrations::PageHandler
Integrations::PagePersister
which gives us total of 202 jobs/sub-jobs in entire batch but on_complete
callback was triggered way too early and the state
value at that point was (also, pending
is -14 for some reason):
{
"bid": "4x7QqJrWmyhvjA",
"total": 17,
"pending": -14,
"complete": true,
"failures": 0,
"created_at": "1596717698.3361342",
"parent_bid": null,
"child_count": 0,
"failure_info": []
}
is that some sort of a sidekiq-batch bug or am i doing something wrong in here? My stack is:
Thanks in advance
@nglx with all due respect - this auto-closing feature doesn't make much sense :) There is obviously a real-life bug described in here and this issue shouldn't be closed IMO.
@mbajur @JakeKarasik I've worked at a patch to solve this issue for my specific case, but I'm not sure if the problem I was trying to solve is exactly the same one as yours. Also, this fix may look a bit dirty 🙅 . You can check my PR from above if this is still of interest to you. Any ideas are welcomed! Cheers!
@mbajur @JakeKarasik have you fixed above issue? I have similar problem and I'm looking how to fix it.
@jaceksamol let me reopen it - I'll try to dig in if there was some related PR to it later today.
I saw some similar issues reported where it seems that this should have been resolved https://github.com/breamware/sidekiq-batch/issues/20 (resolved? in https://github.com/breamware/sidekiq-batch/pull/26) for on_success callbacks but this seems to still be broken for on_complete. I started with ~25.5k jobs queued up within a batch and the on_complete method was called with ~7.5k jobs still queued up to run. Any thoughts?
Does it matter how many threads the batched jobs are running on?
Also noticed the
BID
on the callback job was different from theBID
on the batched jobs, is that how it should be?