davidrichey closed this issue 4 years ago.
@davidrichey Did you find a solution for this? It's my first time trying the sidekiq-batch gem, so I had a quick look at the open issues, and this sounds like a serious bug. I guess this only applies to scheduled jobs that run in the future, so maybe I'll be OK for now, because I don't need batches for scheduled jobs.
Unfortunately not. We had to roll our own solution since this was never resolved.
Oh no, I just ran into this issue as well!
Fortunately, I've added a lot of profiling data and timestamps to my jobs, so I'm 100% sure that the on_success callback fired too early, before the job had finished.
I'm also using database records to track all of my jobs. I was doing a sanity check in the on_success callback to ensure that the state had transitioned to processed, but it crashed because the state was still pending.
For the job that was still pending and was supposed to be processed, I saved these timestamps:
"job_start_time": "14:10:47.002"
"job_end_time": "14:11:48.628"
I saw that my on_success callback updated the record with an error at 14:11:46, which is before the batch job had finished.
@managr, are you calling the on_success callback as soon as there are no jobs left in the queue, or do you wait for the jobs to finish before calling it?
EDIT: In the meantime, I might just go back to my old way of managing batches manually with a database lock. I still use Sidekiq, but I track the total and pending counts in the database and use a database lock to decrement the pending count. When it reaches zero, I trigger the on_success callback. I was trying to get away from that, though, so hopefully this can be fixed.
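The manual approach described in the EDIT above can be sketched roughly as follows. This is a minimal in-memory illustration under stated assumptions, not the commenter's actual code: `ManualBatch` and `job_finished!` are hypothetical names, a Ruby `Mutex` stands in for the database lock, and instance variables stand in for the total/pending columns. The key point it shows is that the counter is decremented when a job *finishes*, not when it leaves the queue.

```ruby
# Hypothetical in-memory stand-in for a manually tracked batch.
# A Mutex plays the role of the database row lock; @total/@pending
# play the role of the database columns.
class ManualBatch
  attr_reader :total, :pending, :success_calls

  def initialize(total, &on_success)
    @total = total
    @pending = total
    @on_success = on_success
    @success_calls = 0
    @lock = Mutex.new
  end

  # Call this as the LAST step of each job, after its work is done, so the
  # count reflects finished jobs rather than dequeued ones.
  def job_finished!
    # Decrement under the lock so two jobs finishing at the same time cannot
    # both see the same value; only the caller that reaches zero fires.
    last = @lock.synchronize do
      @pending -= 1
      @pending.zero?
    end
    return unless last

    @success_calls += 1
    @on_success.call
  end
end

# Usage: five simulated jobs finishing on separate threads.
batch = ManualBatch.new(5) { puts "on_success: all 5 jobs finished" }
threads = 5.times.map do |i|
  Thread.new do
    sleep(0.01 * i) # simulated work
    batch.job_finished!
  end
end
threads.each(&:join)
```

In a Rails app, the same pattern could plausibly use ActiveRecord's `with_lock` on the batch record (which reloads the row with `SELECT ... FOR UPDATE` inside a transaction) in place of the `Mutex`.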
I'm running into a similar problem, where a relatively complex batch with child batches calls on_success as soon as its first child completes.
This is a serious bug that prevents this gem from being reliable for any serious project.
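For reference, the parent/child behaviour one would expect can be modelled with a small counter sketch. This is a hypothetical illustration, NOT the sidekiq-batch implementation: `TrackedBatch`, `increment`, and `decrement` are invented names, and each job or child batch simply counts as one pending unit of its parent, so the parent can only complete after every child has completed.

```ruby
# Hypothetical model of expected nested-batch semantics (not sidekiq-batch's
# code). Each job or child batch is one pending unit of its parent.
class TrackedBatch
  attr_reader :completed

  def initialize(parent = nil)
    @parent = parent
    @pending = 0
    @completed = false
    @lock = Mutex.new
    parent&.increment # register this batch as one unit the parent waits on
  end

  # Register one more unit (a job or a child batch) that must finish.
  def increment
    @lock.synchronize { @pending += 1 }
  end

  # Mark one unit finished; the batch completes only when none remain.
  def decrement
    last = @lock.synchronize do
      @pending -= 1
      @pending.zero?
    end
    return unless last

    @completed = true  # on_success would run here
    @parent&.decrement # a finished child is one finished unit of its parent
  end
end

parent  = TrackedBatch.new
child_a = TrackedBatch.new(parent)
child_b = TrackedBatch.new(parent)
child_a.increment             # child_a has one job
2.times { child_b.increment } # child_b has two jobs

child_a.decrement             # first child finishes
p parent.completed            # => false (child_b is still running)
2.times { child_b.decrement }
p parent.completed            # => true
```

The sketch assumes every job and child batch is registered before any of them can finish. If a child could complete while its siblings were still being added, the parent's count could touch zero early, which would look like the symptom described above; that is one possible mechanism, not a confirmed diagnosis of this bug.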
@phildionne, any chance you could test whether https://github.com/breamware/sidekiq-batch/pull/26 fixes your issue?
@managr I just tried it, and it worked! How far are you from merging #26? Let me know if you'd like me to do any other tests.
@phildionne, let me ping @jbrady42; he mentioned that he'll be fixing the PR, and I don't really want to open another PR on top of his (or grab his code).
Any idea when the PR will be released?
Is this released?
Yes, this was released in 0.1.6 https://rubygems.org/gems/sidekiq-batch/versions/0.1.6
With more logging, I've been able to spot jobs executing after the on_success callback has run. I'm not sure what details I can provide, but here we go. I'm happy to provide more; I'm just not sure what else would be helpful.
Redacted logs:
I thought it was strange that the total field in the data was 0, so I tested this locally as well. The total is correct up until on_success is called; then it goes to 0.
Code that creates the batch:
Workers enqueuing
Environment: