Open caius opened 10 months ago
The order we observe on 0.5.2 from lots of puts debugging is:
Delayed::Worker#run
invoked with job data
Delayed::Worker#max_run_time
calls job.max_run_time
Delayed::Backend::Base#max_run_time
calls payload_object.respond_to?(:max_run_time)
where payload_object
is the Delayed::JobWrapper
instance. This in turn forwards the message to #job
which instantiates and memoizes the ActiveJob instance in @job
ivar.
β οΈ The job has been deserialised at this point, invoking ApplicationJob#deserialize
and setting Current
attributes
Delayed::Backend::Base#invoke_job
then called which calls payload_object.perform
, which is Delayed::JobWrapper#perform
ActiveJob execute callbacks are fired, the first of which is the Rails executor.
β οΈ ActiveSupport::CurrentAttributes
are reset at this point. Anything else cleared through Rails Executors callbacks is also triggered here.
job.perform_now
is called. As @job
already contains the job instance no further deserialisation takes place here.
We now arrive at OurJob#perform
with Current
having only nil
values even though we set them in ApplicationJob#deserialize
.
@caius it's been a busy couple of weeks for me, but I haven't forgotten this PR! I intend to do some testing of this branch, but your reasoning is sound, and I suspect that because much of our DJ infrastructure predates ActiveSupport::CurrentAttributes
(instead we have other ways of sharing & clearing job-thread-global state), we hadn't encountered this inconsistency yet.
@smudge no worries, it's not exactly a simple thing to think through and understand so wasn't expecting it to be a quick thing. Let me know if you need anything more explaining to help ππ»
This pull request has been automatically closed because it has not been updated in the last month. :sleepy:
If you still need this change, you can reopen it.
Thanks for helping keep our house in order!
If you still need this change, you can reopen it.
Don't appear to be able to do that, bit of a hostile experience as an outside contributor if I'm honest.
Hi Caius, I'm the CTO at Betterment and I wanted to say sorry for the unpleasant experience. This was a result of a change we made for our internal software dev workflows, and we haven't been thoughtful about the impact on our OSS repos. We'll take that away and see what to do. Genuinely appreciate the contribution.
@jmileham appreciate the note, I didn't mean my original message to come across as curt as I suspect it might have done re-reading it. Thanks :)
Not at all, just wanted to make sure you know we see the unpleasant experience and appreciate the effort you've put in. Sorry we haven't been able to engage as quickly as we'd like with this PR.
Noticed CI had a linting error for parallel assignment, fixed that up whilst here π
Change
Delayed::JobWrapper
from directly invoking:execute
callbacks andjob#perform_now
, to instead callingActiveJob::Base.execute
and letting ActiveJob handle the callbacks, job deserialisation and performing internally.To explain why, we've observed a change in behaviour between
delayed_job
's queue adapter anddelayed
's queue adapter with thread variables being reset betweenjob#deserialize
andjob#perform
being called from the same worker thread/process.Some background context around the Rails executor and ActiveSupport::CurrentAttributes is useful. Feel free to skip down a couple of paragraphs if you already understand them!
The ActiveJob Railtie has an initialiser that wraps all job executions with the Rails executor (it calls
app.reloader
in the code, but that's effectively the executor still.) This ensures code is reloaded between jobs running in development mode, but also makes sure any executor callbacks are invoked when the reloader block is entered at runtime in any environment. (If you've ever wondered how things in rails get reset between web requests, this mechanism is the answer.)The ActiveSupport Railtie has an initialiser that resets
ActiveSupport::CurrentAttributes
instances when entering an executor block. This means before every job is run, all subclasses/instances ofActiveSupport::CurrentAttributes
have their values cleared which stops things leaking between requests/jobs/blocks of work. This effectively lets you have thread-global variables that are cleaned up once the current unit of work has finished, without you needing to remember.In our app we're using the Rails default
Current
model which is a subclass ofActiveSupport::CurrentAttributes
to store some request-global information about an external service to use. (This allows us to override that on a per-request basis in non-production environments for testing purposes.) We pass that override down into the jobs at enqueue time by overridingApplicationJob#serialize
to save the value ofCurrent.service_url
into the job data hash.Our code in
SomeJob#perform
and elsewhere in the app invoked from the job can then callCurrent.service_url
to find the overridden value. Doing it this way means the behaviour is shared between all jobs that need that overridden url without having to remember to overrideserialize
/deserialize
in each job when required.Switching from
delayed_job
todelayed
broke this mechanism silently π¬. No errors were raised but our override wasn't being observed in the code. Debugging it we observed within theApplicationJob#deserialize
methodCurrent.service_url
was being set correctly, but by the time thejob#perform
method was calledCurrent.service_url
wasnil
again. The case of the vanishing variable?! πAt this point we deduced it was likely an ordering issue around
ActiveSupport::CurrentAttributes
being reset and investigated further in that direction. After litteringputs
statements throughdelayed
andActiveJob
I think we've figured out why this is occurring and how it differs todelayed_job
's behaviour.We see
JobWrapper#job
being invoked early in the worker method inDelayed::Worker#run
, viaDelayed::Worker#max_run_time
seeing if the job instance responds to#max_run_time
. DJ Hooks will also use the job instance (payload_object
). JobWrapper forwards all missing methods to thejob
method, which deserialises the job object from the data.The
JobWrapper#perform
method then re-uses the@job
memoized instance in when invoking execute callbacks (which reset all theActiveSupport::CurrentAttributes
values) then callsjob.perform_now
without invoking#deserialize
again. This means we've deserialised the job, then invoked the Rails executor entry callbacks, then calledjob#perform
. Subtle, but unexpected (to us at least) behaviour change.This PR changes
JobWrapper#perform
to callActiveJob::Base.execute(job_data)
to have ActiveJob invoke callbacks, deserialize the job and perform it in the correct order. In delayed's case it meansjob#deserialize
will be called twice, due to the worker interrogating the job instance before getting to callingJobWrapper#perform
. The crucial change is that deserialisation happens after the rails executor callbacks, so any override tojob#deserialize
expecting to be run in the same "unit of work" as thejob#perform
method will work.For reference, the
delayed_job
adapter doesn't deserialize the job_data before callingActiveJob::Base.execute
so doesn't have this ordering effect in it. The tradeoff isdelayed_job
's adapter also doesn't allow overridingmax_run_time
, etc in the job class which is useful.I can't think of a way to solve this without either breaking the API allowing
max_run_time
methods on job classes (aside from perhaps making them class methods?) or having the side-effect of deserializing the job before the Rails executor entry which can lead to seemingly weird behaviour when using standard Rails executor aware objects in this way.(For ease of linking to the source code, the above description links to Rails 7.1.2 source code, but holds true back to Rails 5.0 for ActiveJob and Rails 5.2 for ActiveSupport::CurrentAttributes. Aka the first version each feature appeared respectively.)