Closed rdpoor closed 11 years ago
I have the exact same issue with custom job classes. My hypothesis is that since custom jobs (i.e. one's that have an explicit perform
method defined) don't have all the nice failure handling and cleanup that the Delayed::Job base.rb provides
@albus522 thoughts?
There was a problem with DJ not unlocking failed jobs a while back. Make sure you are running the latest delayed_job (3.0.5) and delayed_job_active_record (0.4.4)
@albus522 I've upgraded to the latest versions of both gems and I'm still getting the issue. Here's what I did to test.
Let's delay a vanilla Ruby class in the context of a Rails 3.2 app with the default Delayed::Job config options.
class AlwaysFail
def fail_real_hard
raise "I'm failing...yay!"
end
end
100.times do
failer = AlwaysFail.new
failer.delay.fail_real_hard
end
This inserts 100 jobs into the DB that will fail upon being invoked, as they should.
However, many of the failed jobs (in this case, 47 randomly) will still have the locked_by
and locked_at
attributes set.
Here are the gem versions from the Gemfile.lock file
delayed_job (3.0.5)
activesupport (~> 3.0)
delayed_job_active_record (0.4.4)
activerecord (>= 2.1.0, < 4)
delayed_job (~> 3.0)
Any ideas?
What does the last_error say?
+1. I have been observing this in my app. I've been very confused about it.
@albus522 Here is Gist with the entire failed job YAML-fied for your viewing pleasure https://gist.github.com/thoughtpunch/5817499
"Job failed to load: uninitialized constant AlwaysFail." This is your culprit. This is considered a permanent unrecoverable failure. You will notice failed_at is set, which means this job will not get retried.
I'm fine with it failing forever, which is totally expected in this case. What doesn't make sense to me is why the hooks haven't reset the 'locked_by' and 'locked_at' columns. It makes it look like there is stall a process working on these or that they are in progress.
Am I being unreasonable here?
A failed job cannot be locked again so it does not matter at all what the values in those columns are. Usually destroy_failed_jobs is set to true and you will never see those rows in the table to begin with.
OK...I accept that. What would I have to do to get the jobs to reset those columns? I have another ticket related to hook/callbacks not getting processed/invoked on custom jobs https://github.com/collectiveidea/delayed_job_active_record/issues/65
Why do you need the columns reset?
@albus522 Let me tell you a little story.
One beautiful fall morning you take your miniature bulldog, JuniorBeef, on his daily walk to the local dog park. You're walking down the street enjoying the crisp autumn air when suddenly JuniorBeef jerks the leash out of your hand and takes off down the street day at his maximum speed (10mph) towards a woman walking the opposite direction carrying an arm full of groceries (containing at minimum one baguette protruding from the top). JuniorBeef startles her and she promptly spills the contents all over the ground. As you rush over to help pickup the groceries, you both bend over at the same time and you both bonk heads and have an awkward eye-locking moment while apologizing for your crazy dog. She says it's OK and that JuniorBeef is a "cute little guy". You introduce yourself and she does the same. Her name is Scarlett Johansson. You ask her out to dinner and she accepts.
Fast-forward, you and Scarlett Johansson promptly fall head-over-heals in love, spending nearly every waking minute together...finishing each others sentences, sharing milkshakes, eating spaghetti in back alleyways while italian men play the accordion, etc. Scarlett Johansson is everything you ever wanted, but let's face it... she's a little corny. She decides that the perfect way to show the world how much you love each other is to wear matching t-shirts ( she got the idea when she went to china this last summer). She gets two t-shirts printed, one says loved_by: Scarlett Johansson and the other says loved_by: albus522 . She wants you to wear yours all the time and gee golly, you want to make her happy so you oblige.
One year later, things have gone cold and you've gotten bored. She's always gone filming movies, and though she says she's still loyal, you begin to resent the Gos hanging around all the time. One night, feeling lonely and having had too much to drink, you invite your ex-girlfriend Zoe Deschanel over to "talk". Zoe comes over and you catch up over a bottle of wine. Zoe has a little too much and, being a decent guy, you put her to bed so she can sleep it off. Just then, Scarlett comes home early from filming some superhero movie, sees Zoe in your bed and gets the wrong idea. Before you can explain she does a flying dragon kick into your chest and runs out the door crying.
6 months later, she still won't take your calls and has her bodyguard drop off all your stuff. Buried in the box of your possessions, under photo-booth pictures and movie stubs, is a t-shirt.. the t-shirt she made for you. You look at the shirt and start to weep... you put it to your nose and inhale deeply... it still smells like her. You want to put it on, but you can't bring yourself to do it... because you're not loved_by Scarlett Johannsen...not anymore
I'm not sure if this is a bug, but it looks suspicious. Below is a snapshot of the job table -- when an entry has failed at least once, you end up with a situation where one PID holds multiple jocks. E.g., PID 35964 holds a lock on ID 0551 (failed once) and ID 0567 (zero failures).
Is this expected behavior?