collectiveidea / delayed_job

Database based asynchronous priority queue system -- Extracted from Shopify
http://groups.google.com/group/delayed_job
MIT License
4.81k stars 955 forks source link

one PID holds locks on multiple Job entries? #406

Closed rdpoor closed 11 years ago

rdpoor commented 12 years ago

I'm not sure if this is a bug, but it looks suspicious. Below is a snapshot of the job table -- when an entry has failed at least once, you end up with a situation where one PID holds multiple jocks. E.g., PID 35964 holds a lock on ID 0551 (failed once) and ID 0567 (zero failures).

Is this expected behavior?

Delayed::Backend::ActiveRecord::Job Load (2.2ms)  SELECT "delayed_jobs".* FROM "delayed_jobs" 
id:0551 attempts:1 handler:--- !ruby/object:ETL::MyJob\nmethod: :load\nargs:\n- '123644931'\n- 1399107164309113510\n locked_by:localhost.local pid:35964
id:0554 attempts:1 handler:--- !ruby/object:ETL::MyJob\nmethod: :load\nargs:\n- '197336121'\n- 1402693634951768383\n locked_by:localhost.local pid:35934
id:0562 attempts:1 handler:--- !ruby/object:ETL::MyJob\nmethod: :load\nargs:\n- '123644931'\n- 1397174774971502343\n locked_by:localhost.local pid:35934
id:0566 attempts:1 handler:--- !ruby/object:ETL::MyJob\nmethod: :load\nargs:\n- '123644931'\n- 1395935111251767436\n locked_by:localhost.local pid:35954
id:0567 attempts:0 handler:--- !ruby/object:ETL::MyJob\nmethod: :load\nargs:\n- '197336121'\n- 1401506756120438549\n locked_by:localhost.local pid:35964
id:0568 attempts:0 handler:--- !ruby/object:ETL::MyJob\nmethod: :load\nargs:\n- '123644931'\n- 1397174774971502343\n locked_by:localhost.local pid:35934
id:0569 attempts:0 handler:--- !ruby/object:ETL::MyJob\nmethod: :load\nargs:\n- '197336121'\n- 1403457448220361621\n locked_by:localhost.local pid:35954
id:0570 attempts:0 handler:--- !ruby/object:ETL::MyJob\nmethod: :load\nargs:\n- '123644931'\n- 1403588814402577101\n locked_by:
id:0571 attempts:0 handler:--- !ruby/object:ETL::MyJob\nmethod: :load\nargs:\n- '238828975'\n locked_by:
id:0572 attempts:0 handler:--- !ruby/object:ETL::MyJob\nmethod: :load\nargs:\n- '123644931'\n- 1396568823350264057\n locked_by:
id:0573 attempts:0 handler:--- !ruby/object:ETL::MyJob\nmethod: :load\nargs:\n- '358957614'\n locked_by:
id:0574 attempts:0 handler:--- !ruby/object:ETL::MyJob\nmethod: :load\nargs:\n- '123644931'\n- 1396568823350264057\n locked_by:
id:0575 attempts:0 handler:--- !ruby/object:ETL::MyJob\nmethod: :load\nargs:\n- '123644931'\n- 1397174762422178683\n locked_by:
id:0576 attempts:0 handler:--- !ruby/object:ETL::MyJob\nmethod: :load\nargs:\n- '197336121'\n- 1402693634951768383\n locked_by:
id:0577 attempts:0 handler:--- !ruby/object:ETL::MyJob\nmethod: :load\nargs:\n- '123644931'\n- 1395224048429626153\n locked_by:
id:0578 attempts:0 handler:--- !ruby/object:ETL::MyJob\nmethod: :load\nargs:\n- '197336121'\n- 1400655310581590966\n locked_by:
id:0579 attempts:0 handler:--- !ruby/object:ETL::MyJob\nmethod: :load\nargs:\n- '123644931'\n- 1396568823350264057\n locked_by:
id:0580 attempts:0 handler:--- !ruby/object:ETL::MyJob\nmethod: :load\nargs:\n- '197336121'\n- 1402693634951768383\n locked_by:
thoughtpunch commented 11 years ago

I have the exact same issue with custom job classes. My hypothesis is that since custom jobs (i.e. one's that have an explicit perform method defined) don't have all the nice failure handling and cleanup that the Delayed::Job base.rb provides

@albus522 thoughts?

albus522 commented 11 years ago

There was a problem with DJ not unlocking failed jobs a while back. Make sure you are running the latest delayed_job (3.0.5) and delayed_job_active_record (0.4.4)

thoughtpunch commented 11 years ago

@albus522 I've upgraded to the latest versions of both gems and I'm still getting the issue. Here's what I did to test.

Let's delay a vanilla Ruby class in the context of a Rails 3.2 app with the default Delayed::Job config options.

class AlwaysFail
   def fail_real_hard
      raise "I'm failing...yay!"
   end
end

100.times do
   failer = AlwaysFail.new
   failer.delay.fail_real_hard
end

This inserts 100 jobs into the DB that will fail upon being invoked, as they should.

However, many of the failed jobs (in this case, 47 randomly) will still have the locked_by and locked_at attributes set.

Here are the gem versions from the Gemfile.lock file

delayed_job (3.0.5)
  activesupport (~> 3.0)
delayed_job_active_record (0.4.4)
  activerecord (>= 2.1.0, < 4)
  delayed_job (~> 3.0)

Any ideas?

albus522 commented 11 years ago

What does the last_error say?

n8gard commented 11 years ago

+1. I have been observing this in my app. I've been very confused about it.

thoughtpunch commented 11 years ago

@albus522 Here is Gist with the entire failed job YAML-fied for your viewing pleasure https://gist.github.com/thoughtpunch/5817499

albus522 commented 11 years ago

"Job failed to load: uninitialized constant AlwaysFail." This is your culprit. This is considered a permanent unrecoverable failure. You will notice failed_at is set, which means this job will not get retried.

thoughtpunch commented 11 years ago

I'm fine with it failing forever, which is totally expected in this case. What doesn't make sense to me is why the hooks haven't reset the 'locked_by' and 'locked_at' columns. It makes it look like there is stall a process working on these or that they are in progress.

Am I being unreasonable here?

albus522 commented 11 years ago

A failed job cannot be locked again so it does not matter at all what the values in those columns are. Usually destroy_failed_jobs is set to true and you will never see those rows in the table to begin with.

thoughtpunch commented 11 years ago

OK...I accept that. What would I have to do to get the jobs to reset those columns? I have another ticket related to hook/callbacks not getting processed/invoked on custom jobs https://github.com/collectiveidea/delayed_job_active_record/issues/65

albus522 commented 11 years ago

Why do you need the columns reset?

thoughtpunch commented 11 years ago

@albus522 Let me tell you a little story.

One beautiful fall morning you take your miniature bulldog, JuniorBeef, on his daily walk to the local dog park. You're walking down the street enjoying the crisp autumn air when suddenly JuniorBeef jerks the leash out of your hand and takes off down the street day at his maximum speed (10mph) towards a woman walking the opposite direction carrying an arm full of groceries (containing at minimum one baguette protruding from the top). JuniorBeef startles her and she promptly spills the contents all over the ground. As you rush over to help pickup the groceries, you both bend over at the same time and you both bonk heads and have an awkward eye-locking moment while apologizing for your crazy dog. She says it's OK and that JuniorBeef is a "cute little guy". You introduce yourself and she does the same. Her name is Scarlett Johansson. You ask her out to dinner and she accepts.

Fast-forward, you and Scarlett Johansson promptly fall head-over-heals in love, spending nearly every waking minute together...finishing each others sentences, sharing milkshakes, eating spaghetti in back alleyways while italian men play the accordion, etc. Scarlett Johansson is everything you ever wanted, but let's face it... she's a little corny. She decides that the perfect way to show the world how much you love each other is to wear matching t-shirts ( she got the idea when she went to china this last summer). She gets two t-shirts printed, one says loved_by: Scarlett Johansson and the other says loved_by: albus522 . She wants you to wear yours all the time and gee golly, you want to make her happy so you oblige.

One year later, things have gone cold and you've gotten bored. She's always gone filming movies, and though she says she's still loyal, you begin to resent the Gos hanging around all the time. One night, feeling lonely and having had too much to drink, you invite your ex-girlfriend Zoe Deschanel over to "talk". Zoe comes over and you catch up over a bottle of wine. Zoe has a little too much and, being a decent guy, you put her to bed so she can sleep it off. Just then, Scarlett comes home early from filming some superhero movie, sees Zoe in your bed and gets the wrong idea. Before you can explain she does a flying dragon kick into your chest and runs out the door crying.

6 months later, she still won't take your calls and has her bodyguard drop off all your stuff. Buried in the box of your possessions, under photo-booth pictures and movie stubs, is a t-shirt.. the t-shirt she made for you. You look at the shirt and start to weep... you put it to your nose and inhale deeply... it still smells like her. You want to put it on, but you can't bring yourself to do it... because you're not loved_by Scarlett Johannsen...not anymore