ClosureTree / closure_tree

Easily and efficiently make your ActiveRecord models support hierarchies
https://closuretree.github.io/closure_tree/
MIT License
1.84k stars 239 forks source link

Trying to add 2 nodes simultaneously to closure_tree table results in a deadlock #240

Open amitsaxena opened 7 years ago

amitsaxena commented 7 years ago

When two nodes are added simultaneously to the closure tree table, this results in a deadlock and the query to acquire lock runs infinitely:

  (0.7ms)  SELECT GET_LOCK('b218f48596dc37d16bbcd16d1e18a116c', 0) AS t46e4b4acfd42770bad34d3e2a13a458a
   (0.4ms)  SELECT GET_LOCK('b218f48596dc37d16bbcd16d1e18a116c', 0) AS t9eaf21f51a8c5a1f2e39f6e8c4971b27
   (0.4ms)  SELECT GET_LOCK('b218f48596dc37d16bbcd16d1e18a116c', 0) AS tae827dc66e71e92f4c9f6edecae1cb54
   (0.3ms)  SELECT GET_LOCK('b218f48596dc37d16bbcd16d1e18a116c', 0) AS t7f7d47e378ffde7f311e56fde6613ab5
   (0.3ms)  SELECT GET_LOCK('b218f48596dc37d16bbcd16d1e18a116c', 0) AS ta7af1b261b90cba9e4d7399c96f17e26
   (0.3ms)  SELECT GET_LOCK('b218f48596dc37d16bbcd16d1e18a116c', 0) AS t3ed658e2e0d3008375e7356cc8e1c4a2
   (0.4ms)  SELECT GET_LOCK('b218f48596dc37d16bbcd16d1e18a116c', 0) AS t080acfd6e5536c50c4917ce271671be6
   (0.3ms)  SELECT GET_LOCK('b218f48596dc37d16bbcd16d1e18a116c', 0) AS t850056c6b58f75d8171bdcb69def558a
   (0.3ms)  SELECT GET_LOCK('b218f48596dc37d16bbcd16d1e18a116c', 0) AS t1ece95b52215d7c5e420ea9c0057aa72

You can reproduce the problem easily by using the code below (where List is the closure_tree model):


Thread.new do
  parent = List.find(86)
  new_node = List.new(:name => "test deadlock 1")
  parent.add_child(new_node)
end
Thread.new do
  parent = List.find(82)
  new_node = List.new(:name => "test deadlock 1")
  parent.add_child(new_node)
end

We faced this problem on production, and the infinite locking loop results in mysql being unresponsive, which in turn results in unresponsive app.

closure_tree version: 6.2.0 ruby: 2.3.0 rails: 5.0.0.1

tomaadland commented 7 years ago

I have just discovered a similar problem in production on Postgresql. Have not been able to track down what is causing it, but I discovered an infinite loop of

SELECT pg_try_advisory_lock(1597746110,0) AS t57b1486ecc3691a98e06a13406a11286

in my log.

Database is unresponsive until I restart the app.

closure_tree version: 6.0.0 ruby 2.2.0 rails 4.2.5

mceachen commented 7 years ago

Yikes. Thanks for reporting. Can you see if reverting #234 fixes the issue? That's the most recent update to locks.

amitsaxena commented 7 years ago

Reverting that commit doesn't seem to fix this.

tomaadland commented 7 years ago

Same here.

I had to revert back to version 6.0 in production, but the problem still appears though much less frequent with version 6.0 then 6.2.

I'm trying to find time to write tests to try and reproduce this problem locally.

DuaneCompton commented 7 years ago

Can confirm, this issue has been around for a while. 6.2 seems to have increased the frequency from a monthly to hourly occurrence. Ruby 2.3.1 Rails 4.2.7.1 Closure tree 6.2 PostgreSQL.

mceachen commented 7 years ago

Are all you using postgres? What version?

mceachen commented 7 years ago

FWIW, I've been looking through the diffs between 6.0.0 and 6.0.2. Nothing immediately nefarious jumps out. https://github.com/mceachen/closure_tree/compare/v6.0.0...master

tomaadland commented 7 years ago

We're using: PostgreSQL 9.4.5 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu 4.9.1-16ubuntu6) 4.9.1, 64-bit

DuaneCompton commented 7 years ago

I am killing the lock by hand. SELECT * FROM pg_locks; SELECT pg_terminate_backend(\'\') the lock does not respond to a pg_kill_backend when in this state. I am downgraded to 6.1. In ~10,000 inserts I had 1 lockup. In 6.2 it was occurring almost immediately. We are also on PostgreSQL 9.4.5 AWS hosted.

mceachen commented 7 years ago

Thanks for the additional info.

@DuaneCompton did you have 1 lockup/10k inserts with v6.1?

DuaneCompton commented 7 years ago

@mceachen There are 6 threads running that are simultaneously under load. under v6.1 I am averaging 1 lockup/10k inserts. In v6.2 it locks basically immediately. Might be two separate issues at play. I know under v6.0.0 we experienced intermittent lock-ups as well.

amitsaxena commented 7 years ago

We use MySql, therefore I don't think this is a postgres specific issue.

DuaneCompton commented 7 years ago

@mceachen Anything I can do from a bug gathering standpoint to help troubleshoot this issue? We still run into a lock up on a weekly basis on 6.0. I am probably going to write a little script to free the lock if the table locks for longer than N minutes. But, before I do so is there anyway I can help?

mceachen commented 7 years ago

Can you capture the current running queries? Without a test to replicate the issue on demand, any change we make will be a random shot in the dark.

DuaneCompton commented 7 years ago

@mceachen will do.

tomaadland commented 7 years ago

Has anyone had any more luck in tracking down this issue ?

I've tried to provoke this issue by creating multiple records in parallell in separate threads as was mentioned initially by @amitsaxena. But no luck so far. I've forked off 6 threads each creating a 1000 records. Tried on both 6.2 and 6.0.

amitsaxena commented 7 years ago

@tomaadland are you unable to reproduce the issue? I was consistently getting the bug as soon as I execute the code above.

tomaadland commented 7 years ago

@amitsaxena as soon as I have some time on my hands i'll create a public project with an example model similar to one I have in production with the tests. I'll post it here to let you know.

DuaneCompton commented 7 years ago

@mceachen So, I've noticed when we hit this state there is really just one lock that is hanging around. Is there any reason you could see that we couldn't use the timeout in your with_advisory_lock gem to at least an infinite lock? I was also curious about this line: https://github.com/mceachen/closure_tree/blob/master/lib/closure_tree/hierarchy_maintenance.rb#L82 Any reason to not expire the previous lock before taking a new one?

mceachen commented 7 years ago

I was also curious about this line: https://github.com/mceachen/closure_tree/blob/master/lib/closure_tree/hierarchy_maintenance.rb#L82

with_advisory_lock is smart enough to no-op if a currently-acquired lock is requested again: See https://github.com/mceachen/with_advisory_lock/blob/master/lib/with_advisory_lock/base.rb#L42

If we auto-expire the rebuild lock, the rebuild isn't guaranteed to be consistent.

The deadlock is either due to a bug in the RDBMS, or, much more likely, a circular dependency due to transaction boundaries.

For example, if one of your data mutation code paths gets lock A then lock B, and another gets B then A, you'll regularly see deadlocks.

(edited to add link to nested lock no-op)

DuaneCompton commented 7 years ago

So, its not a dead lock per se. If I move around the locking obtain/release in just the critical sections of the write/delete/children call then it will be detected as a deadlock. What is actually happening is: writer 1 obtains a lock in the context of a rebuild!. writer 2 calls add_child (which I believe is going to trigger a rebuild). writer 2 spins for a while attempting to obtain the lock in which case writer 1 is paused. writer 2 fails to obtain the lock and writer 1 continues writing. repeat with writer 3...N and writer 1 basically never makes it through a rebuild. What I am not knowledgable enough to know is why writer 1 does not continue writing when it has already obtained the lock.

mceachen commented 7 years ago

Can you maybe point to the code, or talk more about what's going on in writer 1 and 2?

Deadlocks I've seen in the past occur when there are multiple code paths that have transaction blocks that edit models in different order.

DuaneCompton commented 7 years ago

Sorry, its not a deadlock in the traditional sense, its actually worse since a deadlock will be detected and aborted. In this case, the first process has obtained the lock and is proceeding on its way when the second process comes in and attempts to obtain a lock. It fails since writer one already has the a lock (which is the correct behavior IMO). What I am having trouble understanding is why writer 1 hangs until writer 2 gives up attempting to obtain its lock.

If I run something like:

  parent = Tag.create(name: 'lock 1')
  (1..100).each do |i|
        p "adding #{i}"
        new_node = Tag.create(:name => "test deadlock #{i}")
        ActiveRecord::Base.transaction do
             parent = parent.add_child(new_node)
        end
end

on two separate machines upon starting the second both will shortly arrive in a semi-deadlocked situation. The script that I started the call on will then eventually expire and the first writer will continue. I've tried moving around the transaction/without it. Below is one potential outcome.

Writer 1 :

D, [2017-02-16T07:31:34.537506 #21545] DEBUG -- :   Tag Exists (0.4ms)  SELECT  1 AS one FROM "tags" INNER JOIN "tag_hierarchies" ON "tags"."id" = "tag_hierarchies"."ancestor_id" WHERE "tag_hierarchies"."descendant_id" = $12 AND "tags"."id" = $2 LIMIT 1  [["descendant_id", 7903], ["id", 7905]]
W, [2017-02-16T07:31:34.537919 #21545]  WARN -- :   ↳ /usr/local/bundle/gems/closure_tree-6.0.0/lib/closure_tree/hierarchy_maintenance.rb:27:in '_ct_validate'

D, [2017-02-16T07:31:34.541672 #21545] DEBUG -- :   SQL (1.6ms)  UPDATE "tags" SET "parent_id" = $1, "updated_at" = $2 WHERE "tags"."id" = $3  [["parent_id", 7903], ["updated_at", "2017-02-16 07:31:34.538876"], ["id", 7905]]
W, [2017-02-16T07:31:34.542309 #21545]  WARN -- :   ↳ /usr/local/bundle/gems/closure_tree-6.0.0/lib/closure_tree/model.rb:138:in 'add_child'

D, [2017-02-16T07:31:34.545266 #21545] DEBUG -- :    (0.5ms)  SELECT pg_try_advisory_lock(1814493058,0) AS t2ea0208ee2526226ecc8d65da8f4c493
W, [2017-02-16T07:31:34.546018 #21545]  WARN -- :   ↳ /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/postgresql.rb:14:in 'execute_successful?'
D, [2017-02-16T07:31:34.646474 #21545] DEBUG -- :    (0.7ms)  SELECT pg_try_advisory_lock(1814493058,0) AS ta2ef67039c80d539454f14205900baac
W, [2017-02-16T07:31:34.647598 #21545]  WARN -- :   ↳ /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/postgresql.rb:14:in 
'execute_successful?'

D, [2017-02-16T07:31:34.785180 #21545] DEBUG -- :    (0.8ms)  SELECT pg_try_advisory_lock(1814493058,0) AS tafb0ba91177bd447d53f97cae6f97ae0
W, [2017-02-16T07:31:34.786015 #21545]  WARN -- :   ↳ /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/postgresql.rb:14:in 'execute_successful?'

D, [2017-02-16T07:31:34.910538 #21545] DEBUG -- :    (0.8ms)  SELECT pg_try_advisory_lock(1814493058,0) AS t878276a11847c5bb14ee22e966d51222
W, [2017-02-16T07:31:34.911517 #21545]  WARN -- :   ↳ /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/postgresql.rb:14:in 'execute_successful?'

D, [2017-02-16T07:31:35.014669 #21545] DEBUG -- :    (1.2ms)  SELECT pg_try_advisory_lock(1814493058,0) AS t3996341ab230c95a3222cf08044a7593
W, [2017-02-16T07:31:35.015517 #21545]  WARN -- :   ↳ /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/postgresql.rb:14:in 'execute_successful?'

D, [2017-02-16T07:31:35.098349 #21545] DEBUG -- :    (0.8ms)  SELECT pg_try_advisory_lock(1814493058,0) AS td67fe931e1e97081594bf1d3b7b577ac
W, [2017-02-16T07:31:35.098631 #21545]  WARN -- :   ↳ /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/postgresql.rb:14:in 'execute_successful?'

D, [2017-02-16T07:31:35.210146 #21545] DEBUG -- :    (0.9ms)  SELECT pg_try_advisory_lock(1814493058,0) AS td2b2a4adf0f2fb2e61565b1a8d58da8b
W, [2017-02-16T07:31:35.210534 #21545]  WARN -- :   ↳ /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/postgresql.rb:14:in 'execute_successful?'

D, [2017-02-16T07:31:35.351044 #21545] DEBUG -- :    (0.9ms)  SELECT pg_try_advisory_lock(1814493058,0) AS t6ba2e7b92cdaf2d1905f1a6214e7efa2
W, [2017-02-16T07:31:35.351374 #21545]  WARN -- :   ↳ /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/postgresql.rb:14:in 
'execute_successful?'

D, [2017-02-16T07:31:35.429454 #21545] DEBUG -- :    (0.9ms)  SELECT pg_try_advisory_lock(1814493058,0) AS t8388cb7c01e7341bd1f63e3404fe8e9b
W, [2017-02-16T07:31:35.429904 #21545]  WARN -- :   ↳ /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/postgresql.rb:14:in 'execute_successful?'

D, [2017-02-16T07:31:35.520427 #21545] DEBUG -- :    (0.9ms)  SELECT pg_try_advisory_lock(1814493058,0) AS tb6c70aa0fe9699434e66857ba96e1c7d
W, [2017-02-16T07:31:35.520766 #21545]  WARN -- :   ↳ /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/postgresql.rb:14:in 'execute_successful?'

D, [2017-02-16T07:31:35.616061 #21545] DEBUG -- :    (0.9ms)  SELECT pg_try_advisory_lock(1814493058,0) AS taef48c46023f84d4e4c6454fce8dfed7
W, [2017-02-16T07:31:35.616458 #21545]  WARN -- :   ↳ /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/postgresql.rb:14:in 'execute_successful?'

D, [2017-02-16T07:31:35.756555 #21545] DEBUG -- :    (0.9ms)  SELECT pg_try_advisory_lock(1814493058,0) AS t14f7ff2ec5dae5c9961188d75d2f0ea7
W, [2017-02-16T07:31:35.756914 #21545]  WARN -- :   ↳ /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/postgresql.rb:14:in 'execute_successful?'

D, [2017-02-16T07:31:35.829528 #21545] DEBUG -- :    (0.7ms)  SELECT pg_try_advisory_lock(1814493058,0) AS t363a59f8e61d4ce5c0fcfc5abf6e8404
W, [2017-02-16T07:31:35.829871 #21545]  WARN -- :   ↳ /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/postgresql.rb:14:in 'execute_successful?'

D, [2017-02-16T07:31:35.978112 #21545] DEBUG -- :    (0.9ms)  SELECT pg_try_advisory_lock(1814493058,0) AS t57ace053678ffafac925ebec1a163abe
W, [2017-02-16T07:31:35.978513 #21545]  WARN -- :   ↳ /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/postgresql.rb:14:in 'execute_successful?'

D, [2017-02-16T07:31:36.053677 #21545] DEBUG -- :    (0.7ms)  SELECT pg_try_advisory_lock(1814493058,0) AS t809dd7878f94941becb585a2ee937e3e
W, [2017-02-16T07:31:36.053976 #21545]  WARN -- :   ↳ /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/postgresql.rb:14:in 'execute_successful?'

Writer 2

D, [2017-02-16T07:31:34.457059 #21548] DEBUG -- : DEBUG: Chewy strategies stack: [2] <- bypass @ deadlock.rb:1
D, [2017-02-16T07:31:34.465650 #21548] DEBUG -- :    (0.5ms)  BEGIN
W, [2017-02-16T07:31:34.465916 #21548]  WARN -- :   ↳ deadlock.rb:3:in `block in <top (required)>'

D, [2017-02-16T07:31:34.488118 #21548] DEBUG -- :   SQL (0.8ms)  INSERT INTO "tags" ("name", "created_at", "updated_at") VALUES ($1, $2, $3) RETURNING "id"  [["name", "lock 1"], ["created_at", "2017-02-16 07:31:34.485771"], ["updated_at", "2017-02-16 07:31:34.485771"]]
W, [2017-02-16T07:31:34.488464 #21548]  WARN -- :   ↳ deadlock.rb:3:in `block in <top (required)>'

D, [2017-02-16T07:31:34.539682 #21548] DEBUG -- :    (0.5ms)  SELECT pg_try_advisory_lock(1814493058,0) AS t9f2a3d3d5ce591e0cbcad87a06e46c26
W, [2017-02-16T07:31:34.539915 #21548]  WARN -- :   ↳ /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/postgresql.rb:14:in `execute_successful?'

D, [2017-02-16T07:31:34.601380 #21548] DEBUG -- :   SQL (0.7ms)  INSERT INTO "tag_hierarchies" ("ancestor_id", "descendant_id", "generations") VALUES ($1, $2, $3)  [["ancestor_id", 7904], ["descendant_id", 7904], ["generations", 0]]
W, [2017-02-16T07:31:34.601783 #21548]  WARN -- :   ↳ /usr/local/bundle/gems/closure_tree-6.0.0/lib/closure_tree/hierarchy_maintenance.rb:66:in `block in rebuild!'

D, [2017-02-16T07:32:34.608471 #21548] DEBUG -- :    (60004.6ms)  UPDATE "tags"
SET "position" = t.seq + -1
FROM (
  SELECT "id" AS id, row_number() OVER(ORDER BY position) AS seq
  FROM "tags"
  WHERE "parent_id" IS NULL 
) AS t
WHERE "tags"."id" = t.id

W, [2017-02-16T07:32:34.609206 #21548]  WARN -- :   ↳ /usr/local/bundle/gems/closure_tree-6.0.0/lib/closure_tree/numeric_order_support.rb:39:in `reorder_with_parent_id'

D, [2017-02-16T07:32:34.613072 #21548] DEBUG -- :    (1.4ms)  SELECT pg_advisory_unlock(1814493058,0) AS t5a0d17d7a9f5b664b7d1f87c2252d30c
W, [2017-02-16T07:32:34.613436 #21548]  WARN -- :   ↳ /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/postgresql.rb:14:in `execute_successful?'

D, [2017-02-16T07:32:34.614966 #21548] DEBUG -- :    (0.4ms)  ROLLBACK
W, [2017-02-16T07:32:34.615220 #21548]  WARN -- :   ↳ deadlock.rb:3:in `block in <top (required)>'

D, [2017-02-16T07:32:34.616004 #21548] DEBUG -- : DEBUG: Chewy strategies stack: [2] -> bypass @ deadlock.rb:1
/usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/connection_adapters/postgresql_adapter.rb:592:in `async_exec': PG::InFailedSqlTransaction: ERROR:  current transaction is aborted, commands ignored until end of transaction block (ActiveRecord::StatementInvalid)
: SELECT pg_advisory_unlock(1814493058,0) AS t5a0d17d7a9f5b664b7d1f87c2252d30c
    from /usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/connection_adapters/postgresql_adapter.rb:592:in `block in exec_no_cache'
    from /usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/connection_adapters/abstract_adapter.rb:484:in `block in log'
    from /usr/local/bundle/gems/activesupport-4.2.7.1/lib/active_support/notifications/instrumenter.rb:20:in `instrument'
    from /usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/connection_adapters/abstract_adapter.rb:478:in `log'
    from /usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/connection_adapters/postgresql_adapter.rb:592:in `exec_no_cache'
    from /usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/connection_adapters/postgresql_adapter.rb:584:in `execute_and_clear'
    from /usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/connection_adapters/postgresql/database_statements.rb:50:in `select_value'
    from /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/postgresql.rb:14:in `execute_successful?'
    from /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/postgresql.rb:9:in `release_lock'
    from /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/base.rb:81:in `ensure in yield_with_lock'
    from /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/base.rb:81:in `yield_with_lock'
    from /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/base.rb:65:in `yield_with_lock_and_timeout'
    from /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/base.rb:48:in `with_advisory_lock_if_needed'
    from /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/concern.rb:16:in `with_advisory_lock_result'
    from /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/concern.rb:10:in `with_advisory_lock'
    from /usr/local/bundle/gems/closure_tree-6.0.0/lib/closure_tree/support.rb:103:in `with_advisory_lock'
    from /usr/local/bundle/gems/closure_tree-6.0.0/lib/closure_tree/hierarchy_maintenance.rb:63:in `rebuild!'
    from /usr/local/bundle/gems/closure_tree-6.0.0/lib/closure_tree/hierarchy_maintenance.rb:39:in `_ct_after_save'
    from /usr/local/bundle/gems/activesupport-4.2.7.1/lib/active_support/callbacks.rb:432:in `block in make_lambda'
    from /usr/local/bundle/gems/activesupport-4.2.7.1/lib/active_support/callbacks.rb:228:in `block in halting_and_conditional'
    from /usr/local/bundle/gems/activesupport-4.2.7.1/lib/active_support/callbacks.rb:506:in `block in call'
    from /usr/local/bundle/gems/activesupport-4.2.7.1/lib/active_support/callbacks.rb:506:in `each'
    from /usr/local/bundle/gems/activesupport-4.2.7.1/lib/active_support/callbacks.rb:506:in `call'
    from /usr/local/bundle/gems/activesupport-4.2.7.1/lib/active_support/callbacks.rb:92:in `__run_callbacks__'
    from /usr/local/bundle/gems/activesupport-4.2.7.1/lib/active_support/callbacks.rb:778:in `_run_save_callbacks'
    from /usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/callbacks.rb:302:in `create_or_update'
    from /usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/persistence.rb:120:in `save'
    from /usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/validations.rb:37:in `save'
    from /usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/attribute_methods/dirty.rb:21:in `save'
    from /usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/transactions.rb:286:in `block (2 levels) in save'
    from /usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/transactions.rb:351:in `block in with_transaction_returning_status'
    from /usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/connection_adapters/abstract/database_statements.rb:213:in `block in transaction'
    from /usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/connection_adapters/abstract/transaction.rb:184:in `within_new_transaction'
    from /usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/connection_adapters/abstract/database_statements.rb:213:in `transaction'
    from /usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/transactions.rb:220:in 'transaction'
    from /usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/transactions.rb:348:in 'with_transaction_returning_status'
    from /usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/transactions.rb:286:in 'block in save'
    from /usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/transactions.rb:301:in 'rollback_active_record_state!'
    from /usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/transactions.rb:285:in 'save'
    from /usr/local/bundle/gems/newrelic_rpm-3.17.2.327/lib/new_relic/agent/instrumentation/active_record_helper.rb:26:in 'block in save'
    from /usr/local/bundle/gems/newrelic_rpm-3.17.2.327/lib/new_relic/agent.rb:586:in 'with_database_metric_name'
    from /usr/local/bundle/gems/newrelic_rpm-3.17.2.327/lib/new_relic/agent/instrumentation/active_record_helper.rb:25:in 'save'
    from /usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/persistence.rb:34:in 'create'
    from deadlock.rb:3:in `block in <top (required)>'
    from /usr/local/bundle/gems/chewy-0.8.4/lib/chewy/strategy.rb:60:in 'wrap'
    from /usr/local/bundle/gems/chewy-0.8.4/lib/chewy.rb:179:in 'strategy'
    from deadlock.rb:1:in '<top (required)>'
    from /usr/local/bundle/gems/zeus-0.15.3/lib/zeus/load_tracking.rb:50:in 'load'
    from /usr/local/bundle/gems/zeus-0.15.3/lib/zeus/load_tracking.rb:50:in 'load'
    from /usr/local/bundle/gems/railties-4.2.7.1/lib/rails/commands/runner.rb:60:in '<top (required)>'
    from /usr/local/bundle/gems/railties-4.2.7.1/lib/rails/commands/commands_tasks.rb:123:in 'require'
    from /usr/local/bundle/gems/railties-4.2.7.1/lib/rails/commands/commands_tasks.rb:123:in 'require_command!'
    from /usr/local/bundle/gems/railties-4.2.7.1/lib/rails/commands/commands_tasks.rb:90:in 'runner'
    from /usr/local/bundle/gems/railties-4.2.7.1/lib/rails/commands/commands_tasks.rb:39:in 'run_command!'
    from /usr/local/bundle/gems/railties-4.2.7.1/lib/rails/commands.rb:17:in `<top (required)>'
    from bin/rails:9:in `require'
    from bin/rails:9:in `<main>'
mceachen commented 7 years ago

And by 👍 I mean 😭. I'll see if I can wire up a test to do this.

DuaneCompton commented 7 years ago

Thank you. I will see what I can do as well. I do not believe at this time that its an a/b b/a situation, because I think the lock acquisition time is much higher than the deadlock check timer on the db. I'm wondering if maybe calling a complete table lock might be better in the rebuild! function rather than calling an additional lock for each delete/child insert (maybe I am misunderstanding how those locks work though).

mceachen commented 7 years ago

I think a table lock (which I had dismissed as being too coarse a mutex) is an interesting alternative. I'll code up that as an option.

dsounded commented 7 years ago

Have almost the same problem(after upgrading to Rails 5), with Rails 4.2.5 it worked well

I have accepts_nested_attributes_for my hierarchy model, and in that model I also have accepts_nested_attributes_for for its children

And after saving the parent item I got deadlock

Tried versions 6.2 and 6.4 and 6.0

dsounded commented 7 years ago

I thinks it's more about Rails update than closure_tree update

mysql Ver 14.14 Distrib 5.5.44 gem mysql2 version 0.3.19

I am playing with it, with_advisory_lock: false maybe solves the problem, but not sure it this is not temporary

DuaneCompton commented 7 years ago

@mceachen It appears the issue I am running into is specifically (and I am guessing amitsaxena as well) around the numeric reordering (https://github.com/mceachen/closure_tree/blob/master/lib/closure_tree/hierarchy_maintenance.rb#L76) when calling add_child. Writer 1 will hang on grabbing https://github.com/mceachen/closure_tree/blob/master/lib/closure_tree/hierarchy_maintenance.rb#L63

while writer 2 will attempt to write into: https://github.com/mceachen/closure_tree/blob/master/lib/closure_tree/numeric_order_support.rb#L39

If you call directly children.create() it will avoid this issue, though I would guess any concurrent movement of leafs with numeric ordering enabled could result in this.

If I replace the with_advisory_lock from support.rb with just a model_class.lock(true) the deadlock appears to go away.

I did also note that if you attempt a destroy_all on a large tag set > 10000 (with our without numeric ordering) the destroy will fail with fatal: exception reentered.

Hope this is helpful.

mceachen commented 7 years ago

@DuaneCompton that is super helpful, and shouldn't be hard to write a breaking test. Do you want to have a go?

DuaneCompton commented 7 years ago

I will give it a shot. Thanks.

swrobel commented 7 years ago

@DuaneCompton any progress here? I just ran into this issue.

mceachen commented 7 years ago

Thanks to @serenaf, she highlighted the MySQL docs that reveal v5.7.5 fundamentally changed how get_lock() works. Fixing that issue should resolve this issue for MySQL users, at least.

mceachen commented 7 years ago

@uhlenbrock this issue (which seems to be with_advisory_lock not playing nicely with newer versions of postgres, mysql, and rails) is currently the most pressing for closure_tree. More eyes on it are welcome.

uhlenbrock commented 7 years ago

@mceachen I have started to take a look. I have two PRs open on with_advisory_lock to get the testing infrastructure ready to replicate this issue.

tomaadland commented 7 years ago

@uhlenbrock Any ideas on how to replicate this in a test ?

uhlenbrock commented 7 years ago

@tomaadland I have started work over here: https://github.com/ClosureTree/with_advisory_lock/issues/17

uhlenbrock commented 7 years ago

I am working to build a failing test based on the suggestion above:

describe Tag do
  context 'in a race condition' do
    it 'should not break the database' do
      parent1 = Tag.create!(name: 'parent tag 1')
      parent2 = Tag.create!(name: 'parent tag 2')

      Thread.new do
        new_node = Tag.new(name: 'test deadlock 1')
        parent1.add_child(new_node)
      end
      Thread.new do
        new_node = Tag.new(name: 'test deadlock 1')
        parent2.add_child(new_node)
      end
    end
  end
end

Via DB=postgresql appraisal activerecord-5.1 rake spec, the following error is raised:

SystemStackError:
       stack level too deep

I believe this is because of Thread.abort_on_exception = true. The test does not raise an error with this line commented out, but also takes between 10 and 20 seconds. Definitely enough to upset your production database. I also found that it is true that calling create instead of new avoids any issue.

Does this appear to adequately reproduce the issue?

tomaadland commented 7 years ago

I tried running this test in the gem and I get the SystemStackError

When I created a similar unit test in the app that is using this gem and when calling create! instead of new for creating new children instances to be added I get postgresql to err on deadlock.

ERROR: deadlock detected at character 8

So in my case the

new_node = Tag.create!(name: "test deadlock 1")

causes a deadlock.

tomalok commented 6 years ago

I may have found a workaround that works for me -- though at present I'm not sure of what other consequences may arise from my solution.

I'd long since upgraded the database to MariaDB 10.x from some older MySQL without apparent incident. Until one day when I needed to reparent something, and then...

Mysql2::Error: Lock wait timeout exceeded; try restarting transaction

At the time, i wrote an awful script that monkey-patched the part of closure_tree that dealt with locking (IIRC), and managed to get my reparenting done.

Now, however, I'm going to need to reparent more often, so I took some time today to do a little more research with a simple test script...

#!/usr/bin/ruby

$LOAD_PATH.unshift '/idw/lib'
Dir.chdir('/idw') do
  require 'rubygems'
  require 'bundler'
  Bundler.require(:default)   # TODO: split this out to just what's needed
  require '/idw/config/environment'
end

require 'pp'

# connect to the right database for the curent environment
IDW::Base.establish_connection(YAML.load(File.read('/idw/config/database.yml'))[ENV['IDW_ENV']])
IDW::Base.connection_pool.with_connection do

  hc = IDW::Hostclass.find_by(name: 'NS_OPMS')
  pp hc.self_and_ancestor_ids

  phc = IDW::Hostclass.find_by(name: 'NS_ORCAWAVE')
  pp phc.self_and_ancestor_ids

  phc.children << hc
  pp hc.self_and_ancestor_ids

end

Which, when run... Failed:

$ ./reparent
D, [2018-03-09T21:57:01.685550 #148] DEBUG -- :    (0.3ms)  SET NAMES utf8,  @@SESSION.sql_mode = CONCAT(CONCAT(@@sql_mode, ',STRICT_ALL_TABLES'), ',NO_AUTO_VALUE_ON_ZERO'),  @@SESSION.sql_auto_is_null = 0, @@SESSION.wait_timeout = 2147483
D, [2018-03-09T21:57:01.739543 #148] DEBUG -- :    (0.2ms)  SET NAMES utf8,  @@SESSION.sql_mode = CONCAT(CONCAT(@@sql_mode, ',STRICT_ALL_TABLES'), ',NO_AUTO_VALUE_ON_ZERO'),  @@SESSION.sql_auto_is_null = 0, @@SESSION.wait_timeout = 2147483
D, [2018-03-09T21:57:01.750891 #148] DEBUG -- :   IDW::Hostclass Load (0.4ms)  SELECT  `hostclasses`.* FROM `hostclasses` WHERE `hostclasses`.`name` = 'NS_OPMS' LIMIT 1
D, [2018-03-09T21:57:01.773695 #148] DEBUG -- :    (0.7ms)  SELECT `hostclasses`.`id` FROM `hostclasses` INNER JOIN `hostclass_hierarchies` ON `hostclasses`.`id` = `hostclass_hierarchies`.`ancestor_id` WHERE `hostclass_hierarchies`.`descendant_id` = 214 ORDER BY `hostclass_hierarchies`.generations asc
[214, 199, 2]
D, [2018-03-09T21:57:01.774403 #148] DEBUG -- :   IDW::Hostclass Load (0.4ms)  SELECT  `hostclasses`.* FROM `hostclasses` WHERE `hostclasses`.`name` = 'NS_ORCAWAVE' LIMIT 1
D, [2018-03-09T21:57:01.776208 #148] DEBUG -- :    (0.6ms)  SELECT `hostclasses`.`id` FROM `hostclasses` INNER JOIN `hostclass_hierarchies` ON `hostclasses`.`id` = `hostclass_hierarchies`.`ancestor_id` WHERE `hostclass_hierarchies`.`descendant_id` = 215 ORDER BY `hostclass_hierarchies`.generations asc
[215, 199, 2]
D, [2018-03-09T21:57:01.776909 #148] DEBUG -- :    (0.3ms)  BEGIN
D, [2018-03-09T21:57:01.781243 #148] DEBUG -- :   IDW::Hostclass Exists (0.5ms)  SELECT  1 AS one FROM `hostclasses` INNER JOIN `hostclass_hierarchies` ON `hostclasses`.`id` = `hostclass_hierarchies`.`ancestor_id` WHERE `hostclass_hierarchies`.`descendant_id` = 215 AND `hostclasses`.`id` = 214 LIMIT 1
D, [2018-03-09T21:57:01.782238 #148] DEBUG -- :   IDW::Hostclass Exists (0.3ms)  SELECT  1 AS one FROM `hostclasses` WHERE `hostclasses`.`name` = 'NS_OPMS' AND (`hostclasses`.`id` != 214) LIMIT 1
D, [2018-03-09T21:57:01.783606 #148] DEBUG -- :   SQL (0.4ms)  UPDATE `hostclasses` SET `parent_id` = 215, `updated_at` = '2018-03-09 21:57:01' WHERE `hostclasses`.`id` = 214
D, [2018-03-09T21:57:01.786316 #148] DEBUG -- :    (0.3ms)  SELECT GET_LOCK('3bd772cb25a662070242f7efd5ca79ef2', 0) AS t5e93ac4ad69f0ed6f2c5c9584d8b188b
D, [2018-03-09T21:57:01.788037 #148] DEBUG -- :    (1.4ms)  DELETE FROM `hostclass_hierarchies`
WHERE descendant_id IN (
  SELECT DISTINCT descendant_id
  FROM (SELECT descendant_id
    FROM `hostclass_hierarchies`
    WHERE ancestor_id = 214
       OR descendant_id = 214
  ) AS x )

D, [2018-03-09T21:57:01.793990 #148] DEBUG -- :    (0.2ms)  BEGIN
D, [2018-03-09T21:57:52.175943 #148] DEBUG -- :   SQL (50381.2ms)  INSERT INTO `hostclass_hierarchies` (`ancestor_id`, `descendant_id`, `generations`) VALUES (214, 214, 0)
D, [2018-03-09T21:57:52.176687 #148] DEBUG -- :    (0.5ms)  ROLLBACK
D, [2018-03-09T21:57:52.177324 #148] DEBUG -- :    (0.4ms)  SELECT RELEASE_LOCK('3bd772cb25a662070242f7efd5ca79ef2') AS t72718680058c1a95838500647177a561
D, [2018-03-09T21:57:52.177853 #148] DEBUG -- :    (0.4ms)  ROLLBACK
/idw/vendor/bundle/ruby/2.4.0/gems/mysql2-0.4.10/lib/mysql2/client.rb:120:in `_query': Mysql2::Error: Lock wait timeout exceeded; try restarting transaction: INSERT INTO `hostclass_hierarchies` (`ancestor_id`, `descendant_id`, `generations`) VALUES (214, 214, 0) (ActiveRecord::StatementInvalid)

...but provided some clues as to what was going on.

The (currently) third solution to https://stackoverflow.com/questions/6000336/how-to-debug-lock-wait-timeout-exceeded-on-mysql mentions that InnoDB's default isolation level is REPEATABLE_READ, whereas other DBs use READ_COMMITTED.

Setting my database's default isolation level to READ_COMMITTED...

SET GLOBAL TRANSACTION ISOLATION LEVEL READ COMMITTED;

...did the trick and allowed the reparenting script (and another more complicated script) to work as anticipated. Not sure I'd want to leave this on globally, might be best to set it as tightly around the code as possible (before that first transaction). Maybe something like...

FooModel.transaction(isolation: :read_committed) do
  # the reparenting stuff
end

I'm not confident this solves the other two-thread deadlock issue, as that appears to be happening with non-MySQL/MariaDB databases, also.

mceachen commented 6 years ago

I hadn't thought about this before @tomalok 's post, but I believe with_advisory_lock could no-op if it could reliably determine that the current database is configured as READ COMMITTED or supports MVCC. It'd be good to get a simple breaking test in place to verify this hypothesis.

kawadhiya21 commented 1 year ago

we are facing the same issue. Any update on this?

jkraemer commented 10 months ago

I can reliably reproduce this deadlock with the following test case, which concurrently removes children from a nested set with numeric sort order. The ordering seems to be the culprit, because it tries to update siblings' sort indices which leads to an implicit row lock, which, in conjunction with the implicit row locks taken out when rails updates the parent_id on records (which happens before the advisory_lock is attempted). So a possible scenario would be:

This kind of deadlock where one writer waits for the advisory lock while the other waits for a row level lock apparently isn't detected by either MySQL or Postgres, leading to an endless wait.

The only way I found to fix this is to get the advisory lock before even updating the parent_id, i.e. by calling record._ct.with_advisory_lock { record.save } in your application code.

test case:

  it "fails to deadlock from removing siblings with sort_order" do
    skip("unsupported") unless run_parallel_tests?

    @root = Label.create!(name: 'root')
    60.times{ |i| Label.create! name: i, mother_id: @root.id }
    @root.reload

    # remove 40 random children in parallel threads
    descendant_ids = @root.descendant_ids
    ids_to_remove = descendant_ids.sample(40).shuffle
    ids_to_keep = descendant_ids - ids_to_remove
    Parallel.map(ids_to_remove, in_threads: max_threads) do |id|
      ActiveRecord::Base.connection_pool.with_connection do
        Label.find(id).update(mother_id: nil)
      end
    end

    Label.where(id: ids_to_remove).each do |l|
      assert_nil l.mother_id
      assert l.root?
      assert l.leaf?
    end
    @root.reload
    assert_equal 20, @root.children.count
    assert order = @root.children.pluck(:column_whereby_ordering_is_inferred).uniq
    assert_equal (0..19).to_a, order
  end

The test will get stuck in an endless loop, unless the call to ct_reorder_prior_siblings_if_parent_changed in the rebuild! method is commented out. It might worth thinking about the necessity of that reordering - maybe having non-consecutive sort indices isn't even an issue?