Use database mutex locking

Silex commented 5 years ago

Don't merge yet, work in progress.

https://github.com/ClosureTree/with_advisory_lock

An advisory lock is a mutex used to ensure no two processes run some process at the same time. When the advisory lock is powered by your database server, as long as it isn't SQLite, your mutex spans hosts.

Should fix #254

Questions:

Did I modify the Appraisals/Gemfiles correctly ?
How to handle the fact that with_advisory_lock requires ActiveRecord >= 4.2? Should we drop the old active records or do we mock the method so it does nothing?
Should we lock all methods that modify the list ? It's hard for me to reason about it because a lot of methods modify the list, and they call each others! And them some of them call class methods like :update_all at which point we lost the necessary scope information. We could do something like @locked = true and prevent multiple locking attempts.

brendon commented 5 years ago

Hi @Silex :) Thanks for digging into this further. @swanandp, could you come in on this and take a look too. I think it's quite a big change and we want to be sure it won't negatively impact anything.

I'm not a huge fan of dropping support for old ruby's and rails's if we don't need to. However, it might be time to drop 3.2 support and also 4.1 and 4.0 are easier to upgrade to 4.2 so that's not so much of an issue.

Regarding your questions:

Did I modify the Appraisals/Gemfiles correctly ?

Yes I think so :)

How to handle the fact that with_advisory_lock requires ActiveRecord >= 4.2? Should we drop the old active records or do we mock the method so it does nothing?

Let's hear from @swanandp on that.

Should we lock all methods that modify the list ? It's hard for me to reason about it because a lot of methods modify the list, and they call each others! And them some of them call class methods like :update_all at which point we lost the necessary scope information. We could do something like @locked = true and prevent multiple locking attempts.

I don't know where to start here. The current set of methods are a bit confusing at times I agree. Probably a good case for a big tidy-up.

swanandp commented 5 years ago

Thanks for your work @Silex, I will go through the whole PR, but I think the Appraisals piece should be separated out into a different PR.

You probably already know this how to use git for selectively checking out files, or may even have a better way, but it might be helpful for other readers, so:

You can branch-off master, and then checkout with-advisory-lock branch using pathspec to get the files you need:

git checkout -b new-branch-name master
git checkout with-advisory-lock -- .travis.yml
git checkout with-advisory-lock -- Appraisals
git checkout with-advisory-lock -- Gemfile
git checkout with-advisory-lock -- Gemfile.lock
# ... etc

In this case you have isolated commits, you can cherry pick them.

Silex commented 5 years ago

Alright I did two PR as you requested (see #327). This PR is now based on the other one, is that what you meant?

Should we lock all methods that modify the list ? It's hard for me to reason about it because a lot of methods modify the list, and they call each others! And them some of them call class methods like :update_all at which point we lost the necessary scope information. We could do something like @locked = true and prevent multiple locking attempts.

One approach is to use these in all public methods, eg move_higher.

That sounds like a decent strategy, but we need to take care of recursive locking (hence my @locked = true idea. I'll see what I can do.

Silex commented 5 years ago

Alright, seems like the tests are finally passing with my changes. I'm not sure wether my recursive locking strategy is needed because the gem seems to be supporting it out of the box! Except for MySQL < 5.7.5 but I think that is README material, not code.

@swanandp: can you validate that the following is correct (gets the right scope value as a string)?

https://github.com/swanandp/acts_as_list/blob/8bd2545502b2b56be7b33dabdfbb63a6b4df8739/lib/acts_as_list/active_record/acts/list.rb#L260-L272

Silex commented 5 years ago

Given the tests now fails only for MySQL, and that travis uses an old MySQL I'm trying a workaround for it to support recursive locking.

EDIT: actually it's because it asks for a different lock while holding another lock. Any idea why the test triggers this?

EDIT2: ah, ListTest#test_update_position_when_scope_changes. Yeah well, not sure how to handle this.

EDIT3: took the hammer solution and I just lock per model, which means two acts_as_list with different scopes used on the same model with be updated sequentially instead of parralel. I think this should be rare enough for it to be ok.

Silex commented 5 years ago

I am requesting a few changes. I do think locking and constraints should be left to the app layer. See #326. However, we can extract the code out into a module, and offer this as an add-on module that can be included if the user wants.

Ah, I missed this. I'll see what I can do.

brendon commented 5 years ago

Thanks again for your hard work @Silex :) Just a note, acts_as_list doesn't support more than one instance of itself on a model as far as I know. Mainly because of the way things are structured internally.

Silex commented 5 years ago

@brendon: okay so that means only the model counts, I can safely disregard the scope like in https://github.com/swanandp/acts_as_list/pull/325/commits/a6e1a82fc6993fb4c966af5f076bd7d67cbdad1c correct?

brendon commented 5 years ago

@Silex, yes as far as I'm aware. We should really move to allow more than one list on the page. Ranked-model allows it via the use of a PORO from what I can tell.

UPDATE: But for now, yes let's just scope the lock to the model.

Silex commented 5 years ago

Alright, I extracted it as a module. In order for the tests to run I included it, but I'm not sure how that will look in the final form...

Basically remove the default inclusion and then do something like this?

class MyModel < ApplicationRecord
  acts_as_list
end

MyModel.include(ActiveRecord::Acts::List::DatabaseMutexes)

Also, I'm not a big fan of the name DatabaseMutexes, any idea ? WithTableLock ?

brendon commented 5 years ago

@Silex, just to extend my other comment, it'd look good to do:

acts_as_list :position, advisory_lock: false or something like that (depending on the final naming.

Also it'd be a good idea to have:

[ ] Tests if possible. Hard to test this, but we do have a current brute-force test that can simulate locking issues. It'd be better if we could simulate a multi-threaded update issues in a way that doesn't take so long to test (the current test it very slow as it just creates a lot off records to try and get an issue).
[ ] Documentation of the new feature including how it works and how to disable it, plus link off to the other gem so people can make their minds up about turning it off.

We'll do a major version bump on this I think since it has the potential to be painful for some users.

Silex commented 5 years ago

@brendon: followed your advices, and I agree we'd find a way to test this.

I remember it was pretty easy to trigger the behavior, so I'll try to add a test with the right amount of sleep and Thread.

brendon commented 5 years ago

@Silex, good progress. Ping me once you want me to look at things again :)

Silex commented 5 years ago

@brendon: alright I managed to make a test that captures the problem... the minor detail is that my AdvisoryLock module does not fix it :sweat_smile:

First of all the test only makes sense for databases like MySQL or PostreSQL, because sqlite databases are locked for sequential writes so this does not really happen.

Here you can see the test failing:

ListAdvisoryLockTest#test_no_errors_with_lock [/data/test/test_list.rb:317]:
--- expected
+++ actual
@@ -1 +1 @@
-[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
+[1, 1, 1, 2, 3, 3, 3, 3, 4, 4]

So, basically the problem is the following:

before_create "add_to_list_#{add_new_at}".to_sym, unless: :act_as_list_no_update?

When you save N objects in parrallel, each calls of add_to_list_bottom figures out that they need to insert at the same current posisition X. It would not work even if we locked around add_to_list_bottom because it's a separate call from #save.

The problem with https://github.com/swanandp/acts_as_list/pull/325/commits/83ed1366b5a01588bbdb4a6de91fb66c904f745c is that we only wrap the public methods, but we need to wrap the whole before_create thing as well. I thought of using an around_save filter, but that will not work. I think the only way to make this test pass is to rewrite all the database-writing methods to use one single method, in which we can lock appropriately.

We need to go from:

before_create
- compute pos X
- move all remaining ids
save
- lock
- store position X
- unlock

to something like:

save
- lock
- compute pos X
- move all remaining ids
- store position X
- unlock

In my application, I fixed it with one wrapper used everywhere:

  def with_lock
    Album.with_advisory_lock(format('album-%d', id)) do
      yield
    end
  end

  def add_image!(hash)
    with_lock do
      images.create!(hash)
    end
  end

  def remove_image!(image)
    with_lock do
      image.destroy
    end
  end

That is, every time I want to add or remove images from my album the whole transaction is locked and I don't get duplicate ids.

I'm not sure what the right way forward is at this point. I'm afraid that fixing it for creation is just the tip of the iceberg, and that there are many other scenarios that might fail... maybe this is just too much work.

brendon commented 5 years ago

Lol yes it does sound like a lot of work! I like your idea of structuring the callbacks differently. Definitely not possible/easy to wrap the entire save process because that's more on the author's end.

If you'd like to keep plugging away at this, we'd really appreciate it. I don't currently have time to help but I can certainly give pointers and provide feedback on your code. No rush also, if you need to move on to other things for a bit we can just leave this hanging for a while. I've been doing a bit of work with ranked-model lately and that uses just one before_save callback to do everything. Definitely a better idea.

Silex commented 5 years ago

@brendon: thanks for the confirmations. I don't have time to start a giant refactor, but let's keep this as is for now, until one of us figures an easy way out :wink: Also I think the test is a good illustration of the problem to fix.

brendon commented 5 years ago

Indeed, its nice and succinct :) Definitely a good starting point :)

Silex commented 5 years ago

Continuing in #359

brendon / acts_as_list

Use database mutex locking #325