NoBrainerORM / nobrainer

Ruby ORM for RethinkDB
http://nobrainer.io/

Dead locking when using heavily the distributed lock mechanism? #253

Closed zedtux closed 4 years ago

zedtux commented 4 years ago

Hello @nviennot,

I'm developing an app that receives a lot of logs.

My first try was to avoid background tasks (Sidekiq) in order to preserve ordering and so on, but after a while each update was taking ~500 ms, so I had to change this.

I made a model where I save a batch of log lines (say 20 of them); when the batch reaches its maximum size, the final object is updated with all the lines and the temporary model is reset.

In order to prevent losing data, I'm using the Distributed Locks that NoBrainer provides, but I must be doing something wrong, as I'm getting deadlocks.

The first thing I tried was to wrap my entire job in a lock, but then I realised that some methods I was using, like first_or_create (to initialise the temporary object where I store the 20 lines before updating the final object), already take a lock for me. So I changed my code so that the lock is only used when updating the temporary model. That's much better, but when I increase the Sidekiq concurrency to 100 threads, I see a lot of the following lines in the logs:

[ActiveJob] [PersistLogJob] [ca7acbbd-247c-47c4-945e-906309615636] [ 105.2ms] r.table("nobrainer_locks").get( "A5DLSf6ULJTvME6H+YOp1jn5bxM=").replace {|var_62536| r.branch( var_62536.eq(nil).or(var_62536[:expires_at].lt(r.now)), {"key" => "node_action_tmp_logs:node_actions:8B6RLOyQbhI1zt", "key_hash" => "A5DLSf6ULJTvME6H+YOp1jn5bxM=", "instance_token" => "8B6lZPXmsr5QFF", "expires_at" => r.now.add(60)}, var_62536 )}

Which makes me think of a deadlock.
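For context, the pattern I ended up with looks roughly like this (a simplified sketch; the model, field, and key names are illustrative, and `NoBrainer::Lock#synchronize` is the API from the distributed locks docs):

```ruby
# Sketch of the current approach: the lock is scoped to the
# temporary-model update only (all names here are illustrative).
def persist_log_line(node_action, line)
  # first_or_create takes its own internal lock to guarantee uniqueness
  tmp = NodeActionTmpLogs.where(node_action_id: node_action.id).first_or_create

  lock = NoBrainer::Lock.new("node_action_tmp_logs:#{node_action.id}")
  lock.synchronize do
    tmp.reload
    tmp.lines << line
    if tmp.lines.size >= 20
      # Flush the batch into the final object and reset the temporary model
      node_action.logs += tmp.lines
      node_action.save!
      tmp.lines = []
    end
    tmp.save!
  end
end
```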

Do you have any best practices for using the distributed locks that would avoid this issue, please?

nviennot commented 4 years ago

Hello Guillaume,

I think you should use atomic operations. Here's the documentation page: http://nobrainer.io/docs/atomic_ops

You can do the following steps:

1. Use an array field to store the log lines in your model.
2. Use first_or_create to fetch the model you need.
3. Append the log line to the array in a queue_atomic block.
4. Save the model.
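A minimal sketch of those steps (the model and field names are illustrative; `queue_atomic` is the block described on the atomic ops page, and it requires a running RethinkDB to actually execute):

```ruby
# Illustrative model holding a batch of log lines as an array field.
class TmpLogs
  include NoBrainer::Document
  field :node_action_id, index: true
  field :lines, type: Array, default: []
end

def append_log_line(node_action_id, line)
  # first_or_create internally takes a lock to guarantee uniqueness
  tmp = TmpLogs.where(node_action_id: node_action_id).first_or_create
  tmp.queue_atomic do
    tmp.lines << line  # queued as a server-side atomic append, not read-modify-write
  end
  tmp.save!
end
```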

Seeing the lock queries is fine. This is to ensure the uniqueness property of the model, required when using first_or_create. You should not have deadlocks.

Hope this helps, Nico

zedtux commented 4 years ago

Hello Nicolas,

Thank you very much for this doc page, I completely missed it!

Since I opened this issue, I found another way to manage the load: I create a document for each log line, and a Sidekiq job groups them together and updates the final object in one shot, which greatly reduces the number of updates and solves my issue.
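The grouping step of that job can be sketched in plain Ruby (the field names are hypothetical; in the real job the per-line documents come from a NoBrainer query):

```ruby
# Collapse per-line documents into one ordered payload per final object,
# so each final object is updated in a single shot.
def group_log_lines(log_docs)
  log_docs
    .group_by { |doc| doc[:node_action_id] }
    .transform_values do |docs|
      docs.sort_by { |d| d[:created_at] }.map { |d| d[:line] }
    end
end
```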

If I had found that doc page before, I would have tried it for sure!

As soon as I have a similar case, I'll definitely give it a try.

Let's close the issue for now.

Thank you!