Better documentation on locks?

kethomassen commented 9 years ago

After reading the documentation on distributed locks (http://nobrainer.io/docs/distributed_locks/), I'm left confused about what they do and how they work. Can you please provide more information on what they are, how they work, how to use them, and what they can be used for?

Many thanks.

nviennot commented 9 years ago

They provide a system-wide Mutex (shared across all the app servers), like http://ruby-doc.org/core-2.2.0/Mutex.html

nviennot commented 9 years ago

In other words, the distributed locks are useful if you want to guarantee (to some extent) some exclusivity between two concurrent executions of your app. For example, locks are used to prevent races when checking uniqueness constraints.

The implementation of the lock is here: https://github.com/nviennot/nobrainer/blob/master/lib/no_brainer/lock.rb
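
To make the Mutex comparison concrete, here is a minimal sketch that uses only Ruby's standard library. The method name count_with_mutex is made up for illustration. It shows the local guarantee (one thread at a time inside the block); a distributed lock aims to give the same guarantee between processes on different servers.

```ruby
# Standard-library analogy: Mutex#synchronize lets one thread at a time run
# the block. A distributed lock extends that guarantee from threads in one
# process to processes on different servers.
def count_with_mutex(thread_count, increments)
  counter = 0
  mutex = Mutex.new
  threads = Array.new(thread_count) do
    Thread.new do
      increments.times do
        mutex.synchronize do
          current = counter      # read
          Thread.pass            # invite a context switch mid-update
          counter = current + 1  # write
        end
      end
    end
  end
  threads.each(&:join)
  counter
end

puts count_with_mutex(10, 100)
```

Because every read-then-write happens inside synchronize, no increment is lost even though the threads are encouraged to interleave.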

kethomassen commented 9 years ago

Okay, cool.

I'm trying to use it in my application; I'll simplify what I need for illustration purposes. I have an application where clients poll a URL via AJAX every x seconds. I need to access the database and find the latest item in a table. If this item has a current value of false, I need to create a new one (which will expire after x seconds, handled elsewhere).

I can see problems arising here if two clients happen to poll at exactly the same time, which could create two new documents (this happened previously with a MongoDB backend). How can I use this lock feature to prevent this from happening?

Thanks for the outstandingly quick reply.

nviennot commented 9 years ago

Okay so here, you might want to let NoBrainer do the heavy lifting. You can use first_or_create: http://nobrainer.io/docs/persistence/#first_or_create (It uses locks under the hood)

If you have some pseudocode available, I can help you write some code.

kethomassen commented 9 years ago

Hmm, is there a way with that method to check if it actually created or just read from the database? Problem is I need to run another thread (through EM) with a timer that sets current to false after 60 seconds.

nviennot commented 9 years ago

This is going to be racy. You are better off having a field called valid_until that you set to 60.seconds.from_now when you create the record; this way you never update the record (immutability is good!). But first_or_create won't work here because it relies on uniqueness validators. Here's what I suggest you do (I haven't tried the code, I hope it works):

class Item
  include NoBrainer::Document
  field :key, :type => String, :required => true
  field :valid_until, :type => Time, :required => true
end

def fetch_current_item(key)
  NoBrainer::Lock.new("item:#{key}").synchronize do
    Item.where(:key => key, :valid_until.ge => RethinkDB::RQL.new.now).first ||
      Item.create!(:key => key, :valid_until => RethinkDB::RQL.new.now + 60)
  end
end

nviennot commented 9 years ago

Edit: le -> ge and FYI: RethinkDB::RQL.new is the r variable. The reason I'm not using Time.now is to avoid issues where time is not properly synchronized across your app servers
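
The find-or-create-under-lock pattern above can be tried without a database. The following is only an in-memory sketch: MemoryItem, ITEMS and ITEM_LOCK are hypothetical stand-ins, a plain Mutex replaces NoBrainer::Lock, and the clock is passed in explicitly where the real code would use the database's r.now.

```ruby
# In-memory stand-in for the pattern above. MemoryItem, ITEMS and ITEM_LOCK
# are hypothetical; the real code takes a NoBrainer::Lock and reads the time
# from the database (r.now) instead of the app server's clock.
MemoryItem = Struct.new(:key, :valid_until)
ITEMS = []
ITEM_LOCK = Mutex.new

def fetch_current_item(key, now: Time.now, ttl: 60)
  ITEM_LOCK.synchronize do
    # Reuse the item that is still valid, otherwise create a fresh one.
    # Items are never updated, only superseded once they expire.
    ITEMS.find { |item| item.key == key && item.valid_until >= now } ||
      MemoryItem.new(key, now + ttl).tap { |item| ITEMS << item }
  end
end

# Twenty "clients" polling at the same instant still share a single item.
now = Time.now
pollers = Array.new(20) { Thread.new { fetch_current_item("jackpot", now: now) } }
pollers.each(&:join)
puts ITEMS.size
```

The check and the create sit inside the same critical section, which is what prevents two pollers from both seeing "no valid item" and both creating one.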

kethomassen commented 9 years ago

Hmmm, it seems like it could work; however, I need to perform other work after the 60 seconds are up (perform calculations and send messages via Pusher, which could take multiple seconds). Any suggestions?

nviennot commented 9 years ago

Because it's important to tolerate your app servers crashing, you can do a worker type of thing:

class Item
  include NoBrainer::Document
  ...
  field :finalized, :type => Boolean

  def finalize
    # do some work
    update!(:finalized => true)
  end
end

# worker.rb:
loop do
  NoBrainer::Lock.new("worker:item").synchronize do
    Item.where(:valid_until.lt => RethinkDB::RQL.new.now, :finalized.undefined => true).each { |item| item.finalize }
  end
  sleep 10
end

nviennot commented 9 years ago

Edit: added a finalized field to make sure we don't finalize the same items over and over.
Edit: changed to finalized.undefined => true.
Edit: changed key to "worker:item".
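
As a rough illustration of why the finalized field matters, here is one iteration of that worker loop run against in-memory data. WorkItem and run_worker_pass are illustrative names, not NoBrainer API, and the selection only mirrors the query above.

```ruby
# Hypothetical in-memory stand-in for one iteration of the worker loop.
# WorkItem and run_worker_pass are illustrative names, not NoBrainer API.
WorkItem = Struct.new(:name, :valid_until, :finalized)

def run_worker_pass(items, now)
  # Same selection as the query above: expired and not yet finalized.
  pending = items.select { |item| item.valid_until < now && !item.finalized }
  pending.each do |item|
    # ... the slow work (calculations, notifications) would go here ...
    item.finalized = true
  end
  pending.map(&:name)
end

now = Time.now
items = [
  WorkItem.new("expired", now - 5, nil),
  WorkItem.new("active", now + 55, nil)
]
p run_worker_pass(items, now)  # first pass picks up only the expired item
p run_worker_pass(items, now)  # second pass finds nothing left to do
```

Since the pending state lives in the data rather than in a timer, a worker that crashes loses nothing: the next pass, on any server, selects the same unfinalized items again.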

kethomassen commented 9 years ago

I'm currently getting this when benchmarking with high concurrency: NoBrainer::Error::LockUnavailable - Lock on jackpot:current' unavailable:. How should I handle these? It doesn't seem intuitive to throw errors if it is locked.

nviennot commented 9 years ago

That's pretty terrible. 1) If your server dies (it will) or has a flaky network connection (it will), you'll have problems. The timer will vanish, and you'll be sad. 2) The update to current => false is done without the lock being held.

LockUnavailable means that after 10 seconds (default timeout), it couldn't get a lock, because others had the lock (high contention).

FYI, you don't need eventmachine for what you are trying to do.

nviennot commented 9 years ago

Edit: you actually wait 20 seconds with the lock held (hence the LockUnavailable errors) due to timers.wait, which is not what you want to do. Try to implement something like I suggested. If you deviate from my suggestions, you might want to argue your position, otherwise it's harder for me to help.
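
The interaction between a lock timeout and a long wait inside the critical section can be reproduced in miniature. This is a sketch under stated assumptions: TimedLock, LockTimeout and contend are hypothetical and not NoBrainer classes, and the timings are scaled down from the 10-second default mentioned above.

```ruby
# Hypothetical stand-ins: TimedLock and LockTimeout are not NoBrainer classes.
# They mimic a lock that gives up after `timeout` seconds.
class LockTimeout < StandardError; end

class TimedLock
  def initialize(timeout)
    @timeout = timeout
    @mutex = Mutex.new
  end

  def synchronize
    deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + @timeout
    until @mutex.try_lock
      if Process.clock_gettime(Process::CLOCK_MONOTONIC) > deadline
        raise LockTimeout, "lock unavailable"
      end
      sleep 0.01
    end
    begin
      yield
    ensure
      @mutex.unlock
    end
  end
end

# One thread holds the lock for `hold_for` seconds while a second caller
# waits at most `timeout` seconds for it.
def contend(hold_for, timeout)
  lock = TimedLock.new(timeout)
  holder = Thread.new { lock.synchronize { sleep hold_for } }
  sleep 0.05 # let the holder grab the lock first
  begin
    lock.synchronize { :acquired }
  rescue LockTimeout
    :timed_out
  ensure
    holder.join
  end
end

p contend(0.5, 0.2)  # holder sleeps inside the lock longer than the timeout
p contend(0.1, 0.5)  # holder releases quickly, so the waiter gets in
```

The practical takeaway matches the advice in the thread: keep the block passed to synchronize short, and do any sleeping or waiting outside of it.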

kethomassen commented 9 years ago

Though, it shouldn't be waiting 20 seconds with the lock held, since the wait is deferred to another thread by EventMachine. How would you suggest implementing a way to update and create a new model with current => true every x seconds?

nviennot commented 9 years ago

"Though, it shouldn't be waiting 20 seconds with the lock held due to it being deferred to another thread by EventMachine."

Okay, then you should be fine. Debug your code to see what's going on :)

"How would you suggest implementing a way to update and create a new model with current => true every x seconds?"

With the loop { ...; sleep } pattern I mentioned earlier.

kethomassen commented 9 years ago

Ok, I've fixed everything, got it running in another thread, working with multiple instances and tested crashing servers and it falls back and works on another server - sweet. Thanks so much for the absolutely outstanding help!

One more question before I close this ticket: How can I use sum/avg on associations? I have a situation where each Item belongs to a Price model which has a value with the latest_price. I want to get all items and sum them by the latest_price value in the Price model they belong to. e.g.

Item.all.sum(:price => :latest_price)

This however doesn't work. Is there a way to achieve this? I could do it manually with a loop but doesn't seem efficient - Mongoid supports this built in.

nviennot commented 9 years ago

You're welcome :)

Item.all.sum(:latest_price)

kethomassen commented 9 years ago

@nviennot This results in a value of 0 :(

nviennot commented 9 years ago

Not sure what to tell you. This test passes: https://github.com/nviennot/nobrainer/blob/master/spec/integration/criteria/aggregate_spec.rb#L26

nviennot commented 9 years ago

Write a standalone test case if you'd like me to look into this.

kethomassen commented 9 years ago

What if the model itself had a latest_price and it belonged to models which also had a latest_price? How does it know which to query?

nviennot commented 9 years ago

Not sure what you mean. If you could show some code it'd be great, otherwise, it's hard to reason about what you mean.

kethomassen commented 9 years ago

class Price
    include NoBrainer::Document

    field :latest_price, :type => Integer
end

class Item
    include NoBrainer::Document

    belongs_to :price

    field :name, :type => String
end

Item.sum(:latest_price) # ?????

Something similar to what that would look like - given an Item criteria, sum all the latest_price values from every Price model they belong to.

nviennot commented 9 years ago

You have two ways to query this data model:

1) Using a join with a reduce on the client side. Not necessarily the best way as this fetches all items and price models from the db. The sum is done on the client side:

Item.all.join(:price).map { |item| item.price.latest_price }.reduce(:+)

2) Using a join, but running everything on the DB, including the sum:

NoBrainer.run { Item.all.join(:price).to_rql.map { |item| item[:price] }.sum(:latest_price) }

Note: If a price is shared between two different items, it will be counted twice (since we are using a join), which seems to be what you want.

However, this kind of query is weird and awkward. It would be easier if you denormalized the latest_price onto the item. If items have a copy of the latest_price in their attributes, things are much simpler: Item.sum(:latest_price)
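
The difference between the join-based sum and the denormalized sum can be seen with plain Ruby data. The sketch below uses made-up values; PriceRow, ItemRow and both method names are hypothetical and not NoBrainer models or API.

```ruby
# Plain-Ruby illustration with made-up data; PriceRow and ItemRow are
# hypothetical structs, not NoBrainer models.
PriceRow = Struct.new(:id, :latest_price)
ItemRow  = Struct.new(:name, :price_id, :latest_price)

def sum_via_join(items, prices)
  by_id = prices.to_h { |price| [price.id, price] }
  # Each item looks up its price, so a shared price is counted once per item.
  items.sum { |item| by_id.fetch(item.price_id).latest_price }
end

def sum_denormalized(items)
  # With latest_price copied onto each item, no lookup is needed.
  items.sum(&:latest_price)
end

prices = [PriceRow.new(1, 10), PriceRow.new(2, 25)]
items = [
  ItemRow.new("a", 1, 10),
  ItemRow.new("b", 1, 10), # shares price 1 with "a"
  ItemRow.new("c", 2, 25)
]
puts sum_via_join(items, prices)
puts sum_denormalized(items)
```

Both approaches give the same total as long as the copied value is kept in sync with the price it came from, which is the usual cost of denormalizing.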
