ckrintz / appscale

Automatically exported from code.google.com/p/appscale

Key-list can become inconsistent when multiple threads write #189

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
This issue was initially discussed in issue 183 but it is a separate issue. 

When multiple threads write to the datastore simultaneously, keys can be
lost from the key-list. This can result in incorrect query results.
Consider the example:

Threads 1 and 2 want to write to keys D and E respectively.
Thread 1 writes key D
Thread 2 writes key E
Thread 1 reads key-list and gets [A,B,C]
Thread 2 reads key-list and gets [A,B,C]
Thread 1 adds D to key list, writes back [A,B,C,D]
Thread 2 adds E to key list, writes back [A,B,C,E]
key-list now contains [A,B,C,E]

Thread 1's write was overwritten by Thread 2's, and as a result the D key is
lost from the key-list. This is called the "lost-update" problem; it's a
classic database concurrency problem.
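
The interleaving above can be replayed deterministically in a few lines. This is only an illustrative sketch: the `store` dict and the `read_key_list` / `write_key_list` helpers are hypothetical stand-ins for the datastore, not AppScale code.

```python
# Hypothetical in-memory stand-in for the datastore entry holding the key-list.
store = {"key_list": ["A", "B", "C"]}

def read_key_list():
    # Return a copy, the way a datastore read hands back its own snapshot.
    return list(store["key_list"])

def write_key_list(keys):
    # Blindly replace whatever is stored, with no check for intervening writes.
    store["key_list"] = list(keys)

# Replay the steps from the example in the same order.
t1_view = read_key_list()          # Thread 1 reads [A, B, C]
t2_view = read_key_list()          # Thread 2 reads [A, B, C]
write_key_list(t1_view + ["D"])    # Thread 1 writes back [A, B, C, D]
write_key_list(t2_view + ["E"])    # Thread 2 writes back [A, B, C, E]

print(store["key_list"])           # ['A', 'B', 'C', 'E'] -- D has been lost
```

Both writers start from the same stale snapshot, so whichever write lands last silently discards the other.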

In order to avoid this issue, reading the key-list and adding to it should
be an atomic operation. I think Yoshi's proposal of using a lock is a
workable solution. Depending on how performance looks we can investigate
alternative approaches (e.g. Raj's favorite optimistic concurrency control)
as necessary. 
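
As a rough sketch of the lock-based approach, the read-modify-write of the key-list can be wrapped in a single critical section. The names `key_list`, `key_list_lock` and `add_key` are made up for illustration, and a `threading.Lock` only covers threads inside one process.

```python
import threading

# Hypothetical shared key-list and the lock that guards it.
key_list = ["A", "B", "C"]
key_list_lock = threading.Lock()

def add_key(key):
    """Read the key-list, append a key, and write it back as one atomic step."""
    global key_list
    with key_list_lock:
        current = list(key_list)   # read
        current.append(key)        # modify
        key_list = current         # write back

# Two writers, as in the example: one adds D, the other adds E.
threads = [threading.Thread(target=add_key, args=(k,)) for k in ("D", "E")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(key_list))            # ['A', 'B', 'C', 'D', 'E']
```

Because no other writer can read the list between another writer's read and write-back, neither update can be lost. The order of D and E depends on scheduling, which is why the result is sorted before printing.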

Original issue reported on code.google.com by jmkupfer...@gmail.com on 9 Apr 2010 at 5:05

GoogleCodeExporter commented 9 years ago
The current implementation uses a thread lock.
This means there is no problem when a single PB server handles all datastore
requests, but when there are multiple PB servers (currently cassandra and
voldemort), it becomes a problem, because a thread lock only protects threads
within one process.
We must use another locking mechanism such as ZooKeeper, or an optimistic approach.
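
For comparison, the optimistic approach could look roughly like the sketch below. It assumes the backend can offer an atomic compare-and-set on a version number, which is an assumption and not something this thread confirms for any of the datastores. `VersionedStore` is a hypothetical in-memory stand-in; its internal lock only simulates the atomicity a real backend would have to provide.

```python
import threading

class VersionedStore:
    """Hypothetical store holding one value plus a version counter."""

    def __init__(self, value):
        self._value = list(value)
        self._version = 0
        self._guard = threading.Lock()  # simulates the backend's atomic CAS

    def read(self):
        with self._guard:
            return list(self._value), self._version

    def compare_and_set(self, new_value, expected_version):
        """Write only if nobody else has written since expected_version."""
        with self._guard:
            if self._version != expected_version:
                return False
            self._value = list(new_value)
            self._version += 1
            return True

def add_key(store, key):
    # Retry until the write lands on top of the version that was read.
    while True:
        keys, version = store.read()
        if store.compare_and_set(keys + [key], version):
            return

store = VersionedStore(["A", "B", "C"])
threads = [threading.Thread(target=add_key, args=(store, k)) for k in ("D", "E")]
for t in threads:
    t.start()
for t in threads:
    t.join()

final_keys, final_version = store.read()
print(sorted(final_keys), final_version)   # ['A', 'B', 'C', 'D', 'E'] 2
```

No writer ever holds a lock across the read and the write; a writer that loses the race simply re-reads and tries again.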

Original comment by yoshi...@gmail.com on 15 Apr 2010 at 7:43

GoogleCodeExporter commented 9 years ago
Would like to instead use memcached to store the lock on the meta-key.

Original comment by shattere...@gmail.com on 3 May 2010 at 8:11

GoogleCodeExporter commented 9 years ago
I implemented a mutex class which provides a distributed lock using memcached.
The code has been checked into the unstable branch under AppDB/memcache_mutex.py.
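
For readers unfamiliar with the technique, a memcached-based mutex is commonly built on the `add` command, which stores a key only if it does not already exist. The sketch below illustrates that general idea and is not the contents of memcache_mutex.py. `MemcacheMutex`, `FakeMemcache`, the `lock:key_list` key and all parameters are hypothetical, and the fake client is an in-memory stand-in so the example runs without a memcached server.

```python
import time

class FakeMemcache:
    """In-memory stand-in mimicking memcached's add/delete semantics."""

    def __init__(self):
        self._data = {}

    def add(self, key, value, expire=0):
        # Like memcached's add: succeed only if the key is absent.
        # (Expiry is ignored in this stand-in.)
        if key in self._data:
            return False
        self._data[key] = value
        return True

    def delete(self, key):
        return self._data.pop(key, None) is not None

class MemcacheMutex:
    """Hypothetical lock: whoever manages to add the key holds the mutex."""

    def __init__(self, client, name, ttl=10, retry_delay=0.01):
        self.client = client
        self.name = name
        self.ttl = ttl                  # expiry so a crashed holder cannot block forever
        self.retry_delay = retry_delay

    def acquire(self, timeout=1.0):
        deadline = time.monotonic() + timeout
        while True:
            if self.client.add(self.name, "locked", self.ttl):
                return True
            if time.monotonic() >= deadline:
                return False
            time.sleep(self.retry_delay)

    def release(self):
        self.client.delete(self.name)

client = FakeMemcache()
first = MemcacheMutex(client, "lock:key_list")
second = MemcacheMutex(client, "lock:key_list")

got_first = first.acquire()                 # True: the key was absent
got_second = second.acquire(timeout=0.05)   # False: first still holds it
first.release()
got_after_release = second.acquire()        # True: the key is free again
print(got_first, got_second, got_after_release)
```

With a real client, the same class would work with any object whose `add` reports whether the key was stored. One caveat of this simple form: if a holder runs longer than `ttl`, the lock expires underneath it, and its later `release` could delete a lock that another holder has since acquired.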

Original comment by jmkupfer...@gmail.com on 3 May 2010 at 10:56

GoogleCodeExporter commented 9 years ago
I changed dhash_datastore to use memcache_mutex and confirmed that it works
correctly with the multi-thread datastore test.

Original comment by yoshi...@gmail.com on 6 May 2010 at 12:29