chrsmithdemos / leveldb

Automatically exported from code.google.com/p/leveldb
BSD 3-Clause "New" or "Revised" License

Concurrency support for multiple processes (1 exclusive initializer / n readers) #176

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Can the designers of leveldb explain the rationale behind the design decision 
not to support multiple processes in the leveldb implementation?

The documentation clearly says, under Concurrency section that: "A database may 
only be opened by one process at a time. The leveldb implementation acquires a 
lock from the operating system to prevent misuse."

Currently I can see that when one process opens a leveldb database, it uses 
fcntl to take an RW (exclusive) lock. However, this is severely limiting, as no 
other process can ever open the same database, even if it only wants to inspect 
the database contents read-only (RDONLY).

An example use case: one process opens the leveldb database exclusively, fills 
it, then closes it. Then n different processes start reading that database.

Original issue reported on code.google.com by shri...@gmail.com on 10 Jun 2013 at 8:03

GoogleCodeExporter commented 9 years ago
On most platforms, fcntl locks are advisory. Mandatory locks require that 
specific additional steps be taken (on Linux, you have to enable them for both 
the file system *and* the file). So, if you want to read the file directly, you 
can totally do that.
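
To illustrate the advisory-lock point, here is a minimal POSIX sketch, not leveldb code. The function names `LockWholeFile` and `AdvisoryLockDemo` are made up for this example. The parent process holds an exclusive fcntl lock, and a child process is refused when it asks for the same lock but can still read the file with an ordinary read call. It assumes a file system where mandatory locking is not in effect.

```cpp
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cstdio>
#include <string>

// Tries to take an exclusive (write) fcntl lock on the whole file without
// blocking. Returns true if the lock was granted.
bool LockWholeFile(int fd) {
  struct flock fl = {};
  fl.l_type = F_WRLCK;
  fl.l_whence = SEEK_SET;
  fl.l_start = 0;
  fl.l_len = 0;  // 0 means "through end of file"
  return fcntl(fd, F_SETLK, &fl) == 0;
}

// Hypothetical demo (not part of leveldb). The parent creates and locks
// `path`; a child process then tries to lock it and to read it.
// Returns the child's result as a bitmask, or -1 on setup failure:
//   bit 0 set: the child's lock request was refused
//   bit 1 set: the child's plain read succeeded anyway
int AdvisoryLockDemo(const std::string& path) {
  int fd = open(path.c_str(), O_RDWR | O_CREAT | O_TRUNC, 0644);
  if (fd < 0) return -1;
  if (write(fd, "hello", 5) != 5 || !LockWholeFile(fd)) {
    close(fd);
    return -1;
  }

  pid_t pid = fork();
  if (pid < 0) {
    close(fd);
    return -1;
  }
  if (pid == 0) {
    // Child: fcntl locks are not inherited across fork(), so this process
    // holds no lock and competes with the parent like any other process.
    int cfd = open(path.c_str(), O_RDWR);
    if (cfd < 0) _exit(0);
    int result = 0;
    if (!LockWholeFile(cfd)) result |= 1;  // lock request conflicts
    char buf[8];
    if (pread(cfd, buf, 5, 0) == 5) result |= 2;  // read is not blocked
    _exit(result);
  }

  int status = 0;
  waitpid(pid, &status, 0);
  close(fd);  // closing the descriptor also releases the parent's lock
  unlink(path.c_str());
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Under those assumptions, calling `AdvisoryLockDemo` on a scratch path should report both bits set: the lock is respected only by processes that ask for it.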

In the case of multiple read-only readers without altering the code base, you 
*could* simply copy the file for each reader. Yes, it will be inefficient 
(though not on file systems that dedupe data), but then again, so would having 
multiple leveldb processes running as they wouldn't be able to share their 
memory/buffer/etc.

So, this doesn't seem like a good feature for LevelDB to implement. If you 
really feel you need it, you can always subclass the PosixEnv and provide your 
own LockFile and UnlockFile methods whose logic suits your unique use case.
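
As a rough sketch of the kind of locking policy such custom LockFile and UnlockFile methods could implement for the "1 exclusive initializer / n readers" case, readers could take a shared fcntl lock while the initializer takes an exclusive one. The helpers below (`TryLockFile`, `UnlockAndClose`, `OtherProcessCanLock`) are hypothetical names using plain POSIX calls; they are not leveldb APIs, and wiring them into an Env subclass is not shown. The sketch covers only the locking policy, not anything else that opening a database may write to disk.

```cpp
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cstdio>
#include <string>

enum class LockMode { kShared, kExclusive };

// Opens the lock file (creating it if needed) and tries to take a
// non-blocking fcntl lock in the requested mode.
// Returns the open descriptor on success, or -1 on failure.
// Caveat: fcntl locks belong to the process, and closing ANY descriptor for
// the file drops them, so a real implementation must track locks per process.
int TryLockFile(const std::string& path, LockMode mode) {
  int fd = open(path.c_str(), O_RDWR | O_CREAT, 0644);
  if (fd < 0) return -1;
  struct flock fl = {};
  fl.l_type = (mode == LockMode::kShared) ? F_RDLCK : F_WRLCK;
  fl.l_whence = SEEK_SET;
  fl.l_start = 0;
  fl.l_len = 0;  // lock the whole file
  if (fcntl(fd, F_SETLK, &fl) != 0) {
    close(fd);
    return -1;
  }
  return fd;
}

// Releases the lock by closing the descriptor that holds it.
void UnlockAndClose(int fd) { close(fd); }

// Demo helper: forks a child process and reports whether that separate
// process could obtain the lock in the given mode.
bool OtherProcessCanLock(const std::string& path, LockMode mode) {
  pid_t pid = fork();
  if (pid < 0) return false;
  if (pid == 0) {
    int fd = TryLockFile(path, mode);
    _exit(fd >= 0 ? 0 : 1);  // the child's lock is released on exit
  }
  int status = 0;
  waitpid(pid, &status, 0);
  return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

With one reader holding a shared lock, a second process should be able to take another shared lock but not an exclusive one; once the reader unlocks, an exclusive lock becomes available again.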

Out of curiosity... why not just use multiple threads instead of multiple 
processes?

Original comment by cbsm...@gmail.com on 4 Oct 2013 at 10:36

GoogleCodeExporter commented 9 years ago
In our case, the key-value object is constructed once and remains static. 
Something like a constant access object that opens the file RDONLY, does not 
alter the backend files, and can be used concurrently in threads without 
locking and concurrently across processes does indeed seem appealing to me.

Original comment by johannes...@uni-duesseldorf.de on 19 Dec 2013 at 3:48

GoogleCodeExporter commented 9 years ago
The best reason not to use multiple threads is that the communication protocol 
to the primary program may not be as robust as the file-access protocol that 
leveldb uses. In cases where a primary program isn't responding quickly, it's 
often more efficient to access the file directly than to attempt to build in 
some sort of back-channel.

Original comment by earone...@gmail.com on 3 Sep 2014 at 1:21

GoogleCodeExporter commented 9 years ago
Since starting to work on a project using LevelDB, I have already encountered 
two cases where the lack of support for multiple concurrent readers has been a 
problem:

(1) Building an MPI program that reads input from a LevelDB

(2) I had an existing program, which I did not want to modify, reading a 
LevelDB. I then wanted another utility to check intermediate results of the 
first program while it was still running, but the analysis I wanted to perform 
depended on the same LevelDB.

Original comment by alex.cic...@gmail.com on 3 Oct 2014 at 12:36