Dobatymo / lmdb-python-dbm

Python DBM style wrapper of LMDB
ISC License

Add support for custom max_readers #6

Open mspinaci opened 9 months ago

mspinaci commented 9 months ago

Hello,

(first of all, thanks for this library!)

I've been struggling to use many concurrent workers reading from a database with this library. lmdb defaults to max_readers=126, and indeed, when I tried to use 128 concurrent workers, I got the expected error

mdb_txn_begin: MDB_READERS_FULL: Environment maxreaders limit reached

I saw that lmdbm doesn't expose the max_readers parameter, so I tried subclassing the Lmdb class to expose it in the open classmethod (see the sketch below); however, when I set it to more than 126 and use more than 126 concurrent workers, I get the error:

mdb_txn_begin: Invalid argument

Note that it's really an "and": if I set max_readers to any value > 126 but use 126 or fewer concurrent workers, the error doesn't appear. So for practical purposes, setting the max_readers flag to > 126 only changes the error I get (and minor things like the size of the lock.mdb file), not the outcome.
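For concreteness, the subclass looked roughly like this. It is only a sketch: it copies the read-only branch of lmdbm's Lmdb.open (the same lmdb.open call quoted below) and adds max_readers; the constructor arguments (env, autogrow) are my reading of the lmdbm source and may be off.

```python
import lmdb
from lmdbm import Lmdb

class LmdbWithReaders(Lmdb):
    @classmethod
    def open(cls, file, mode=0o755, map_size=2**20, max_readers=256):
        # Same call lmdbm makes for the read-only case, plus max_readers.
        env = lmdb.open(
            file,
            map_size=map_size,
            max_dbs=1,
            readonly=True,
            create=False,
            mode=mode,
            max_readers=max_readers,
        )
        # Constructor signature assumed from lmdbm's Lmdb.__init__(env, autogrow).
        return cls(env, autogrow=False)
```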

(to be doubly sure I didn't mess up the subclassing, I even tried hard-coding the following line in the lmdbm code:

env = lmdb.open(file, map_size=map_size, max_dbs=1, readonly=True, create=False, mode=mode, max_readers=256)

but I got the exact same error)

Am I missing something in the lmdb(m) implementation, or is there a bug somewhere? I wasn't able to track down exactly where this error comes from...

Dobatymo commented 9 months ago

Hi @mspinaci, I don't have time at the moment to investigate this completely, but according to the lmdb docs:

max_readers: Maximum number of simultaneous read transactions. Can only be set by the first process to open an environment, as it affects the size of the lock file and shared memory area. Attempts to simultaneously start more than this many read transactions will fail.

Did you make sure every last process was closed before trying the change?
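You can also inspect the reader lock table directly with py-lmdb to see whether stale slots from old processes are still around. A minimal sketch (adjust the path to your database):

```python
import lmdb

# Open the existing environment read-only and look at the reader lock table.
env = lmdb.open("mydb.lmdb", max_dbs=1, readonly=True, create=False)
print(env.readers())       # one entry per reader slot currently in use
print(env.reader_check())  # clears stale slots, returns how many were cleared
env.close()
```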

mspinaci commented 9 months ago

Many thanks for the quick answer! Yes, I think I have always started the code from a "fresh" Python interpreter (restarting it every time). I'm also leaving for vacation right now, so I can't do further testing at the moment, sorry :-)

(also, for the moment, the default 126 readers is fine for my specific task, although in the future I might need to scale to more; so no worries at all if you don't have time!)

Dobatymo commented 9 months ago

Also try setting autogrow to False and using an appropriate map_size. Otherwise I don't have any ideas right now... What's your OS, btw?
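Something like this sketch (map_size is just an example value that has to be large enough for your data; the keywords follow Lmdb.open):

```python
from lmdbm import Lmdb

# Sketch: read-only open with autogrow disabled and a fixed map_size
# (2**30 = 1 GiB here, purely as an example).
with Lmdb.open("mydb.lmdb", "r", map_size=2**30, autogrow=False) as db:
    print(len(db))
```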

EDIT: The Python lmdb wrapper sets MDB_NOTLS (https://github.com/jnwatson/py-lmdb/blob/57c692050b8d4f67ff7bcdec7acf38598de7c295/lmdb/cffi.py#L750); maybe that causes issues.

MDB_NOTLS Don't use Thread-Local Storage. Tie reader locktable slots to MDB_txn objects instead of to threads. I.e. mdb_txn_reset() keeps the slot reserved for the MDB_txn object. A thread may use parallel read-only transactions. A read-only transaction may span threads if the user synchronizes its use. Applications that multiplex many user threads over individual OS threads need this option. Such an application must also serialize the write transactions in an OS thread, since LMDB's write locking is unaware of the user threads.
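To narrow it down, it might help to check whether max_readers > 126 is respected by py-lmdb alone, without lmdbm. Because of MDB_NOTLS, every open read transaction holds its own reader slot, so a single thread can exhaust the table. A sketch (path and numbers are made up):

```python
import lmdb

# Fresh environment with a raised reader limit.
env = lmdb.open("readers-test.lmdb", max_dbs=1, max_readers=256)
with env.begin(write=True) as txn:
    txn.put(b"key", b"value")

# With MDB_NOTLS, each open read transaction occupies one reader slot, so
# 200 concurrent transactions would fail with MDB_READERS_FULL if the
# limit were still the default 126.
txns = [env.begin() for _ in range(200)]
print("opened", len(txns), "concurrent read transactions")
for txn in txns:
    txn.abort()
env.close()
```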

Dobatymo commented 5 months ago

@mspinaci I added a way to pass arguments through in lmdbm==0.0.6, but this probably doesn't solve your error.
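Assuming the extra keyword arguments are forwarded from Lmdb.open to lmdb.open, usage would look roughly like this sketch:

```python
from lmdbm import Lmdb

# Sketch: forward max_readers through Lmdb.open to the underlying lmdb.open.
with Lmdb.open("mydb.lmdb", "r", max_readers=256) as db:
    print(len(db))
```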