Question about Tasks, EnvironmentOpenFlags.NoThreadLocalStorage and MaxReaders

drweeto commented 4 years ago

Please can I check whether I am reaching the right conclusions after reading #44?

You should be able to use MDB_NOTLS if you can guarantee that one read TXN can only be used by 1 thread at a time.

And the reply from buybackoff, it sounds like, if a program uses tasks (rather than threads directly) and meets the condition above, the LMDB environment should be initialized with EnvironmentOpenFlags.NoThreadLocalStorage (MDB_NOTLS).

Is this correct?

I have been getting MDB_READERS_FULL errors and it seems like using EnvironmentOpenFlags.NoThreadLocalStorage should resolve them. I could increase MaxReaders, of course, but that seemed like it could just be papering over the cracks.

If there is likely to be some other reason entirely, I can provide more information about my usage of the library.

AlgorithmsAreCool commented 4 years ago

Yes, to my reading using tasks should be ok under the conditions you have laid out.

To be pedantic, if you are doing something like

Example 1

async Task DoWork()
{
    await asyncThing1();

    using(var txn = environment.BeginTransaction())
    {
    //..do things
    }

    await asyncThing2();
}

This pattern is completely fine, since no async transition happens during the transaction. This usage also includes running LMDB operations using Task.Run or on the threadpool.

If you are doing something like :

Example 2

async Task DoWork()
{
    using(var txn = environment.BeginTransaction())
    {
        await asyncThing1();
        //..do things
        await asyncThing2();
    }//txn.Dispose() might be called on another thread here
}

Then (as you have already researched) EnvironmentOpenFlags.NoThreadLocalStorage must be set to do this safely.

Some basic questions to ask,

Are you sure that your transactions are always being disposed? A using statement is the easiest way to do this. Further more, I would avoid the using var pattern from C# 8 in async methods since it might leave the transaction open longer than you need it to be.
How busy is your application? The default number value of MaxReaders is 126, is it possible you are hitting this during normal operations?

drweeto commented 4 years ago

@AlgorithmsAreCool, thank you for the response.

My use is much like the first example and I use the "classic" using statement for all transactions. I think I need to add some more explanation and can provide some observations after further testing. As ever, corrections and comments are very welcome.

The following assumes usage per example 1 by @AlgorithmsAreCool.

If my conclusions are correct, then probably all this is part of learning about Lightning.NET and lmdb. But it may be useful for someone in future. For my program/scenario, it boils down to setting MaxReaders nice and high, which should not cause problems (https://twitter.com/armon/status/534867803426533376):

I have a few timer tasks that periodically use lmdb. Each time a timer task runs, it can use a different thread. As far as I can tell, by surrendering control of which thread is used to read the memory-mapped file, I cannot be sure how many different readers there will actually be.

After trying some scenarios, my observations are in the table below. For these scenarios, I created the required number of auto-resetting Timers.Timer objects with low intervals. The Timer.Elapsed handler made calls to a memory-mapped file. I deleted lock.mdb in between tests (as far as I can tell, the lock file is not made smaller if MaxReaders is reduced, but it is expanded as needed). Thread on which task was running identified using Thread.CurrentThread.ManagedThreadId.

What does enabling NoThreadLocalStorage/MDB_NOTLS do: > MDB_NOTLS : tie reader locktable slots to MDB_txn objects instead of to threads

Scenario	NoThreadLocalStorage	MaxReaders	Timers/Tasks	Threads	Behaviour
1	Disabled	1	1	1	Same single task always executed on same thread. Ok.
2	Disabled	1	1	2+	At least two different threads are used to execute the same task. Only first thread to use readers table works. Others get MDB_READERS_FULL.
3	Enabled	X	X	Y	For X >= 1, Y >= 1: Number of tasks equals MaxReaders. Tasks executed on any thread available. Ok.
4	Enabled	Y	X	X	For X > Y: Number of tasks equal to number of threads. More tasks than MaxReaders. Threads contend for access to readers table. Likely to result in MDB_READERS_FULL.

Assuming that a timer process will always run on the same thread is risky and rules out scenario 1 in a practical sense.

For the same reason, scenario 2 is also risky, in theory. If tasks can run on any thread, I could always hit the MaxReaders. In practice, probably setting MaxReaders high enough would do. I guess the number of threads in the threadpool would be a ceiling, though I would have to read up on the threadpool. And I guess a typical program never gets round to using each thread in the threadpool at least once.

Scenario 3 is probably safer than scenario 2, but still relies on MaxReaders >= number of tasks. Scenario 3 behaves the way it does because:

readers are not tied to threads
regardless of the number of unique threads on which tasks are executed, the number of threads simultaneously reading the memory mapped file will never be more than the value of MaxReaders.

Although MDB_NOTLS helps a bit here in Scenario 3, I don't yet know about other consequences of using it, so I will have to learn more about that.

And scenario 4, if number of tasks > MaxReaders, we will probably have errors in a really unpredictable way, which isn't helpful.

So, if that is correct, the conclusion for me and my scenario is to set MaxReaders high.

AlgorithmsAreCool commented 4 years ago

Sorry or the delay in response, today has been very rough.

Your testing sound valid. Timers are dispatched from the threadpool so they work the same way as ASP.NET core requests. pretty much.

But as mentioned before, the LMDB default value for MaxReaders is 126, which should be enough for most apps. But it can be increased much higher than that if need be.

What does perplex me is that you are hitting max readers at all. LMDB Read transactions are so fast for most databases that it's hard to really get high levels of concurrent transactions. Now if your app is performing lots of operations in each transaction I can see things clumping up somewhat. Or if the disk is slow, that hurts too.

drweeto commented 4 years ago

@AlgorithmsAreCool, there's no rush, thank you getting back to me :)

I'm under the impression that a thread claims, and then keeps, a slot in the reader lock file, when not using the NoThreadLocalStorage flag (which is my original use; the rest of this comment is for this case):

> The slot's address is saved in thread-specific data so that subsequent read transactions started by the same thread need no further locking to proceed

If that thinking is correct, once a thread takes a slot, another thread can never use that same slot. So the speed of read transactions would not matter. If I only have 2 reader slots but the threadpool makes use of 3 different threads, only first 2 threads to read would ever be given access. The third would never be able to read. (I.e. the first two threads to read can continue to make read transactions, the third can make 0 read transactions.)

I could check the program logs to work how many different threads my program has used. It was only after we increased the load on the program that the error popped up. Though the load doesn't affect the number of tasks that use LMDB (or the work done in a transaction), perhaps it would lead to the threadpool needing to make use of a wider variety of threads.

I should be able to investigate further. (To address the disk suggestion, we are running on an SSD).

AlgorithmsAreCool commented 4 years ago

I'm under the impression that a thread claims, and then keeps, a slot in the reader lock file, when not using the NoThreadLocalStorage flag (which is my original use; the rest of this comment is for this case):

@drweeto Now that is interesting! I'm going to take a look in the LMDB source and see if i can make sense of that.

drweeto commented 4 years ago

Ha, this feels serious now! I've not been brave enough to dig that deep, will be interesting to hear what you find @AlgorithmsAreCool.

AlgorithmsAreCool commented 3 years ago

Sorry about the delay,

Careful review of the LMDB docs shows the following

Note
A transaction and its cursors must only be used by a single thread, and a thread may only have a single transaction at a time. If MDB_NOTLS is in use, this does not apply to read-only transactions.
Cursors may not span transactions.

So MDB_NOTLS only really matters for readonly transaction, that is a very important detail i had not noticed before

after some poking around in the source, I found a telling comment:

 *  Readers don't acquire any locks for their data access. Instead, they
 *  simply record their transaction ID in the reader table. The reader
 *  mutex is needed just to find an empty slot in the reader table. The
 *  slot's address is saved in thread-specific data so that subsequent read
 *  transactions started by the same thread need no further locking to proceed.
 *
 *  If #MDB_NOTLS is set, the slot address is not saved in thread-specific data.

Reviewing the code has proved challenging, but i think i understand the important parts

Looking at the code that begins a readonly transaction:

//snippet from mdb_txn_renew0 (called by both mdb_txn_renew and mdb_txn_begin)
 MDB_reader *r = (env->me_flags & MDB_NOTLS) ? txn->mt_u.reader : pthread_getspecific(env->me_txkey);

This needs a little explaining, we can see here that if MDB_NOTLS is false, we read thread local storage and get the reader slot from there (or null). IF MDB_NOTLS is true however, we try to get it from the transaction itself. This is because this code is also used by mdb_txn_renew so the transaction might still have a slot.

In the event that the reader slot is still null, we go and grab the next free slot from the table. A free slot is one that has the pid(processId) set to 0.

Now we need to look at how transactions are cleaned up to see how this matters.

//snippet from mdb_txn_end

if (F_ISSET(txn->mt_flags, MDB_TXN_RDONLY)) {
        if (txn->mt_u.reader) {
            txn->mt_u.reader->mr_txnid = (txnid_t)-1;
            if (!(env->me_flags & MDB_NOTLS)) {
                txn->mt_u.reader = NULL; /* txn does not own reader */
            } else if (mode & MDB_END_SLOT) {
                txn->mt_u.reader->mr_pid = 0;
                txn->mt_u.reader = NULL;
            } /* else txn owns the slot until it does MDB_END_SLOT */
        }
        txn->mt_numdbs = 0; /* prevent further DBI activity */
        txn->mt_flags |= MDB_TXN_FINISHED;
    }

So for readonly transactions, we first check if the transaction has a readerslot associated with it, as far as I can tell, it always should, but this method can be called twice according to the docs so 🤷‍♀️.

If the env is ⚠NOT⚠ MDB_NOTLS, then we simply null the reference out and leave the reader slot untouched. If the env is MDB_NOTLS then we wipe out the processId slot of the reader slot and then null the reference.

If I am reading all of this correctly, it means that at the end of a readonly transaction under MDB_NOTLS, the slot is unlocked and free to be used by the next readonly transaction.

So effectively, when MDB_NOTLS is set, the transaction owns the reader slot, not the thread. And if MDB_NOTLS is not set, then the thread owns the reader slot and never lets it go. Your hypothesis is correct 🎉.

drweeto commented 3 years ago

@AlgorithmsAreCool - thank you very very very much for looking into this, I really appreciate it. Your post and conclusions look good to me.

I can leave this open another week maybe, just in case there are any other comments from anyone else.

CoreyKaylor / Lightning.NET

Question about Tasks, EnvironmentOpenFlags.NoThreadLocalStorage and MaxReaders #127

Example 1

Example 2