erthink / libmdbx

One of the fastest embeddable key-value ACID database without WAL. libmdbx surpasses the legendary LMDB in terms of reliability, features and performance.
https://erthink.github.io/libmdbx/

Unexpected 'MDBX_PROBLEM' error when reusing cursors from read-only transactions #272

Closed mriccobene closed 2 years ago

mriccobene commented 2 years ago

I occasionally get exceptions (MDBX_PROBLEM) from mdbx_cursor_bind. For various reasons, I have a cursor with thread storage duration (thread_local), and I rebind it with mdbx_cursor_bind whenever I need to reuse it for another operation on a different transaction.

The code where the error MDBX_PROBLEM is generated is this:

https://github.com/erthink/libmdbx/blob/d01e44db0ca74724d3d6053807201dc544352c2b/src/core.c#L16468-L16474

In one of my failed runs, the checks in this code evaluate as follows:

  1. mc->mc_signature == MDBX_MC_LIVE => ok: the cursor has thread-local duration and will only be closed at program termination (!), so no cleanup code has been called since its last use
  2. mc->mc_txn != 0 => this is possible: the cursor was used previously, so it holds the address of an old txn
  3. mc->mc_txn->mt_signature != MDBX_MT_SIGNATURE => this condition is WRONG according to the code, but in my opinion it is NOT_IMPORTANT: the previous transaction was destroyed and we are binding the cursor to another transaction, so the memory of the old transaction may have been reclaimed for other purposes and we are reading dirty data. On some runs that memory happens to be untouched, so I get no error; on other runs it is dirty and I get the error.

I think the check on mc->mc_txn is wrong here and can be omitted.
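The three checks above, and why the failure is intermittent, can be reproduced in a toy model. This is only a sketch under assumptions: ToyTxn, ToyCursor, toy_cursor_bind and both signature constants are hypothetical stand-ins, not libmdbx definitions. The stale transaction memory is simulated by a slot that stays allocated, so the model never performs a real dangling access.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constants standing in for MDBX_MT_SIGNATURE / MDBX_MC_LIVE.
constexpr std::uint32_t kTxnSignature = 0x11223344u;
constexpr std::uint32_t kCursorLive   = 0x55667788u;

enum class BindResult { Ok, Problem };

struct ToyTxn {
    std::uint32_t signature;
};

struct ToyCursor {
    std::uint32_t signature;  // kCursorLive once the cursor has been bound
    ToyTxn*       txn;        // transaction the cursor was last bound to
};

// Models the reported checks: a live cursor must still point at a
// transaction whose signature is valid, otherwise binding is refused.
BindResult toy_cursor_bind(ToyCursor& cursor, ToyTxn& new_txn) {
    if (cursor.signature == kCursorLive) {
        if (cursor.txn == nullptr || cursor.txn->signature != kTxnSignature)
            return BindResult::Problem;
    }
    cursor.txn = &new_txn;
    cursor.signature = kCursorLive;
    return BindResult::Ok;
}

// Binds to a first txn, "ends" it without unlinking the cursor, then
// rebinds to a second txn. The flag selects whether the memory of the
// first txn was overwritten in the meantime.
BindResult rebind_after_txn_end(bool old_memory_overwritten) {
    ToyTxn slots[2] = {{kTxnSignature}, {kTxnSignature}};
    ToyCursor cursor{0, nullptr};

    BindResult first = toy_cursor_bind(cursor, slots[0]);
    assert(first == BindResult::Ok);
    (void)first;

    if (old_memory_overwritten)
        slots[0].signature = 0xDEADBEEFu;  // memory reused for something else

    return toy_cursor_bind(cursor, slots[1]);
}
```

In this model, rebind_after_txn_end(false) succeeds because the stale slot still carries a valid signature, while rebind_after_txn_end(true) reports a problem, which matches the "sometimes it works" behaviour described above.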

This is the state of the variables: image

I get the same behavior on releases v0.11.4 and v0.11.5.

erthink commented 2 years ago

I will deal with your issue later, but it is reasonable to clarify the essence beforehand:

But in all these cases, mc_signature == MDBX_MC_LIVE means that mc_txn is non-zero and points to a live transaction with a valid mt_signature == MDBX_MT_SIGNATURE.

erthink commented 2 years ago

Basically, libmdbx's cursors are allocated (malloc/free) internally and are visible outside only as pointers. So I didn't understand the reason for mentioning thread_local, and then the (automatic?) closing of the cursor at program termination.

As I wrote above, a bound cursor is linked to a transaction object and is unlinked from it on transaction termination, or explicitly during cursor rebind or close.
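The lifecycle described here can be sketched as a small model in which a transaction tracks the cursors bound to it and unlinks them when it ends. This is an illustration under assumptions, not libmdbx's implementation: Txn, Cursor, bind, unbind, end and rebind_is_clean are hypothetical names invented for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Cursor;

struct Txn {
    bool live = true;
    std::vector<Cursor*> cursors;  // cursors currently bound to this txn

    void end();                    // unlinks every bound cursor
    ~Txn() { if (live) end(); }    // ending at scope exit, RAII style
};

struct Cursor {
    Txn* txn = nullptr;            // non-null only while bound to a live txn

    // Explicit unlink, used on rebind and on close.
    void unbind() {
        if (txn == nullptr) return;
        auto& list = txn->cursors;
        list.erase(std::remove(list.begin(), list.end(), this), list.end());
        txn = nullptr;
    }

    void bind(Txn& t) {
        assert(t.live);
        unbind();                  // drop any link to a previous txn
        t.cursors.push_back(this);
        txn = &t;
    }

    ~Cursor() { unbind(); }
};

void Txn::end() {
    // Transaction termination clears the back-pointer of every cursor,
    // so no cursor is left pointing at a dead transaction.
    for (Cursor* c : cursors) c->txn = nullptr;
    cursors.clear();
    live = false;
}

// Reuses one cursor across two consecutive transactions.
bool rebind_is_clean() {
    Cursor cursor;
    {
        Txn txn1;
        cursor.bind(txn1);
    }  // txn1 ends here and unlinks the cursor
    bool unlinked = (cursor.txn == nullptr);

    Txn txn2;
    cursor.bind(txn2);
    return unlinked && cursor.txn == &txn2 && txn2.cursors.size() == 1;
}
```

With this invariant in place, a cursor that outlives its transaction never keeps a stale pointer, so a later rebind has nothing invalid to check.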

Please provide a test case or a pseudo code of your scenario.

mriccobene commented 2 years ago

I have a simple test case now. Premise: I do not know whether I am misusing the mdbx API; if so, please accept my apologies and tell me when and how to use mdbx_cursor_bind.

Try this code:

mdbx::env_managed env = ...
mdbx::cursor_managed cursor;

{
    mdbx::txn_managed txn1 = env.start_read();   // read-only!
    auto map = txn1.create_map(...);
    cursor.bind(txn1, map);
    [[maybe_unused]] auto data = cursor.find(mdbx::slice(...), false);
} // txn1 closed here

[[maybe_unused]] auto dummy = new char[4096];    // hopefully keeps txn2 from being allocated at the same address as the old txn1

{
    mdbx::txn_managed txn2 = env.start_read();   // read-only!
    auto map = txn2.create_map(...);
    cursor.bind(txn2, map);                      // ERROR HERE!
    [[maybe_unused]] auto data = cursor.find(mdbx::slice(...), false);
} // txn2 closed here

If the second transaction, txn2, is allocated at a different address than txn1, then in the second cursor bind you will see that the cursor's mc_txn member still points to the old, destroyed txn1, so it is a dangling pointer. When the code I quoted in my first post runs, that dangling pointer is dereferenced and an error is likely to be raised.

These are the variables the debugger shows in the first and the second bind:

image

erthink commented 2 years ago

Seems like a bug, I'll check/fix.

erthink commented 2 years ago

Briefly, this commit fixes a missed flaw:

mriccobene commented 2 years ago

Thanks! My test case now works.

erthink commented 2 years ago

Thank you for reporting.