gabordemooij / redbean

ORM layer that creates models, config and database on the fly
https://www.redbeanphp.com

Support for table locking - suggestion for implementation #929

Closed Jemt closed 1 year ago

Jemt commented 1 year ago

Hi,

I was in need of table locking. I have created a custom abstraction layer on top of RedBean which allows me to modify the 10% of RedBean that I'd like to behave differently, and add the 10% I'm missing - e.g. table locking.

I'm wondering if there is a good reason why table locking is not supported? We do have support for row locking, which is great.

In my particular case I need to create new rows in multiple tables, but only if certain criteria are met, which is determined by multiple lookups in the database. So what I need is to be able to lock the tables that need to be manipulated in WRITE mode, and lock the tables that are being read in READ mode, ensuring that no other session can interfere with my logic while these locks are held.

My suggestion for an implementation should work with both MySQL and PostgreSQL and is based on the behaviour and types of locks found in MySQL.

How it is used - we can set READ and WRITE locks:

$db->LockTablesAndPerformTransaction(array("table1 WRITE", "table2 WRITE", "table3 READ"), function() { /* do things */ });

A WRITE lock will block any other session accessing the table until the WRITE lock is released. A READ lock will block any other session trying to update the table until the READ lock is released. This ensures consistent reads for the duration of the transaction. Other sessions are allowed to read data simultaneously from a table that already has a READ lock - both with and without acquiring another READ lock. An attempt to acquire a WRITE lock on a table that is currently locked for READ will also cause the session acquiring the WRITE lock to be blocked until the READ lock is released.

Implementation for MySQL:

// We want to start a transaction and establish table locks, but:
// The use of LOCK TABLES in MySQL automatically commits an ongoing transaction.
// Starting a transaction automatically releases any active table locks held.
// We therefore use a slightly different approach for MySQL to ensure that table locks
// and the behaviour of a transaction can be achieved simultaneously. See the following
// link for more information: https://dev.mysql.com/doc/refman/8.0/en/lock-tables.html
// Disable auto commit - WARNING: This prevents us from controlling the transaction via PDO:
// https://stackoverflow.com/questions/4072685/pdoattr-autocommit-ignores-non-transactional-insert-update
R::getPDO()->setAttribute(\PDO::ATTR_AUTOCOMMIT, 0);
// Lock tables - all locks MUST be set in a single LOCK TABLES statement with MySQL
R::exec("LOCK TABLES table1 WRITE, table2 WRITE, table3 READ");
try {
    // Execute transaction callback
    $cb();
    R::exec("COMMIT");
} catch (\Exception $ex) {
    // In case of errors do the same, but ROLLBACK instead of COMMIT, naturally
    R::exec("ROLLBACK");
    throw $ex;
} finally {
    // Unlock tables and restore auto commit
    R::exec("UNLOCK TABLES");
    R::getPDO()->setAttribute(\PDO::ATTR_AUTOCOMMIT, 1);
}

Implementation for PostgreSQL:

// Start transaction and lock tables
R::begin();
R::exec("LOCK TABLE table1 IN ACCESS EXCLUSIVE MODE");
R::exec("LOCK TABLE table2 IN ACCESS EXCLUSIVE MODE");
// ACCESS SHARE MODE, contrary to SHARE MODE, prevents other read locks
// from being upgraded to write locks while the ACCESS SHARE MODE lock is in effect.
R::exec("LOCK TABLE table3 IN ACCESS SHARE MODE");
// Execute transaction callback and commit
$cb();
R::commit();
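Combining the two, here is a sketch of how I imagine the helper itself could look. The helper name, the array("table1 WRITE", ...) lock format, and the WRITE/READ mode mapping are assumptions from this proposal, not existing RedBeanPHP API:

```php
// Sketch only - combines the MySQL and PostgreSQL approaches above.
// Assumes the RedBeanPHP facade; names and lock format are proposal-specific.
use RedBeanPHP\Facade as R;

function lockTablesAndPerformTransaction(array $locks, callable $cb)
{
    $isMySQL = R::getPDO()->getAttribute(\PDO::ATTR_DRIVER_NAME) === "mysql";

    if ($isMySQL) {
        // LOCK TABLES would implicitly commit, so suspend auto commit instead
        R::getPDO()->setAttribute(\PDO::ATTR_AUTOCOMMIT, 0);
        R::exec("LOCK TABLES " . implode(", ", $locks));
    } else {
        R::begin();
        foreach ($locks as $lock) {
            list($table, $mode) = explode(" ", $lock);
            $pgMode = ($mode === "WRITE") ? "ACCESS EXCLUSIVE" : "ACCESS SHARE";
            R::exec("LOCK TABLE {$table} IN {$pgMode} MODE");
        }
    }

    try {
        $cb();
        $isMySQL ? R::exec("COMMIT") : R::commit();
    } catch (\Exception $ex) {
        $isMySQL ? R::exec("ROLLBACK") : R::rollback();
        throw $ex;
    } finally {
        if ($isMySQL) {
            R::exec("UNLOCK TABLES");
            R::getPDO()->setAttribute(\PDO::ATTR_AUTOCOMMIT, 1);
        }
    }
}
```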

Now, there is one major difference between MySQL and PostgreSQL. When locks are in play, MySQL requires that all tables used are locked, even when just reading, to ensure consistent reads that can't be affected by other sessions. PostgreSQL, on the other hand, does not. Reading from unlocked tables is allowed, and this can naturally produce different results within the transaction if the same table is read multiple times. While this can be solved using the appropriate isolation level (e.g. REPEATABLE READ), that requires us to be willing to retry transactions: "Applications using this level must be prepared to retry transactions due to serialization failures." (https://www.postgresql.org/docs/current/transaction-iso.html, section 13.2.2, Repeatable Read Isolation Level). It would therefore be much easier to force the use of table locks for both MySQL and PostgreSQL.

But if one designs an application for PostgreSQL, one might forget to lock tables used only for reading, and this mistake won't become evident until the application is moved to MySQL, or until another session interferes with a transaction and introduces inconsistencies in the data. Therefore R::find, R::findOne, etc. should throw an error if table locks are in effect and a READ or WRITE lock has not been acquired for the given bean type. R::store, R::dispense, R::trash, etc. should throw an error if table locks are in effect and a WRITE lock has not been acquired for the given bean type.

We can't do much about R::exec - it will always allow SQL that doesn't work across different database systems, so I think we can accept that we don't guard against the use of unlocked tables in this case.
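A minimal sketch of that guard - a check each CRUD entry point would run before touching a table. All names here are hypothetical; nothing like this exists in RedBeanPHP today:

```php
// Hypothetical guard, not part of RedBeanPHP.
// $activeLocks maps bean type => "READ" or "WRITE" for the current
// lock session; an empty map means no lock session is active.
function assertLockHeld(array $activeLocks, $beanType, $neededMode)
{
    if (count($activeLocks) === 0) return; // no lock session, nothing to enforce
    $held = isset($activeLocks[$beanType]) ? $activeLocks[$beanType] : null;
    $ok = ($neededMode === "READ")
        ? ($held === "READ" || $held === "WRITE") // a WRITE lock also permits reads
        : ($held === "WRITE");
    if (!$ok) {
        throw new \Exception("No {$neededMode} lock acquired for table '{$beanType}'");
    }
}

// R::find("product", ...)  would call assertLockHeld($locks, "product", "READ");
// R::store($product)       would call assertLockHeld($locks, "product", "WRITE");
```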

I have implemented the approach above in my own database layer and it works pretty well. To simplify things I have made sure that LockTablesAndPerformTransaction can't be used with nested transactions. Any attempt to start another transaction within LockTablesAndPerformTransaction fails.

Jemt commented 1 year ago

Please note that I'm no expert in either MySQL or PostgreSQL. The information I have provided should be verified, and appropriate test cases should be made. I'm using the suggested implementation for MySQL, and have verified through prototyping that the suggested approach works as expected for PostgreSQL. But formal tests have not been created.

Lynesth commented 1 year ago

Hi,

I suppose the main reason there's no implementation of LOCK TABLES in Redbean is simply that no one has had a proper use case for it so far. Nowadays, with transactions, there are very few reasons to use it anyway, in my opinion.

I don't know exactly what you're doing, how you're doing it, or why, but you might not even need to lock your tables at all.

Now whether or not this could be added into Redbean isn't up to me though. :)

Jemt commented 1 year ago

@Lynesth I'm sure a lot of things can be accomplished using transactions today, but imagine having to perform multiple read operations from multiple different tables, and update several tables only if certain criteria from the read operations are met - table locks make this super easy to achieve. With transactions we must at least be prepared to handle serialization failures and retry. On the other hand, table locks do pose a risk of introducing deadlocks, which is also a pain. I suppose there's no free lunch here. Personally, for my particular use case, I would prefer using table locks to ensure consistency. It might come down to personal preference. But thanks for your input, @Lynesth. I appreciate it :-)

Jemt commented 1 year ago

I do wonder if RedBean relies on auto commit for reliable operation, though. That's the one thing that makes me nervous. RedBean probably loses the ability to reliably control transactions, but that's fine - as long as the ordinary CRUD functions work as expected.

Jemt commented 1 year ago

@gabordemooij I'd be happy to send you our database abstraction layer built on top of RedBean so you can see the actual implementation of table locking. Unfortunately I can't share it here in its entirety since only some parts of our product is open source, which doesn't include the backend.

Lynesth commented 1 year ago

I do wonder if RedBean relies on auto commit for reliable operation, though. That's the one thing that makes me nervous. RedBean probably loses the ability to reliably control transactions, but that's fine - as long as the ordinary CRUD functions work as expected.

I'm not sure I understand. You can simply use transactions with Redbean if you need to using either R::transaction() or R::begin().

Jemt commented 1 year ago

I'm not sure I understand. You can simply use transactions with Redbean if you need to using either R::transaction() or R::begin().

@Lynesth Actually you can't when locking tables. Locking tables will commit a transaction, and starting a transaction will release table locks. So the only way to combine a transaction with table locks is to disable auto commit, and when you do, you can't for instance commit the ongoing transaction using PDO - it will throw this error: Uncaught exception 'PDOException' with message 'There is no active transaction'.

So at least RedBean's transaction functions won't work as expected.

Jemt commented 1 year ago

The odd behaviour described above applies to MySQL only, by the way. You can combine transactions and table locks just fine with PostgreSQL.

Lynesth commented 1 year ago

Well again, I don't know what you're doing exactly, but you could probably just use row locking.

Unless you need to retrieve tons of rows at the same time and lock them, it should be good and would allow other sessions to keep on working if they need to, instead of waiting for a table lock (that is, if you actually need to lock the rows you read in the first place).

If you're doing some sequential stuff, where you retrieve a few rows in a couple of tables, analyze them to then insert/update rows in other tables, it seems overkill to lock the tables, in my opinion. At least I think that stands if you're working in a multi-user environment with concurrent queries to your database. If there are only a couple of scripts accessing your tables then it doesn't really matter, I guess.


Oh, and to go back to your previous comment, I think I finally understand what you meant. No, afaik Redbean doesn't need auto-commit for anything. If you don't use it then you have to commit manually, that's all.

gabordemooij commented 1 year ago

I have never seen a use case for table locking since transactions became available in MySQL, but I will do some research into how people use this these days.

Jemt commented 1 year ago

Thanks @gabordemooij, I appreciate that 😊

Jemt commented 1 year ago

@Lynesth In short, I need to insert new rows into two tables based on data from other tables, without the risk of introducing duplicates during the insertion process. I could certainly solve my problem by using transactions and adding UNIQUE constraints. However, I would have to handle failed transactions instead of preventing them from occurring in the first place. I think prevention is a more intuitive way of designing things. Maybe I'm old-fashioned here 😅

Lynesth commented 1 year ago

I am confused as to how this is working to be honest.

From what I understand, you seem to imply that multiple sessions could be trying to insert the same row in a table and, in order to prevent those duplicates, you plan on locking the table entirely. But how does that prevent one of your sessions from inserting the duplicate when the lock is released if you don't check for uniqueness at some point?

gabordemooij commented 1 year ago

@Lynesth In short, I need to insert new rows into two tables based on data from other tables, without the risk of introducing duplicates during the insertion process. I could certainly solve my problem by using transactions and adding UNIQUE constraints. However, I would have to handle failed transactions instead of preventing them from occurring in the first place. I think prevention is a more intuitive way of designing things. Maybe I'm old-fashioned here 😅

Just for my understanding... if a transaction fails, and you do a rollback, what is there left to repair? I mean, everything is rolled back, right? As if nothing ever happened... Also, if you set your transaction isolation level to SERIALIZABLE, will this not effectively be the same as locking all affected tables?

https://stackoverflow.com/questions/4226766/mysql-transactions-vs-locking-tables
https://dev.mysql.com/doc/refman/8.0/en/innodb-transaction-isolation-levels.html

But maybe your situation is very different from what we can imagine. Can you give us an example in code to help us understand why locking is preferred to transactions?

Old-fashioned is good though. :-)

Jemt commented 1 year ago

Hi guys.

Please allow me to get back to you over the weekend. I'm currently out of office. Thanks :-)

Jemt commented 1 year ago

Hello again.

I'm sorry about all the confusion. I should have provided some code to begin with. Here is an example:

$product = R::findOne("product", "reference = ?", array($ref)); // findOne returns NULL when no bean is found
$productId = -1;

if ($product === null)
{
    // Let's assume 10 sessions end up here at the same time

    $product = R::dispense("product");
    $product["name"] = $name;
    $product["price"] = $price;
    $product["reference"] = $ref; // Only one record with this value is allowed!
    $productId = R::store($product);

    $warehouseproduct = R::dispense("warehouseproduct");
    $warehouseproduct["productid"] = $productId; // Only one record associated with a product is allowed!
    $warehouseproduct["inventory"] = 0;
    R::store($warehouseproduct);
}
else
{
    //throw new \Exception("Product already exists");
    $productId = $product["id"];
}

return $productId;

We only allow one record in the "product" table with any given "reference" value, and only one record in the "warehouseproduct" table per product record in the "product" table.

With a UNIQUE constraint on both the "product" and "warehouseproduct" tables, and the use of the SERIALIZABLE isolation level for our transaction, then yes, we can probably achieve our goal of avoiding duplicates. We would need to handle failed transactions due to serialization failures, errors caused by duplicates, and any other error that might occur.

I'm not sure whether we can implement this in a way that works across both MySQL and PostgreSQL without writing MySQL- or PostgreSQL-specific SQL - I would hate having to do `if (mysql) { ... } else if (pgsql) { ... }` and use R::exec(..) for transaction control and data manipulation. Also, I'm not sure how to catch errors caused by attempts to create duplicates in the tables.
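To be fair, raising the isolation level is nearly portable - the statement is the same in both systems, only its placement differs, which is exactly the kind of branching I'd rather avoid. A sketch (as far as I can tell, MySQL applies SET TRANSACTION issued outside a transaction to the next one, while PostgreSQL expects it inside the transaction block):

```php
// Sketch - same statement, different placement per database.
// $isMySQL would be determined elsewhere, e.g. via PDO::ATTR_DRIVER_NAME.
if ($isMySQL) {
    R::exec("SET TRANSACTION ISOLATION LEVEL SERIALIZABLE"); // applies to the next transaction
    R::begin();
} else {
    R::begin();
    R::exec("SET TRANSACTION ISOLATION LEVEL SERIALIZABLE"); // PostgreSQL: set inside the transaction
}
```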

In any case, being able to wrap the entire code as shown below is just a breeze. No need to worry about the database being used - it works with both MySQL and PostgreSQL, and there is no need to handle concurrency errors. We do introduce the risk of deadlocks if this is not used with caution, of course.


$db->LockTablesAndPerformTransaction(array("product WRITE", "warehouseproduct WRITE"), function() use($a, $b, $c)
{
    // Insert code from first example here
});
Lynesth commented 1 year ago

I'd probably do this:

Add a unique constraint on your product.reference column (it should be there anyway if it's supposed to be unique), and on warehouseproduct.productid with cascading delete if that's required (also, any reason you're not using warehouseproduct->product to store it?).

try {

    R::begin();
    $product = R::findForUpdate("product", "reference = ?", [ $ref ]);

    if (!empty($product->id)) {
        R::rollback();
        return $product->id;
    }

    $product = R::dispense("product");
    $product->name = $name;
    $product->price = $price;
    $product->reference = $ref;
    R::store($product);

    $warehouseproduct = R::dispense("warehouseproduct");
    $warehouseproduct->product = $product;
    $warehouseproduct->inventory = 0;
    R::store($warehouseproduct);

    R::commit();

    return $product->id;

} catch (Exception $e) {

    // Something wrong happened
    R::rollback();
    return -1; // or 0, false, null, or whatever your function accepts as return value

}

Code was typed fast and might contain mistakes, but it should give you an idea.

Jemt commented 1 year ago

Thanks, @Lynesth :-)

I wonder what isolation level is used with begin(), and whether it's sufficient.

In any case I believe the use of transactions allows for dirty reads which table locking prevents:

  1. Transaction A starts and reads a specific row from the table.
  2. Before transaction A completes, transaction B starts and reads the same row from the table.
  3. Transaction A then performs a write operation and updates the values in that row.
  4. Transaction B, which has already read the row, now has an outdated and invalid version of the data since it was changed by transaction A after the read.

It also introduces the risk of lost updates:

  1. Transaction A starts and reads a specific value, let's say it reads a balance of $100.
  2. Before transaction A completes, transaction B starts and reads the same value, which is $100.
  3. Transaction A performs a calculation and intends to update the balance to $120.
  4. Before transaction A commits, transaction B also performs a calculation and intends to update the balance to $150.
  5. Transaction A commits, updating the balance to $120.
  6. Transaction B commits, updating the balance to $150.

While the use of the SERIALIZABLE isolation level might prevent Transaction B from overwriting the changes made by Transaction A, it will only do so if wrapped in a transaction. Poorly written code can perform both dirty reads and overwrite changes made by transactions, hence introducing errors in our data.

A table lock on the other hand prevents any changes while the locks are held, so not even poorly written code can read or write while the locks are held (depending on the types of locks applied, of course).

I think there are use cases where table locks make sense.

Table locks also have the benefit of not relying on retries when a transaction fails. The code just waits its turn.

Whether it makes sense to support this in RedBeanPHP, I don't know. I want it, but the community might not. In my particular case I find it easier to use and understand. I don't need to worry about or even understand isolation levels, or implement support for retrying transactions. Table locks are easy to understand. Just take care not to create deadlocks, and release locks as soon as possible to ensure good performance :-)

Lynesth commented 1 year ago

Hey @Jemt !

You do not need to use another isolation level than the default one, or use table locking, in order to safely do the product/warehouse example you gave us above.

Using SELECT ... FOR UPDATE (in my previous comment: $product = R::findForUpdate("product", "reference = ?", [ $ref ]);) prevents another session from updating that row (or reading it with the intent to update it) until the lock on it is released. Even if that row doesn't exist yet.

So using the example I wrote here's what would happen:

  1. Transaction A starts and reads a specific row from the table using FOR UPDATE, obtaining a lock on it.
  2. Before Transaction A completes, Transaction B starts and tries to read the same row from the table (using FOR UPDATE as well), but there's a lock on it, so it waits.
  3. Transaction A then performs a write operation and updates the values in that row, releasing the lock on the row.
  4. Transaction B now gets to read the (updated) row and does whatever it needs to do (or not do) with it.
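The same interleaving can be shown with two plain PDO connections (illustrative only; the DSN, credentials and 'product' table are placeholders):

```php
// Illustrative only - two sessions demonstrating SELECT ... FOR UPDATE.
$dsn = "mysql:host=localhost;dbname=test";
$sessionA = new \PDO($dsn, "user", "pass");
$sessionB = new \PDO($dsn, "user", "pass");

$sessionA->beginTransaction();
// Session A locks the matching row (or the index gap, if no row exists yet)
$sessionA->query("SELECT * FROM product WHERE reference = 'X' FOR UPDATE");

$sessionB->beginTransaction();
// This statement in session B would now block until A commits or rolls back:
// $sessionB->query("SELECT * FROM product WHERE reference = 'X' FOR UPDATE");

$sessionA->exec("UPDATE product SET price = 9 WHERE reference = 'X'");
$sessionA->commit(); // releases the lock; B's SELECT then sees the updated row
```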

A table lock on the other hand prevents any changes while the locks are held, so not even poorly written code can read or write while the locks are held (depending on the types of locks applied, of course).

It still doesn't prevent poorly written code from just waiting until the locks are released to do poorly written things to your data.

Table locks also have the benefit of not relying on retries when a transaction fails.

I'm not sure I understand this statement, could you elaborate what you mean?

I think there are use cases where table locks make sense.

There might be some, but I've yet to come across one. Even searching about it on the internet, you'll mainly find answers stating that it's regarded as bad practice.

MySQL documentation states, talking about InnoDB (https://dev.mysql.com/doc/refman/8.0/en/table-locking.html):

For this storage engine, avoid using the LOCK TABLES statement, because it does not offer any extra protection, but instead reduces concurrency. The automatic row-level locking makes these tables suitable for your busiest databases with your most important data, while also simplifying application logic since you do not need to lock and unlock tables.

And gives two reasons as to why you could use LOCK TABLES (https://dev.mysql.com/doc/refman/8.0/en/lock-tables.html):

  • If you are going to run many operations on a set of MyISAM tables, it is much faster to lock the tables you are going to use.
  • If you are using tables for a nontransactional storage engine (...)
Jemt commented 1 year ago

Thanks again, @Lynesth

Even if that row doesn't exist yet.

Interesting! I was not aware of this, although it seems that it will actually allow another "session 2" to insert the record while "session 1" processes using "SELECT ... FOR UPDATE", if "session 2" does not use FOR UPDATE (advisory locking?). If another session inserts the row while session 1 is running, then session 1 will fail when it "realizes" that a duplicate would be created within its transaction. This requires a UNIQUE constraint, or that the selection was based on a primary key, as far as I can gather.

On top of the reasons you mention for still using LOCK TABLES, I would still add that you don't have to design your application to handle concurrency errors as you can avoid them entirely.

Lynesth commented 1 year ago

If you don't use FOR UPDATE with "session 2", it will allow you to read it, but it still won't allow you to update it until after the lock was released.

Jemt commented 1 year ago

According to https://stackoverflow.com/questions/3601895/does-select-for-update-prevent-other-connections-inserting-when-the-row-is-not your suggested approach might introduce a deadlock:

On the default isolation level, SELECT ... FOR UPDATE on a non-existent record does not block other transactions. So, if two transactions both do a SELECT ... FOR UPDATE on the same non-existent index record, they'll both get the lock, and neither transaction will be able to update the record. In fact, if they try, a deadlock will be detected.
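If that deadlock does occur, I suppose the mitigation is the usual one - detect it and retry the transaction. A sketch (the helper name is illustrative; MySQL reports deadlocks as error code 1213 / SQLSTATE 40001):

```php
// Sketch: retry a transactional callback when a deadlock is detected.
function withDeadlockRetry(callable $tx, $maxAttempts = 3)
{
    for ($attempt = 1; ; $attempt++) {
        try {
            return $tx();
        } catch (\PDOException $e) {
            $isDeadlock = $e->getCode() == "40001"
                || (isset($e->errorInfo[1]) && $e->errorInfo[1] == 1213);
            if (!$isDeadlock || $attempt >= $maxAttempts) {
                throw $e; // not a deadlock, or out of attempts
            }
            usleep(50000 * $attempt); // brief backoff before retrying
        }
    }
}
```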

Lynesth commented 1 year ago

Check the comment to that answer, or try it yourself, it works properly.

Edit: I will at least add that I tested it before posting, and that it works properly in my case, with my setup. I don't know what issues people might have with this, and it might depend on your dbms version, but feel free to share if you happen to try this and encounter any error.

Jemt commented 1 year ago

@Lynesth I missed that. Thanks for pointing that out.

Jemt commented 1 year ago

@Lynesth Thank you so much for taking the time to explain this in such great detail. I really appreciate it :-)

gabordemooij commented 1 year ago

Ok, so this feature is no longer desired? I did some research; it only seems applicable to MyISAM and other storage engines that do not support row locking. These engines are not supported by default by RedBeanPHP.

Jemt commented 1 year ago

I think it is safe to close this issue for now. I still think I will use it sometimes to force serial execution. I might be old-fashioned, but it's just simple and easy to understand. Thanks, both of you, for taking the time to do some research and provide alternative solutions :-)