Open ayurchen opened 9 years ago
We also have:
CREATE TABLE LIKE CREATE AS SELECT Merge storage engine
Well, the proposal implies any CREATE
statement, or, more generally, creation of any resource in TOI mode makes appending a key referencing this resource unnecessary.
No, I meant that CREATE TABLE LIKE and CREATE AS SELECT may reference tables other than the one that is being created. Maybe those table names should be used in certification?
Indeed. My proposal is solely about the resource to be created and therefore non-existent. And the keys for source tables should probably be of shared type.
Hi guys, just to note here that I'm working on this issue (although this may take me some time).
So I don't understand the details of this issue yet (looking into this), but just wondering: Is there any issue with the proposed change in a case where different nodes get simultaneous conflicting CREATE TABLE requests (so with the same table name but conflicting parameters)?
CREATE TABLE is a DDL. It is executed in a so-called total-order-isolation (TOI) mode - it waits for the all preceding actions to complete and all subsequent actions on that table will wait (or fail certification) for this DDL to complete, That means that those DDLs are strictly ordered with respect to each other, and one will be executed before another on all nodes. So the second conflicting CREATE will simply fail because the table already exists. ATM we allow such failures as they must be deterministic and leave the database in consistent state.
Ok, so I made a pull request for one possible approach to this: https://github.com/codership/mysql-wsrep/pull/142
This specifically targets the following cases, and ignores some other cases mentioned here:
Basic CREATE TABLE statements
CREATE TABLE LIKE, where the source table is a temporary table
CREATE TABLE LIKE, where the source table is not a temporary table
Note that this does not do anything about the alter table case (because there appear to be other locking issues in that case such that avoiding certification failure doesn't actually help).
And this does not address the CREATE TABLE AS SELECT case because this does not appear to be patched correctly for replication (I can't see any call to WSREP_TO_ISOLATION_BEGIN in this case, and when I try this on a test cluster I get an assertion in the provider without any call to wsrep->to_execute_start).
Also, this does not do anything about merge storage engine (as mentioned in a comment above, let me know if it should!)
It would be cleaner, I think, to make the provider support TOI replication without any keys, but this gets more complicated, and is a change to the provider API (so, e.g. if other providers also do not support replication without keys, depending on this in mysql-wsrep would break those other providers).
Another potential approach for the actual specific client use case referenced here (although not the same as the proposal in the issue), could perhaps be to target the IF NOT EXISTS case more specifically, and to do some kind of early out in this case (i.e. without replication)? (I note that the CREATE TABLE AS SELECT code path does check for this and bail out quite early on, but in the case of standard CREATE TABLE this is further down the call stack, in create_table_impl.)
So it seems like this is trickier than expected, and the proposal about not appending table key doesn't work, at least not with the current wsrep provider API (see the comments on my pull request).
One question that might be relevant, about specific requirements for this issue: Would it be enough to avoid deadlocks only in the quite specific client use case where the CREATE TABLE statement is actually CREATE TABLE ... IF NOT EXISTS, and even more specifically, with the new table spec actually most probably identical to the already created table?
Would it be enough to avoid deadlocks only in the quite specific client use case where the CREATE TABLE statement is actually CREATE TABLE ... IF NOT EXISTS, and even more specifically, with the new table spec actually most probably identical to the already created table?
This should be done before TOI critical section and it would be prone to race conditions, so it would not be fail proof solution either.
The correct solution to this problem is to allow shared key level also for TOI queries. However, this requires adding flags field to wsrep::to_execute_start()
call and modifications to key structure and certification algorithm on Galera side to take hierarchical structure of keys into account. This will be quite significant change and won't probably be seen in 3.x (unless backported sometime in the future).
This should be done before TOI critical section and it would be prone to race conditions, so it would not be fail proof solution either.
Right. To do some kind of early out without race conditions, in the general case, it seems like we would need some kind of TOI 'if not exists check'.
But I was wondering then whether it would be possible to target the very specific case where the creation statement table spec exactly matches the existing table, on the basis that it may then not matter what order two such create table statements are applied.
The correct solution to this problem is to allow shared key level also for TOI queries.
Ok, fair enough!
But I was wondering then whether it would be possible to target the very specific case where the creation statement table spec exactly matches the existing table, on the basis that it may then not matter what order two such create table statements are applied.
I guess it would be possible to do 'out-of-order' check for 'CREATE TABLE IF NOT EXISTS' before TOI critical section and return success if the table already exists. This should already reduce unwanted certification conflicts significantly.
Percona has reported the following use case:
CREATE TABLE ... IF NOT EXISTS
This DDL in turn causes the ongoing transactions from other clients that are already using this table to fail certification, even though this statement has no effect.
Proposal:
ALTER statements which are not modifying existing columns (or their order) or unique indexes should not append table key for the same reasons
this is valid only if DDL statements are executed in TOI.