codership / galera

Synchronous multi-master replication library
GNU General Public License v2.0

Galera / MariaDB issue with ALTER TABLE statement still propagating to other cluster members in TOI while it failed on origin server #635

Status: Closed (closed by bravoman 1 year ago)

bravoman commented 1 year ago

Hi, I'm new here, so I'm not sure what the minimal requirements for a bug report are; I'll work from my own experience for now:

We had an incident this morning in which an ALTER TABLE statement was executed on one server in a cluster of 3 servers. It became an outage for us because the statement ran with the COPY algorithm on a large table on members 2 and 3 of the cluster, which (at least in my opinion) should have been prevented by the statements that were executed on member 1:

SET SESSION alter_algorithm="INSTANT";
alter table some_large_table_containing_blobs
    modify type enum ('existing_option1','existing_option3','existing_option2') null;

Note that the enum order was changed by accident, which meant the COPY algorithm was needed. The statement therefore failed on member 1, but that did not stop the query from being propagated to the other cluster members (this seems like unwanted behavior to me). I checked this with SHOW PROCESSLIST; on all machines and found that the statement was running only on members 2 and 3, while member 1, to which I was connected when running the queries, clearly showed the following error: ALGORITHM=INSTANT is not supported. Reason: Cannot change column type. Try ALGORITHM=COPY

Some version information:

[root@db1 ~]# yum list installed | grep galera
galera.x86_64                      25.3.37-1.el7.centos            @mariadb-main
[root@db1 ~]# yum list installed | grep MariaDB
MariaDB-backup.x86_64              10.3.36-1.el7.centos            @mariadb-main
MariaDB-client.x86_64              10.3.36-1.el7.centos            @mariadb-main
MariaDB-common.x86_64              10.3.36-1.el7.centos            @mariadb-main
MariaDB-compat.x86_64              10.3.36-1.el7.centos            @mariadb-main
MariaDB-server.x86_64              10.3.36-1.el7.centos            @mariadb-main

I was wondering whether this is a known issue, or whether it really is a bug. And if so, is it a bug in Galera or in MariaDB?

bravoman commented 1 year ago

See https://jira.mariadb.org/browse/MDEV-30456 for the report with MariaDB.

sjaakola commented 1 year ago

This issue happens because the user-chosen alter algorithm is not enforced by the receiving node's replication applier. The fix will be submitted on the MariaDB side; this repository is purely about the replication itself, so I am closing the bug here.
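
Since the session-level alter_algorithm setting is the part that the receiving nodes do not enforce, one possible interim form is to put the algorithm into the statement itself. This is only a sketch, and it rests on an assumption that is not confirmed in this thread: that the ALGORITHM clause, being part of the statement text, is seen by every node that executes the statement under TOI. The table and column names are the ones from the report.

```sql
-- Assumption (not verified in this thread): the ALGORITHM clause is part of
-- the replicated statement text, so each node evaluates it for itself
-- instead of relying on the origin session's alter_algorithm variable.
ALTER TABLE some_large_table_containing_blobs
    MODIFY type ENUM('existing_option1','existing_option3','existing_option2') NULL,
    ALGORITHM=INSTANT;
```

If the assumption holds, a change that cannot be done instantly would be expected to fail on every node with the same error rather than fall back to COPY on members 2 and 3. Trying it on a non-production cluster first would be prudent.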