colinmollenhour / mariadb-galera-swarm

MariaDB Galera Cluster container based on the official mariadb image that can auto-bootstrap and recover cluster state.
https://hub.docker.com/r/colinmollenhour/mariadb-galera-swarm
Apache License 2.0

XtraBackup method requires updated command to function #71

Closed: BenResTech closed this issue 5 years ago

BenResTech commented 5 years ago

I recently ran into clustering problems while testing a move from 10.2.15 to 10.2.23. The initial errors from the service were:

WSREP_SST: [INFO] Sleeping before data transfer for SST (20190503 09:36:43.758)
2019-05-03  9:36:44 139751752288000 [Note] WSREP: (10152898, 'tcp://0.0.0.0:4567') turning message relay requesting off
WSREP_SST: [INFO] Streaming the backup to joiner at 10.20.22.19 4444 (20190503 09:36:53.765)
WSREP_SST: [INFO] Evaluating innobackupex    --no-version-check  $tmpopts $INNOEXTRA --galera-info --stream=$sfmt $itmpdir 2>${DATA}/innobackup.backup.log | pv -f  -i 10 -N donor -F '%N => Rate:%r Avg:%a Elapsed:%t %e Bytes: %b %p'  -s 18993152 2>>/tmp/mysql-console/fifo | socat -u stdio TCP:10.20.22.19:4444; RC=( ${PIPESTATUS[@]} ) (20190503 09:36:53.769)
2019-05-03  9:36:53 139749446579968 [Warning] Aborted connection 23 to db: 'unconnected' user: 'xtrabackup' host: 'localhost' (Got an error reading communication packets)
WSREP_SST: [ERROR] innobackupex finished with error: 1.  Check /var/lib/mysql//innobackup.backup.log (20190503 09:36:53.803)
WSREP_SST: [ERROR] Cleanup after exit with status:22 (20190503 09:36:53.807)
WSREP_SST: [INFO] Cleaning up fifo file /tmp/mysql-console/fifo (20190503 09:36:53.816)
rm: cannot remove '/tmp/mysql-console/fifo': Permission denied
WSREP_SST: [INFO] Cleaning up temporary directories (20190503 09:36:53.824)
2019-05-03  9:36:53 139747011843840 [ERROR] WSREP: Failed to read from: wsrep_sst_xtrabackup-v2 --role 'donor' --address '10.20.22.19:4444/xtrabackup_sst//1' --socket '/var/run/mysqld/mysqld.sock' --datadir '/var/lib/mysql/'     '' --gtid '1015ad9c-6d86-11e9-ad00-cba3c1db7287:0' --gtid-domain-id '0'
2019-05-03  9:36:53 139747011843840 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'donor' --address '10.20.22.19:4444/xtrabackup_sst//1' --socket '/var/run/mysqld/mysqld.sock' --datadir '/var/lib/mysql/'     '' --gtid '1015ad9c-6d86-11e9-ad00-cba3c1db7287:0' --gtid-domain-id '0': 22 (Invalid argument)
2019-05-03  9:36:53 139747011843840 [ERROR] WSREP: Command did not run: wsrep_sst_xtrabackup-v2 --role 'donor' --address '10.20.22.19:4444/xtrabackup_sst//1' --socket '/var/run/mysqld/mysqld.sock' --datadir '/var/lib/mysql/'     '' --gtid '1015ad9c-6d86-11e9-ad00-cba3c1db7287:0' --gtid-domain-id '0'
2019-05-03  9:36:53 139751743895296 [Warning] WSREP: 0.0 (41e183d1db2d): State transfer to 1.0 (0dfe460e7bb6) failed: -22 (Invalid argument)

When I dug into it, the innobackup log in /var/lib/mysql stated: InnoDB: Unsupported redo log format. The redo log was created with MariaDB 10.2.23.

After some Googling, I found this page where MariaDB states: "Percona XtraBackup does not work with MariaDB 10.1 or greater if encryption or compression is used, or when innodb_page_size is set to some value other than 16K. It also does not work with MariaDB 10.2 or greater if innodb_safe_truncate=ON is set. It also does not work with MariaDB 10.3 or greater. For the cases where Percona XtraBackup is not supported, see Mariabackup instead."

In my case (10.2.23), the problem was the 'new' innodb_safe_truncate option, which defaults to ON. Our previous version, 10.2.15, predates this option; it was introduced in 10.2.19.
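For anyone trying to work out whether their own version is affected, here is a rough Python sketch of the compatibility rules quoted above. It is only an illustration: the function names and parameters are made up for this example, it encodes nothing beyond the quoted MariaDB statement and the 10.2.19 introduction of innodb_safe_truncate, and it assumes a plain "major.minor.patch" version string.

```python
from typing import Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a 'major.minor.patch' string such as '10.2.23' into a tuple of ints."""
    return tuple(int(part) for part in version.split(".")[:3])


def xtrabackup_sst_usable(
    version: str,
    safe_truncate: Optional[bool] = None,   # None = assume the server default
    encryption_or_compression: bool = False,
    page_size_kb: int = 16,
) -> bool:
    """Hypothetical helper: apply the quoted XtraBackup limitations to a MariaDB version."""
    v = parse_version(version)

    # "It also does not work with MariaDB 10.3 or greater."
    if v >= (10, 3, 0):
        return False

    # 10.1+: no encryption/compression, and innodb_page_size must be 16K.
    if v >= (10, 1, 0) and (encryption_or_compression or page_size_kb != 16):
        return False

    # 10.2+: fails when innodb_safe_truncate=ON.
    if v >= (10, 2, 0):
        if safe_truncate is None:
            # Assumption from this thread: the option exists and defaults
            # to ON from 10.2.19 onwards; earlier 10.2 releases lack it.
            safe_truncate = v >= (10, 2, 19)
        if safe_truncate:
            return False

    return True


if __name__ == "__main__":
    for ver in ("10.2.15", "10.2.23", "10.3.14"):
        print(ver, xtrabackup_sst_usable(ver))
    print("10.2.23 with safe_truncate OFF", xtrabackup_sst_usable("10.2.23", safe_truncate=False))
```

With the defaults, 10.2.15 comes out as usable and 10.2.23 does not, which matches what I saw; passing safe_truncate=False for 10.2.23 mirrors the workaround of turning the option off.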

Supplying the option --innodb_safe_truncate=OFF as part of my command string to Galera within the compose file got the nodes clustering and working again.

I thought I'd raise this here so the command can be updated (or the move to MariaBackup can be examined), and so that anyone else seeing this doesn't have to spend as long searching for the answer :)

colinmollenhour commented 5 years ago

Thanks for the report, Ben! So do you think simply switching ENV SST_METHOD=xtrabackup-v2 to ENV SST_METHOD=mariabackup in the 10.2 Dockerfile is all that is needed to fix this?

BenResTech commented 5 years ago

Having just tested it here, changing that variable to mariabackup does correctly swap to using mariabackup for the state synchronisation.

I can then assemble a cluster of nodes without requiring the innodb_safe_truncate option. The Galera nodes cluster correctly, just as they did with the previous 10.2.15 version.

As MariaDB seems to be moving towards its own backup solution instead of XtraBackup, this move would have to be made at some point anyway, so I guess now is as good a time as any :)

colinmollenhour commented 5 years ago

Thanks again, Ben! I updated the Dockerfile and am working on pushing new builds.