colinmollenhour / mariadb-galera-swarm

MariaDB Galera Cluster container based on the official mariadb image, which can auto-bootstrap and recover cluster state.
https://hub.docker.com/r/colinmollenhour/mariadb-galera-swarm
Apache License 2.0

SST Streaming slow #13

Closed (davidhiendl closed 7 years ago)

davidhiendl commented 7 years ago

When a node joins the cluster, it is replicated REALLY slowly. The nodes communicate over an encrypted Docker Swarm overlay network.

The transfer averaged only 368 KiB/s on a dedicated 10 GbE NIC for the Docker Swarm nodes. Nothing else was running on those nodes during the tests; there was no workload whatsoever.

Obviously this is not as big a problem with a fresh or nearly empty database, but I have databases that exceed 50 GB.

WSREP_SST: [INFO] Waiting for SST streaming to complete! (20170413 17:05:45.209)
2017-04-13 17:05:47 140266367350528 [Note] WSREP: (6de50e53, 'tcp://0.0.0.0:4567') turning message relay requesting off
2017-04-13 17:09:47 140266358957824 [Note] WSREP: 0.0 (799c2abb1c07): State transfer to 1.0 (a99c41c77645) complete.
2017-04-13 17:09:47 140266358957824 [Note] WSREP: Member 0.0 (799c2abb1c07) synced with group.
WSREP_SST: [INFO] Preparing the backup at /var/lib/mysql//.sst (20170413 17:09:47.091)
WSREP_SST: [INFO] Evaluating innobackupex --no-version-check  --apply-log $rebuildcmd ${DATA} &>${DATA}/innobackup.prepare.log (20170413 17:09:47.097)
   joiner: => Rate:[ 368KiB/s] Avg:[ 368KiB/s] Elapsed:0:04:01  Bytes: 87.1Mi
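
For scale, the pv line above can be turned into a rough time estimate. This is only a back-of-the-envelope sketch in Python: the function names are made up for illustration, and it assumes "50 GB" means 50 GiB and that the rate stays constant for the whole transfer.

```python
def observed_rate_kib_s(transferred_mib: float, elapsed_s: float) -> float:
    """Average rate in KiB/s from a byte count (MiB) and elapsed seconds."""
    return transferred_mib * 1024 / elapsed_s


def transfer_hours(size_gib: float, rate_kib_s: float) -> float:
    """Hours needed to move size_gib GiB at a constant rate_kib_s KiB/s."""
    size_kib = size_gib * 1024 * 1024
    return size_kib / rate_kib_s / 3600


if __name__ == "__main__":
    # Figures from the log line: 87.1 MiB after 0:04:01 (241 seconds).
    rate = observed_rate_kib_s(87.1, 241)
    print(f"observed rate: {rate:.0f} KiB/s")

    # Assumption: the 50 GB database is treated as 50 GiB.
    print(f"50 GiB at 368 KiB/s: {transfer_hours(50, 368):.1f} hours")
```

The first figure lands within a couple of KiB/s of the 368 KiB/s that pv reports, and at that rate a 50 GiB SST would take roughly 40 hours.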

colinmollenhour commented 7 years ago

Yeah, that's pretty slow, but it sounds like a Swarm issue to me, not an SST issue. Are you using --opt encrypted? On a local network I think you want to skip encryption, as the overhead is quite significant.

colinmollenhour commented 7 years ago

I'd run a test between two nodes using something like "time socat ..." to establish a baseline. If that doesn't expose an issue, you could try disabling "progress", but then you won't get the Rate output, so you'll have to measure it yourself somehow.

On my cluster running on Kontena with a "trusted-subnet" configured (which disables encryption between nodes), I got much better speeds than you're getting, and that was on just a gigabit network. I enabled compression with pigz, but my network transfer rate was still much higher (~50 GB in a few minutes).
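
If socat or nc isn't handy, a baseline like the one suggested above could also be taken with a small Python script. This is a minimal sketch, not anything shipped with the image: receive_all and run_loopback_demo are hypothetical names, and the demo runs both ends on 127.0.0.1 only so that it is self-contained. For a real test the receiver would run in a container on one node and the sender on another, connected over the overlay network.

```python
import socket
import threading
import time
from typing import Tuple

CHUNK = 1024 * 1024  # send and receive in 1 MiB chunks


def receive_all(server: socket.socket, result: dict) -> None:
    """Accept one connection and count bytes until the sender closes it."""
    conn, _ = server.accept()
    total = 0
    with conn:
        while True:
            data = conn.recv(CHUNK)
            if not data:
                break
            total += len(data)
    result["bytes"] = total


def run_loopback_demo(total_mib: int = 64) -> Tuple[int, float]:
    """Push total_mib MiB through a TCP socket; return (bytes received, MiB/s)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    result: dict = {}
    receiver = threading.Thread(target=receive_all, args=(server, result))
    receiver.start()

    payload = b"\0" * CHUNK
    start = time.perf_counter()
    with socket.create_connection(("127.0.0.1", port)) as sock:
        for _ in range(total_mib):
            sock.sendall(payload)
    receiver.join()  # wait until every byte has actually been read
    elapsed = time.perf_counter() - start
    server.close()
    return result["bytes"], total_mib / elapsed


if __name__ == "__main__":
    received, rate = run_loopback_demo()
    print(f"received {received} bytes at {rate:.1f} MiB/s")
```

The timer stops only after the receiver has drained the socket, so the figure reflects end-to-end throughput rather than how fast the sender can fill its kernel buffer. Running it inside a container with the same resource limits as the database service keeps the comparison fair.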

davidhiendl commented 7 years ago

Using a basic nc data transfer over the encrypted overlay network I achieved a transfer speed of 250 MB/s... Granted, that's nowhere near what it would be without encryption, but it's still several orders of magnitude faster than the SST replication stream...

davidhiendl commented 7 years ago

Found the problem: a CPU limit was throttling the transfer rate. Now it's an acceptable ~50 MB/s. Sorry about that. Closing.