colinmollenhour / mariadb-galera-swarm

MariaDb Galera Cluster container based on official mariadb image which can auto-bootstrap and recover cluster state.
https://hub.docker.com/r/colinmollenhour/mariadb-galera-swarm
Apache License 2.0

Node shutdowns after removing seed. #51

Closed ghatdev closed 5 years ago

ghatdev commented 5 years ago

I deployed the stack exactly as described in the README. But after scaling the seed service down to 0 and scaling the nodes up by 1, all nodes go down. My stack file is below (I only added an ingress port and changed the version to 3.5):

    version: '3.5'

    services:
      seed:
        image: colinmollenhour/mariadb-galera-swarm
        environment:

    volumes:
      mysql-data:
        name: '{{.Service.Name}}-{{.Task.Slot}}-data'
        driver: local

    networks:
      galera_network:
        name: 'cluster-test-db-net'
        driver: overlay

    secrets:
      xtrabackup_password:
        file: .secrets/xtrabackup_password
      mysql_password:
        file: .secrets/mysql_password
      mysql_root_password:
        file: .secrets/mysql_root_password

colinmollenhour commented 5 years ago

In the example readme you are directed to scale up nodes before scaling down the seed.
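
To make the ordering concrete, here is a minimal Python sketch that only builds the two `docker service scale` invocations in the intended order (nodes up first, seed down second). The service names `galera_node` and `galera_seed` and the helper `scale_commands` are assumptions for illustration; the real names depend on how the stack was deployed.

```python
def scale_commands(node_service, seed_service, node_replicas):
    """Return the docker CLI invocations in the order they should run.

    The nodes are scaled up first so they can join the cluster through
    the seed; only after that is the seed scaled down to 0.
    """
    if node_replicas < 1:
        raise ValueError("at least one node must be running before the seed is removed")
    return [
        ["docker", "service", "scale", f"{node_service}={node_replicas}"],
        ["docker", "service", "scale", f"{seed_service}=0"],
    ]


if __name__ == "__main__":
    # Hypothetical service names; substitute the ones from your own stack.
    for argv in scale_commands("galera_node", "galera_seed", 2):
        print(" ".join(argv))
```

The sketch only prints the commands. In practice you would presumably wait until the new nodes have joined and synced before running the second one.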

petrus-v commented 5 years ago

@ghatdev, which version of the Docker daemon / Docker Swarm are you using?

I get much the same behavior using Docker 18.06.1-ce on CoreOS. I had to set the endpoint mode to dnsrr (DNS round-robin) instead of vip (virtual IP, the current default):

 deploy:
       replicas: 0
+      endpoint_mode: dnsrr

I believe the virtual IP is what makes Galera fail. I'm pretty new to Galera, so I would be glad to hear from experts whether we should avoid the VIP load-balancing strategy with Swarm for the nodes and prefer the round-robin strategy, which appears to be supported.

As DNS names (via the service name) are resolved at image startup, you might think that at some point the wsrep_cluster_address variable ends up wrongly set, but I've noticed there is a status variable, wsrep_incoming_addresses, that tracks the neighboring nodes and appears to get updated across container restarts.
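
To illustrate why the endpoint mode matters for startup-time resolution, here is a rough Python sketch (not the image's actual startup code). It resolves a name to its IPv4 addresses and joins them into a `gcomm://` address. With `endpoint_mode: dnsrr` the service name should return one address per running task, while with `vip` it returns a single virtual IP. The helper names `resolve_service_ips` and `build_cluster_address` and the service name `galera_node` are assumptions for illustration.

```python
import socket


def resolve_service_ips(service_name, port=4567):
    """Return the sorted, de-duplicated IPv4 addresses a name resolves to.

    The port (4567 is Galera's replication port) does not affect which
    addresses are returned; getaddrinfo simply echoes it back.
    """
    infos = socket.getaddrinfo(
        service_name, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, (ip, port)).
    return sorted({info[4][0] for info in infos})


def build_cluster_address(ips):
    """Join node IPs into a Galera gcomm:// address string."""
    return "gcomm://" + ",".join(ips)


if __name__ == "__main__":
    # Hypothetical service name; only resolvable inside the overlay network.
    print(build_cluster_address(resolve_service_ips("galera_node")))
```

Under `vip` this would yield a single-entry address pointing at the load balancer rather than at the individual Galera nodes, which would be consistent with the behavior described above.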

regards, Pierre

awgneo commented 5 years ago

I am experiencing this issue as well with docker 18.06.1-ce when following the proper procedure (start with seed, add nodes, remove seed).

colinmollenhour commented 5 years ago

I just added the dnsrr option in da9784dd30f5edab41e8ce3432a647824fd10f22 which makes sense. For swarm in particular there is probably a better way to discover the IPs than requiring dnsrr, such as listing the nodes individually in the command.