colinmollenhour / mariadb-galera-swarm

MariaDb Galera Cluster container based on official mariadb image which can auto-bootstrap and recover cluster state.
https://hub.docker.com/r/colinmollenhour/mariadb-galera-swarm
Apache License 2.0

Suggested improvement for docker-swarm.yml #43

Closed: alphaDev23 closed this issue 5 years ago

alphaDev23 commented 6 years ago

To simplify the manual operations (per the readme) required to run the Galera cluster in a swarm:

  1. Differentiate the seed service volume from the node service volume, e.g., for the seed, change the volume name to mysql-data-seed and add a corresponding volume definition to the volumes configuration section (see the sketch after this list). This is needed to prevent permission issues when the seed and a node land on the same swarm node.
  2. Add a script to the node service, similar to the one at the end of this post, to verify that the seed is running before a node initializes. Note that the '/tmp/initialize' file needs to be added in the Dockerfile (i.e. 'RUN touch /tmp/initialize') for the script to work as expected. Finally, add the following to the node service definition so that the node containers do not initially fail due to the healthcheck:
     healthcheck:
       test: ["CMD", "/usr/local/bin/healthcheck.sh"]
       interval: 2m
       timeout: 10s
       retries: 5
  3. After the galera cluster initializes, remove (or scale to 0) the seed container.
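
For illustration, item 1 might look like the following minimal docker-swarm.yml sketch (the service names and the /var/lib/mysql mount path are assumptions for the example, not copied from the current file):

    services:
      mysql-seed:
        volumes:
          # Dedicated seed volume so a seed and a node can share a swarm node without permission issues
          - mysql-data-seed:/var/lib/mysql
      mysql-node:
        volumes:
          - mysql-data:/var/lib/mysql

    volumes:
      mysql-data-seed:
      mysql-data: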

The above eliminates 2 of the 3 manual steps currently required, i.e., scaling the node count initially and then scaling it again after the removal of the seed. This configuration has not been fully tested but appears to initialize without issue on a limited number of stacks. Since the script is dependent on the '/tmp/initialize' file, and this file is removed after node initialization, this modification should not affect subsequent recoveries of the cluster.

#!/bin/bash
# Wait for the seed database to accept connections before letting this node initialize,
# then remove the flag file so the check only applies to the node's first start.
dbAlive=0
while [ $dbAlive -eq 0 ] && [ -f /tmp/initialize ]; do
  if mysql -u root -p${MYSQL_ROOT_PASSWORD} -h ${MYSQL_HOST} -e ";" ; then
    dbAlive=1
  fi
  sleep 10
done
rm /tmp/initialize

colinmollenhour commented 6 years ago

Thanks for the suggestions, I'd love to see a pull request implementing them.

In the Kontena example I went a different direction: I removed the seed service altogether (it has no purpose after the initial setup), added some "hooks" that apply only to the first instance (the seed), and then deployed each node by removing a "hold-start" flag.

alphaDev23 commented 6 years ago

I was thinking about the same thing (removing the seed service) for swarm but did not have time to dive into the code to understand the details. I'm unfamiliar with Kontena. Do you plan on incorporating the code executed in the Kontena yml file into start.sh? If yes, that would make my suggestion moot and would greatly simplify the swarm implementation.

One thing to consider, and it may make no difference between seed and node containers with the current implementation, is database initialization via existing sql, etc. files in the initdb folder. The clusters that I have tested place these files only in the seed container.

BTW: Excellent implementation of Galera clusters within swarm, etc. It is by far the easiest to implement with minimal configuration/environment variables required. It also appears to be one of the most robust (although I still need to fully understand Galera cluster recovery scenarios).

colinmollenhour commented 6 years ago

I think the finer details of how to deploy the seed and nodes are going to vary widely based on the scheduler and also personal preference, so while I'm not opposed to adding more functionality, I don't want to specialize it too much for a particular scheduler if that restricts the ability to use it with others.

Kontena has a nice, easy way to make a command run once on a particular instance or on all instances of a service, so it is easy to get rid of a "seed" service, but Docker Swarm doesn't have this feature. Note that when importing a large database it is a lot faster to import it on a single non-clustered node with binlog disabled and then let the others do SST than to import the data on a live cluster. Going this route it is hard to automate when the nodes should come online, hence the "hold-start" flag, which allows the nodes to be launched but prevents them from joining immediately.
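
As a rough illustration of the hold-start idea, a node's startup path could gate on a flag file like this (the flag path below is an assumption, not necessarily the one used in the Kontena example):

    # Hypothetical gate: wait until an operator removes the hold-start flag before this node joins the cluster.
    while [ -f /var/lib/mysql/hold-start ]; do
      echo "hold-start flag present; waiting before joining the cluster..."
      sleep 10
    done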

alphaDev23 commented 6 years ago

With respect to the initdb folder, mariadb specifies this as: "When a container is started for the first time, a new database with the specified name will be created and initialized with the provided configuration variables. Furthermore, it will execute files with extensions .sh, .sql and .sql.gz that are found in /docker-entrypoint-initdb.d. Files will be executed in alphabetical order."

That is, currently, my seed container is initialized with existing database(s) - the existing database is included in the image - then the node containers are started after the initial database is online. Is this implementation incorrect? Testing on small databases confirms that the data is correctly transferred to the node containers.
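
For example, a seed image with the initial data baked in might look like the following Dockerfile sketch (the base image tag and dump file name are illustrative):

    FROM colinmollenhour/mariadb-galera-swarm
    # .sh, .sql and .sql.gz files in this directory are executed in alphabetical order on first initialization
    COPY initdb/existing-database.sql /docker-entrypoint-initdb.d/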

Regarding the scheduler options, would it be possible to include an environment variable, e.g. "REMOVE_SEED", and wrap any code needed to remove the seed in if statements in start.sh? This would give the option to bring up the seed container separately from the node containers (the current implementation) or to bring up a cluster without manual intervention.

I'm not sure if there are others, but the manual intervention was the one thing that initially steered me away from this solution (it adds a bit of complexity, and wherever there is complexity there are unexpected issues). After understanding it better, and given that it is well maintained and resolves complexities found in other solutions, I came back to it. Resolving the manual intervention, or at least making it optional, makes this solution almost a no-brainer compared to the alternatives. I'm not a Galera cluster expert at this time, so some may disagree with my opinions. For my situation, I just need a reliable database cluster (to replace existing mariadb/mysql databases) that will run in swarm in order to support other stateless applications that also run in the same swarm cluster.

colinmollenhour commented 6 years ago

Including an initdb file in your image seems perfectly valid to me. There is no one way to do it, and that probably is the best way for a small database snapshot.

I'm not sure exactly what you're proposing with REMOVE_SEED but it sounds like what you'd like is for all the nodes to come up automatically with no manual bootstrapping using your initdb file to seed the database. In that case I'd probably recommend using environment variables to declare another variable or file path that could be used to determine if the current instance should bootstrap itself or wait for the seed. For example:

AUTO_SEED_VAR=TASK_SLOT
AUTO_SEED_PATTERN=^1$
# Instance is the seed if [[ $TASK_SLOT =~ ^1$ ]]

Then all nodes that do not match the pattern just keep trying to join the cluster until they succeed.
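
A rough shell sketch of how such a check could be applied (the surrounding logic is hypothetical, not the actual start.sh):

    #!/bin/bash
    # Hypothetical sketch: resolve the variable named by AUTO_SEED_VAR and test its value
    # against AUTO_SEED_PATTERN to decide whether this instance should bootstrap the cluster.
    value="${!AUTO_SEED_VAR}"
    if [[ "$value" =~ $AUTO_SEED_PATTERN ]]; then
      echo "Instance matches the seed pattern: bootstrap a new cluster here."
    else
      echo "Instance does not match: keep retrying to join the existing cluster."
    fi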

For setting TASK_SLOT see: https://docs.docker.com/engine/reference/commandline/service_create/#create-services-using-templates
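
In a stack file that could look roughly like this, assuming Swarm's service templates expand {{.Task.Slot}} in the environment as described at the link above (the service name is illustrative):

    services:
      mysql-node:
        environment:
          # Expanded per task by Docker Swarm when the stack is deployed
          TASK_SLOT: "{{.Task.Slot}}"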

alphaDev23 commented 6 years ago

Regarding the 'REMOVE_SEED' suggestion, it was not specific to the initdb files but rather to removing the seed requirement. I had asked above, regarding the Kontena example (which removes the seed service): "Do you plan on incorporating the code executed in the Kontena yml file in start.sh?" You responded: "I don't want to specialize it too much for a particular scheduler if it restricts the ability to use it for others." The REMOVE_SEED variable suggestion is an idea to accomplish the seed removal while keeping the current scheduler flexibility.

Regarding the initdb files, the import is already implemented in start.sh today although the implementation suffers from other issues, one of which is described in https://github.com/colinmollenhour/mariadb-galera-swarm/issues/44.

Another issue exists with the same code which I will add as a new issue after further testing.

alphaDev23 commented 6 years ago

Is there any reason that the seed container cannot just be considered part of the cluster after initialization? That is, for a 3-node Galera cluster, why not just set replicas=2 on the node service, with no further changes, rather than scaling the seed service down to 0 and then scaling the node service up to 3?

colinmollenhour commented 5 years ago

Is there any reason that the seed container cannot just be considered part of the cluster after initialization?

Because the seed command is specifically designed to seed a new cluster, and you don't want there to be any chance of accidentally seeding a new cluster and clobbering your existing one. Not saying there isn't a way to do it, but I think it's a lot safer to switch all instances to the node command, or else use the existing /var/lib/mysql/new-cluster flag file on a single node, or TASK_SLOT like I suggested, so that it is seeded from a single node.