colinmollenhour / mariadb-galera-swarm

MariaDb Galera Cluster container based on official mariadb image which can auto-bootstrap and recover cluster state.
https://hub.docker.com/r/colinmollenhour/mariadb-galera-swarm
Apache License 2.0

start.sh: Trapped error on line 120 #59

Closed: ajarmoszuk closed 5 years ago

ajarmoszuk commented 5 years ago

Hi all, I'm getting the following error on one of the nodes as soon as I start two nodes after starting the seed. The node that doesn't want to start reports: start.sh: Trapped error on line 120

Here is my docker-compose.yml file:

version: '3.4'

services:
  seed:
    image: colinmollenhour/mariadb-galera-swarm
    environment:
      - XTRABACKUP_PASSWORD_FILE=/run/secrets/xtrabackup_password
      - MYSQL_USER=user
      - MYSQL_PASSWORD_FILE=/run/secrets/mysql_password
      - MYSQL_DATABASE=database
      - MYSQL_ROOT_PASSWORD_FILE=/run/secrets/mysql_root_password
      - NODE_ADDRESS=^10.0.9.*
    networks:
      - database-net
    command: seed
    volumes:
      - mysql-data:/var/lib/mysql
    deploy:
      placement:
        constraints:
          - node.role == manager
    secrets:
      - xtrabackup_password
      - mysql_password
      - mysql_root_password
  node:
    image: colinmollenhour/mariadb-galera-swarm
    environment:
      - XTRABACKUP_PASSWORD_FILE=/run/secrets/xtrabackup_password
      - NODE_ADDRESS=^10.0.9.*
      - HEALTHY_WHILE_BOOTING=1
    networks:
      - database-net
    command: node seed,node
    volumes:
      - mysql-data:/var/lib/mysql
    deploy:
      replicas: 0
      placement:
        constraints:
          - node.role == worker
    secrets:
      - xtrabackup_password

volumes:
  mysql-data:
    name: '{{.Service.Name}}-{{.Task.Slot}}-data'
    driver: local

networks:
  database-net:
    external: true

secrets:
  xtrabackup_password:
    file: .secrets/xtrabackup_password
  mysql_password:
    file: .secrets/mysql_password
  mysql_root_password:
    file: .secrets/mysql_root_password

It's exactly the same as the one in the tutorial, except that I created an external overlay network on subnet 10.0.9.0/24. It should work; I even tried copying the file directly from the tutorial, and the issue is the same.

ajarmoszuk commented 5 years ago

I have now tested this further, since someone else had the same problem in #40. I ran the test alpine image with bash, and the node that worked originally returns:

...------======------... MariaDB Galera Start Script ...------======------...
Got NODE_ADDRESS=10.0.9.105

The node that doesn't work returns:

...------======------... MariaDB Galera Start Script ...------======------...
Got NODE_ADDRESS=10.0.9.127

So there is network connectivity, but I'm not sure what the exact issue is...

colinmollenhour commented 5 years ago

Hmm, that line is this:

NODE_ADDRESS=$(getent hosts $(hostname) | awk '{print $1}' | grep -e "$NODE_ADDRESS")

But I don't know which of those commands would be failing...
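One way to narrow it down is to run each stage of that pipeline on its own inside the failing container. The sketch below assumes bash; the `match_address` helper and the sample addresses are made up for illustration and are not part of start.sh. It shows that when none of the addresses matches the pattern, grep exits with status 1, which an ERR trap (as the error message suggests the script uses) would report as a trapped error.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the last two stages of the pipeline:
# read `getent hosts` style lines on stdin, keep addresses matching a pattern.
match_address() {
  local pattern="$1"
  awk '{print $1}' | grep -e "$pattern"
}

# Inside the container, inspect each stage separately:
#   hostname
#   getent hosts "$(hostname)"
#   getent hosts "$(hostname)" | awk '{print $1}'

# An IPv4 line (sample data) matches the pattern from the compose file.
printf '10.0.9.127 node1\n' | match_address '^10.0.9.*'
echo "exit status: $?"

# An IPv6-only line (sample data) does not match, so grep exits non-zero.
printf 'fd00::5 node1\n' | match_address '^10.0.9.*'
echo "exit status: $?"
```

If the third command in the comment block prints nothing that starts with 10.0.9, the grep stage is the one that fails.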

ajarmoszuk commented 5 years ago

I have found the issue to be related to IPv6.

It turns out that because the Docker swarm manager and one worker were running an IPv6 interface while the other worker was running IPv4, it would try to contact the server over IPv6 and fail.

The solution was to switch NODE_ADDRESS over to eth0. The node would get past the original error and still fail, but it would eventually start up, as Docker would tell it to run over IPv4 the second time round.

Obviously this was an issue with my network setup and not with the package, but I wanted to leave this note here in case anyone else encounters a similar issue in the future.

It would be a good idea to add a disclaimer: if you run IPv6, make sure it runs on all swarm nodes; if you can't enable IPv6 everywhere, use eth0 instead of the IP address to get past the error.
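To check which IPv4 address eth0 carries on a given node before switching NODE_ADDRESS to eth0, something like the following may help. It is only a sketch: it assumes the `ip` tool from iproute2 is available, the `ipv4_from_ip_output` helper is hypothetical, and the sample line is made up. I have not checked how start.sh itself resolves an interface name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract IPv4 addresses from the one-line-per-address
# output of `ip -4 -o addr show`, read on stdin.
ipv4_from_ip_output() {
  awk '$3 == "inet" { split($4, a, "/"); print a[1] }'
}

# On a real node (requires iproute2):
#   ip -4 -o addr show dev eth0 | ipv4_from_ip_output

# Sample line (made up) in the format printed by `ip -o`.
printf '2: eth0    inet 10.0.9.127/24 brd 10.0.9.255 scope global eth0\n' \
  | ipv4_from_ip_output
```

If the command prints nothing on a node, eth0 has no IPv4 address there.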

Thanks.

colinmollenhour commented 5 years ago

Ahh, thanks for the update!