colinmollenhour / mariadb-galera-swarm

MariaDB Galera Cluster container based on the official mariadb image, which can auto-bootstrap and recover cluster state.
https://hub.docker.com/r/colinmollenhour/mariadb-galera-swarm
Apache License 2.0

Can't start seed #40

Closed: jaschaio closed this issue 6 years ago

jaschaio commented 6 years ago

Error message in the seed service: start.sh: Trapped error on line 113

Stack File:

version: '3.5'

services:

    seed:
        image: colinmollenhour/mariadb-galera-swarm:10.1
        networks:
            - database
        environment:
            - XTRABACKUP_PASSWORD_FILE=/run/secrets/xtrabackup_password
            - MYSQL_DATABASE=docker
            - MYSQL_USER=docker
            - MYSQL_PASSWORD_FILE=/run/secrets/mysql_password
            - MYSQL_ROOT_PASSWORD_FILE=/run/secrets/mysql_root_password
            - NODE_ADDRESS=^10.0.0.*
        secrets:
            - xtrabackup_password
            - mysql_password
            - mysql_root_password
        command: seed
        volumes:
            - database:/var/lib/mysql
        deploy:
            placement:
                constraints:
                    - node.labels.type == database

    node:
        image: colinmollenhour/mariadb-galera-swarm:10.1
        networks:
            - database
        environment:
            - XTRABACKUP_PASSWORD_FILE=/run/secrets/xtrabackup_password
            - NODE_ADDRESS=^10.0.0.*
            - HEALTHY_WHILE_BOOTING=1
        command: node seed,node
        secrets:
            - xtrabackup_password
        volumes:
            - database:/var/lib/mysql
        deploy:
            replicas: 0
            placement:
                constraints:
                    - node.labels.type == database

volumes:
    database:
        driver: rexray/dobs
        driver_opts:
            size: 10

networks:
    database:
        external: true

secrets:
    xtrabackup_password:
        external: true
    mysql_password:
        external: true
    mysql_root_password:
        external: true

The network is declared as "external" because I created it manually: docker network create -d overlay database

But I get the same error even if I use the exact stack file from https://github.com/colinmollenhour/mariadb-galera-swarm/blob/master/examples/swarm/docker-compose.yml.

jaschaio commented 6 years ago

By the way, this is the same error reported in #38, but the solution proposed there is not relevant here, or was not the actual cause of the error in the first place.

To debug this, I created a "utility" service based on alpine and copied the part of start.sh that throws the error, which is the part that normally determines the node address. This is the stack file I used for the utility service:

version: '3.5'

services:

    utility:
        image: alpine
        networks:
            - database
        command: sleep 10000s
        deploy:
            placement:
                constraints:
                    - node.hostname==swarm-manager-01

networks:
    database:
        external: true

This is the part of the start.sh script I copied and executed:

if [ -z "$NODE_ADDRESS" ]; then
    # Support Weave/Kontena
    NODE_ADDRESS=$(ip addr | awk '/inet/ && /ethwe/{sub(/\/.*$/,"",$2); print $2}')
fi
if [ -z "$NODE_ADDRESS" ]; then
    # Support Docker Swarm Mode
    NODE_ADDRESS=$(ip addr | awk '/inet/ && /eth0/{sub(/\/.*$/,"",$2); print $2}' | head -n 1)
elif [[ "$NODE_ADDRESS" =~ [a-zA-Z][a-zA-Z0-9:]+ ]]; then
    # Support interface - e.g. Docker Swarm Mode uses eth0
    NODE_ADDRESS=$(ip addr | awk "/inet/ && / $NODE_ADDRESS\$/{sub(/\\/.*$/,\"\",\$2); print \$2}" | head -n 1)
elif ! [[ "$NODE_ADDRESS" =~ ^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$ ]]; then
    # Support grep pattern. E.g. ^10.0.1.*
    NODE_ADDRESS=$(getent hosts $(hostname) | awk '{print $1}' | grep -e "$NODE_ADDRESS")
fi
if ! [[ "$NODE_ADDRESS" =~ ^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$ ]]; then
    echo "Could not determine NODE_ADDRESS: $NODE_ADDRESS"
    exit 1
fi
echo "...------======------... MariaDB Galera Start Script ...------======------..."
echo "Got NODE_ADDRESS=$NODE_ADDRESS"
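
To make the branching above easier to follow, here is a minimal sketch that reports which branch a given NODE_ADDRESS value would take, reusing the same tests as the copied script. The helper name classify_node_address is hypothetical and it does not resolve any address. One detail it makes visible: the interface test is unanchored, so any value containing a letter followed by a letter, digit or colon is treated as an interface name, not as a grep pattern.

```shell
#!/usr/bin/env bash
# Hypothetical helper: mirrors only the if/elif tests of the copied start.sh logic.
classify_node_address() {
  local value=$1
  if [ -z "$value" ]; then
    echo "auto"        # unset: try ethwe (Weave/Kontena), then eth0 (Swarm Mode)
  elif [[ "$value" =~ [a-zA-Z][a-zA-Z0-9:]+ ]]; then
    echo "interface"   # e.g. eth0; note this regex is not anchored
  elif ! [[ "$value" =~ ^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$ ]]; then
    echo "pattern"     # e.g. ^10.0.1.*, filtered with grep against getent output
  else
    echo "literal"     # already a dotted IPv4 address, used as-is
  fi
}

for v in "" eth0 '^10.0.0.*' 10.0.3.10; do
  printf '%-12s -> %s\n' "${v:-<unset>}" "$(classify_node_address "$v")"
done
```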

If I execute it with bash (with NODE_ADDRESS unset), it does not throw an error and outputs:

...------======------... MariaDB Galera Start Script ...------======------...
Got NODE_ADDRESS=10.0.3.10

Running getent hosts $(hostname) on its own gives the following output: 10.0.3.10 57266d1f7253 57266d1f7253

But if I set the NODE_ADDRESS environment variable to the value from the README, ^10.0.0.*, I get the following error instead: Could not determine NODE_ADDRESS:
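
The container reports "Trapped error on line 113" while the hand-run snippet prints "Could not determine NODE_ADDRESS:". One possible explanation, assuming start.sh runs under set -e with an ERR trap (the message format suggests this, but the full script is not shown here), is that a grep matching nothing fails the assignment itself, so the script stops before reaching its own friendlier check. The sketch below is a hypothetical reproduction, not the real start.sh; resolve_strict is an illustrative name.

```shell
#!/usr/bin/env bash
# Hypothetical reproduction, NOT the real start.sh.
# Assumption: the real script uses `set -e` plus an ERR trap that prints the line.
resolve_strict() {
  # A child bash process keeps its own errexit/trap settings, so the trailing
  # `|| true` only stops the caller from failing; it does not disable them.
  bash -s -- "$1" "$2" <<'EOF' || true
set -e
trap 'echo "Trapped error on line $LINENO"; exit 1' ERR
pattern=$1
address=$2
# grep exits 1 when nothing matches; that becomes the assignment's exit status,
# so the ERR trap and errexit fire here and the echo below never runs.
NODE_ADDRESS=$(echo "$address" | grep -e "$pattern")
echo "Got NODE_ADDRESS=$NODE_ADDRESS"
EOF
}

resolve_strict '^10.0.0.*' 10.0.3.10   # prints a "Trapped error on line ..." message
resolve_strict '^10.0.*.*' 10.0.3.10   # prints Got NODE_ADDRESS=10.0.3.10
```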

Setting it to ^10.0.*.* instead gives the same output as before. So I adjusted that value in the stack file posted above, and now it seems to work.
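
Why ^10.0.0.* fails comes down to grep's regex syntax: "." matches any single character and "*" repeats the preceding item, so the pattern only matches a dotted address whose third octet begins with 0, whereas this container's address is 10.0.3.10. By contrast, ^10.0.*.* matches any address beginning with 10.0, which is broad and relies on getent returning only the one address. The sketch below (matches is a hypothetical helper) checks these patterns against the address taken from the getent output above; the last pattern, with escaped dots, is a stricter alternative and not one taken from the README.

```shell
#!/usr/bin/env bash
# Hypothetical helper: does an address match a NODE_ADDRESS-style grep pattern?
matches() {
  local pattern=$1 address=$2
  if echo "$address" | grep -q -e "$pattern"; then echo yes; else echo no; fi
}

# First field of `getent hosts $(hostname)`, as in the grep-pattern branch.
addr=$(echo "10.0.3.10 57266d1f7253 57266d1f7253" | awk '{print $1}')

matches '^10.0.0.*' "$addr"    # no: needs "10", any char, "0", any char, "0"
matches '^10.0.*.*' "$addr"    # yes: anything starting with 10.0
matches '^10\.0\.3\.' "$addr"  # yes: literal dots, pinned to 10.0.3.x
```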

I'm not sure whether it is specific to my setup that the internal node IPs follow a different pattern from the one you use.

colinmollenhour commented 6 years ago

Thanks for the report. It sounds like the swarm example README gives instructions that are not specific enough, or too fragile. I don't have a swarm cluster set up to test with, but do you know of a simple way to determine what the Docker network subnet is? I think updating the README would prevent this confusion.
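
One way to find the subnet is docker network inspect, whose --format template can print the IPAM subnet of an overlay network. Docker Swarm typically hands each overlay network its own /24 from 10.0.0.0/8 by default, which would explain why a manually created network landed on 10.0.3.0/24 here. The sketch below turns such a subnet into a strict grep pattern; subnet_to_pattern is a hypothetical helper that only handles /24 subnets.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a NODE_ADDRESS grep pattern from a /24 subnet.
# The subnet could be read with, for example:
#   docker network inspect database --format '{{range .IPAM.Config}}{{.Subnet}}{{end}}'
subnet_to_pattern() {
  local cidr=$1
  local network=${cidr%/*}    # 10.0.3.0/24 -> 10.0.3.0
  local prefix=${cidr#*/}     # 10.0.3.0/24 -> 24
  if [ "$prefix" != "24" ]; then
    echo "subnet_to_pattern: only /24 subnets are handled, got /$prefix" >&2
    return 1
  fi
  local first3=${network%.*}  # 10.0.3.0 -> 10.0.3
  # Escape the dots so grep matches them literally, and require a trailing dot
  # so that the pattern for 10.0.3.x cannot also match 10.0.30.x.
  printf '^%s\\.\n' "$(printf '%s' "$first3" | sed 's/\./\\./g')"
}

subnet_to_pattern 10.0.3.0/24
```

Per the copied script, a value like ^10\.0\.3\. contains no letters and is not a bare IPv4 address, so it would be handled by the grep-pattern branch.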

colinmollenhour commented 6 years ago

I've updated the stack file and README so that they work out of the box in more cases and, hopefully, better explain how NODE_ADDRESS is meant to be used.