doublez13 / docker-swarm-example-setup

Guide for setting up a basic Docker Swarm cluster.
Apache License 2.0

Find a smarter way to distribute the containers. #6

Open doublez13 opened 3 years ago

doublez13 commented 3 years ago

VRRP node -> Traefik node -> WordPress node -> DB node

On a 4-node swarm, it's possible for each of these to land on a different node, so any one node going down disrupts the whole chain.

Ideally, VRRP and Traefik would share one node, and each WordPress/DB stack would live together on a single node. That way, a node going down is less likely to take a site offline.
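One way to sketch that co-location, assuming node labels (the `wp1` label name here is hypothetical) applied with `docker node update --label-add wp1=true <node>`, is a placement constraint in each stack file:

```yaml
services:
  wordpress:
    deploy:
      placement:
        constraints:
          - node.labels.wp1 == true   # hypothetical label pinning this stack to one node
  db:
    deploy:
      placement:
        constraints:
          - node.labels.wp1 == true   # same label, so WP and its DB share a node
```

The trade-off is that a labeled node going down takes its whole stack down until the label is moved or the node returns, so this favors locality over automatic rescheduling.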

paulb-smartit commented 3 years ago

This is quite a coincidence, as I had just built this exact setup before finding this repo. A change I made resolves this issue, so I thought it could be of use:

__/usr/local/bin/check_traefik__

```sh
#!/bin/sh

# Determine if the traefik container is running on this host:
# exit 0 if it is, exit 1 otherwise.
RESPONSE=$(docker ps --filter "name=traefik" --filter "status=running" --format "{{.ID}} {{.Names}}")

if [ -z "$RESPONSE" ]; then
    exit 1
fi
exit 0
```

And in my vrrp keepalived.conf I use:

```
global_defs {
  # Keepalived process identifier
  router_id traefik
  enable_script_security
}
vrrp_script check_traefik {
  script "/usr/local/bin/check_traefik"
  interval 2
  fall 2
  rise 2
  init_fail
  user root
}
# Virtual interface
# The priority determines which node takes over the virtual IP in a failover
vrrp_instance VI_01 {
  state MASTER
  interface enp1s0
  virtual_router_id 51
  priority 100
  # The virtual IP address shared between the two load balancers
  virtual_ipaddress {
    10.1.2.20
  }
  track_script {
    check_traefik
  }
}
```

And similarly for the slaves. This way, the VRRP address follows Traefik.

doublez13 commented 3 years ago

Yeah, I currently have the VRRP address set to prefer the node that is running Traefik.

```
vrrp_script chk_traefik {
    #script "pgrep traefik" # Had to use this on Debian distros
    script "pidof traefik"
    interval 30
    weight 10
}
```
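For reference, `weight 10` means a node's effective priority rises by 10 while the check succeeds, so whichever node runs Traefik outbids the others for the virtual IP, as long as the static priorities are within 10 of each other. A minimal sketch of a backup node's instance (interface name, VIP, and priority values are placeholders):

```
vrrp_instance VI_01 {
  state BACKUP
  interface enp1s0      # placeholder interface name
  virtual_router_id 51
  priority 95           # below the master's 100, and within the weight of 10 so the script can flip mastership
  virtual_ipaddress {
    10.1.2.20           # example shared virtual IP
  }
  track_script {
    chk_traefik
  }
}
```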

Have you ever timed the migration times when draining a node? Usually the containers are pretty quick to migrate, but sometimes they take up to 60 seconds, which kind of sucks.

paulb-smartit commented 3 years ago

I noticed that just after I posted my response :)

I haven't really started using it in anger yet, so I haven't noticed the delay. I've done some failure testing and not suffered anything that long.

Where I can, I've used the `order:` stanza to ensure the old container stays up until the new one has been created and is running:

```yaml
    deploy:
      labels:
        traefik.enable: "True"
        ...
      replicas: 1
      update_config:
        delay: 15s
        order: start-first
        parallelism: 1
```

The only place I don't use this is on containers whose volumes are fussy about DB and file locking, e.g. Portainer.

doublez13 commented 3 years ago

I'll give that a try for sure.

What volume driver do you use?

paulb-smartit commented 3 years ago

Currently using NFS, as I'm only using it for configs.

We're about to start testing Ceph (via the REX-Ray volume driver) backed with md and iSCSI to see how that goes.
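For anyone following along, an NFS-backed named volume can be declared directly in a compose/stack file via the `local` driver's NFS options; the server address, export path, and mount options below are placeholders:

```yaml
volumes:
  traefik_config:
    driver: local
    driver_opts:
      type: nfs
      o: "addr=10.1.2.5,rw,nfsvers=4"   # placeholder NFS server address and mount options
      device: ":/exports/traefik"       # placeholder export path
```

Declaring the volume this way lets any node in the swarm mount the same export, so a rescheduled container finds its config wherever it lands.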