doublez13 opened this issue 3 years ago
This is quite strange, as I had just built this exact setup before finding this repo. The change I made resolves this, so I thought it could be of use:
__/usr/local/bin/check_traefik__

```sh
#!/bin/sh
# Determine if the traefik container is running on this host
RESPONSE=$(docker ps --filter "name=traefik" --filter "status=running" --format "{{.ID}} {{.Names}}")
if [ -z "$RESPONSE" ]; then
  exit 1
fi
```
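To illustrate the contract keepalived expects from a check script (exit 0 = healthy, non-zero = failed), here is a self-contained sketch of the same logic. The `docker` shell function is a stub standing in for the real CLI so the branch can be exercised without a Docker daemon; the container ID is made up.

```shell
#!/bin/sh
# Stub `docker` so this runs without a daemon; it pretends one
# traefik container is running (ID is a hypothetical placeholder).
docker() {
  echo "abc123def456 traefik"
}

# Same check as the real script: empty output means no running container.
RESPONSE=$(docker ps --filter "name=traefik" --filter "status=running" --format "{{.ID}} {{.Names}}")
if [ -z "$RESPONSE" ]; then
  echo "check failed (keepalived sees exit 1)"
  exit 1
fi
echo "check passed (keepalived sees exit 0)"
```

With the stub returning a match, the script takes the healthy path; delete the `echo` inside the stub to see the failure path.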
And in my vrrp keepalived.conf I use:
```
global_defs {
    # Keepalived process identifier
    router_id traefik
    enable_script_security
}

vrrp_script check_traefik {
    script "/usr/local/bin/check_traefik"
    interval 2
    fall 2
    rise 2
    init_fail
    user root
}

# Virtual interface
# The priority determines which node takes over the VIP in a failover
vrrp_instance VI_01 {
    state MASTER
    interface enp1s0
    virtual_router_id 51
    priority 100

    # The virtual IP address shared between the two load balancers
    virtual_ipaddress {
        10.1.2.20
    }

    track_script {
        check_traefik
    }
}
```
And similarly for the backup nodes. This way the VRRP address follows traefik.
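For reference, the backup node's config differs only in the state and priority; a minimal sketch, assuming the interface name matches the master's:

```
vrrp_instance VI_01 {
    state BACKUP
    interface enp1s0
    virtual_router_id 51
    priority 90          # lower than the master's 100

    virtual_ipaddress {
        10.1.2.20
    }

    track_script {
        check_traefik
    }
}
```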
Yeah I currently have the VRRP address set to prefer the node that is running traefik.
```
vrrp_script chk_traefik {
    #script "pgrep traefik"   # Had to use this on Debian distros
    script "pidof traefik"
    interval 30
    weight 10
}
```
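For context on `weight 10`: with a positive weight, keepalived adds the weight to the node's base priority while the script succeeds, which is what makes the VIP prefer the traefik node. A sketch with illustrative priorities (the specific numbers are assumptions, not from the thread):

```
# node A: priority 100, chk_traefik succeeds -> effective 100 + 10 = 110
# node B: priority 105, chk_traefik fails    -> effective 105
# Node A holds the VIP while traefik runs there,
# despite having the lower base priority.
vrrp_instance VI_01 {
    priority 100
    track_script {
        chk_traefik    # weight 10
    }
}
```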
Have you ever timed container migration when draining a node? Usually the containers migrate pretty quickly, but sometimes they take up to 60 seconds, which kind of sucks.
I noticed that just after I posted my response :)
I haven't really started using it in anger yet and haven't really noticed the delay. I've done some failure testing and not suffered anything as long as that.
Where I can, though, I've used the `order: start-first` stanza to ensure the old container stays up until the new one has been built and is running:
```yaml
deploy:
  labels:
    traefik.enable: "True"
    ...
  replicas: 1
  update_config:
    delay: 15s
    order: start-first
    parallelism: 1
```
The only place I don't use this is on containers whose volumes are fussy about DB and file locking, e.g. Portainer.
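For those lock-sensitive services, the default `stop-first` order avoids two replicas touching the volume at once; a minimal sketch of that alternative:

```yaml
deploy:
  replicas: 1
  update_config:
    order: stop-first   # old container is stopped before the new one starts
    parallelism: 1
```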
I'll give that a try for sure.
What volume driver do you use?
Currently using NFS, as I'm only using it for configs.
We're about to start testing Ceph (REX-Ray) backed by md and iSCSI to see how that goes.
vrrp node -> Traefik node -> WordPress node -> DB node

On a 4-node swarm, each of these can land on a different node, so any single node going down disrupts the cluster. Ideally, VRRP and Traefik would share one node, and each WP + DB stack would live together on one node. That way, a single node failure is less likely to take down a site.
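One way to get that co-location in Swarm is placement constraints on node labels; a sketch, assuming a hypothetical label `site=wp1` has been applied with `docker node update --label-add site=wp1 <node>`:

```yaml
services:
  wordpress:
    deploy:
      placement:
        constraints:
          - node.labels.site == wp1
  db:
    deploy:
      placement:
        constraints:
          - node.labels.site == wp1
```

The trade-off is that pinning both services to one node sacrifices some of Swarm's freedom to reschedule them elsewhere on failure.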