dejanstamenov opened this issue 5 years ago
Any update on this?
Hi @iwalucas and @dejanstamenov
I'm not seeing this issue. Here is what I did on my AWS VMs:
1. One manager:
~$ docker swarm init
and one worker without --advertise-addr:
~$ docker swarm join --token SWMTKN x.x.x.x:2377
This node joined a swarm as a worker.
2. docker stack:
~$ cat compose.yaml
version: "3.7"
services:
  swarm-app:
    environment:
      - VERSION=v4
      - METADATA=swarm-dev
    image: ehazlett/docker-demo
    deploy:
      mode: replicated
      replicas: 2
      endpoint_mode: vip
    ports:
      - "5050:8080"
and deployed it with
`docker stack deploy -c compose.yaml demo`
~$ docker service ls
ID             NAME             MODE         REPLICAS   IMAGE                         PORTS
tkxhdqkimy8i   demo_swarm-app   replicated   2/2        ehazlett/docker-demo:latest   *:5050->8080/tcp
Each container is running on a different node, which can be verified via
~$ docker service ps demo_swarm-app
3. Ran `curl` multiple times and was able to hit every backend container:
~$ for i in {1..5}; do curl http://0.0.0.0:5050/ping; done
{"instance":"7e7002e2f20c","version":"v4","metadata":"swarm-dev"}
{"instance":"c0bbff2b1b1e","version":"v4","metadata":"swarm-dev"}
{"instance":"7e7002e2f20c","version":"v4","metadata":"swarm-dev"}
{"instance":"c0bbff2b1b1e","version":"v4","metadata":"swarm-dev"}
{"instance":"7e7002e2f20c","version":"v4","metadata":"swarm-dev"}
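To check the distribution over more requests without eyeballing the output, the responses can be tallied per instance. A minimal sketch, assuming bash (for `{1..N}`) and that every response is JSON with an `"instance"` field, as in the `/ping` output above; `tally_instances` is a hypothetical helper name.

```shell
# Hypothetical helper: count /ping responses per "instance" value.
# Reads JSON responses from stdin (one or more per line) and prints
# "<count> <instance-id>" lines, most frequent first.
tally_instances() {
  grep -o '"instance":"[^"]*"' |   # extract every instance field
    cut -d'"' -f4 |                # keep only the container ID
    sort | uniq -c | sort -rn      # count per ID, highest count first
}

# Example against the routing mesh on this host (port 5050 as in compose.yaml):
#   for i in {1..20}; do curl -s http://0.0.0.0:5050/ping; echo; done | tally_instances
```

With healthy load balancing, every replica of the service should show up in the tally; a single line means all requests hit the same container.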
A good place to start would be to check the load-balancer entries on the host you are sending your requests to (in the `ingress` network sandbox), via
sudo nsenter --net=/var/run/docker/netns/ingress_sbox ipvsadm -l -n
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
FWM 259 rr
-> 10.255.0.5:0 Masq 1 0 0
-> 10.255.0.6:0 Masq 1 0 0
Each IP should point to the load-balancer endpoint on each node connected to the custom overlay network.
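When comparing several nodes, it helps to reduce this table to a count of real servers per virtual service (here `FWM 259`), which can then be compared with the number of running replicas. A minimal sketch, assuming the `ipvsadm -l -n` layout shown above; `count_ipvs_backends` is a hypothetical helper name.

```shell
# Hypothetical helper: count IPVS real servers per virtual service
# (e.g. "FWM 259") in `ipvsadm -l -n` output read from stdin.
count_ipvs_backends() {
  awk '
    # A virtual service line starts a new group and resets its counter.
    $1 == "FWM" || $1 == "TCP" || $1 == "UDP" { svc = $1 " " $2; order[++n] = svc; count[svc] = 0; next }
    # Real server lines start with "->" followed by an IP (skips the header row).
    $1 == "->" && $2 ~ /^[0-9]/ && svc != "" { count[svc]++ }
    END { for (i = 1; i <= n; i++) print order[i], count[order[i]] }
  '
}

# Usage on a node:
#   sudo nsenter --net=/var/run/docker/netns/ingress_sbox ipvsadm -l -n | count_ipvs_backends
```

For the output above this prints `FWM 259 2`, matching `replicas: 2`; a different count on some node is worth investigating.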
I had the same experience as @dejanstamenov on DigitalOcean with 3 Ubuntu 18.04.3 machines running Docker 19.03.4: one manager and two workers, running a service with 2 replicas. Tasks were deployed to node1 (manager) and node2 (worker). Node3 (worker) wasn't responding until it left the swarm and rejoined using the --advertise-addr flag.
@xperjon that would make sense if Node3 is behind a NAT.
Hi @arkodg,
Apologies for the delayed response on this one.
I've just gone ahead and created a new service without using the --advertise-addr parameter and received the errors below.
This is my service:
ID NAME MODE REPLICAS IMAGE PORTS
w90yhr0ef3k3 vlab-stack_vlab-web-service replicated 2/2 dejanstamenov/vlab-docker:experimental *:8080->8080/tcp
When running curl just like in your example above, I got the output below:
$ for i in {1..5}; do curl http://0.0.0.0:8080/ping; done
curl: (7) Failed to connect to 0.0.0.0 port 8080: Connection refused
curl: (7) Failed to connect to 0.0.0.0 port 8080: Connection refused
curl: (7) Failed to connect to 0.0.0.0 port 8080: Connection refused
curl: (7) Failed to connect to 0.0.0.0 port 8080: Connection refused
curl: (7) Failed to connect to 0.0.0.0 port 8080: Connection refused
Port 8080 is allowed in the firewall:
$ ufw status
Status: active
To Action From
-- ------ ----
22/tcp ALLOW Anywhere
4445/tcp ALLOW Anywhere
80/tcp ALLOW Anywhere
8080/tcp ALLOW Anywhere
2377/tcp ALLOW Anywhere
7946/tcp ALLOW Anywhere
7946/udp ALLOW Anywhere
4789/udp ALLOW Anywhere
500/udp ALLOW Anywhere
4500/udp ALLOW Anywhere
22/tcp (v6) ALLOW Anywhere (v6)
4445/tcp (v6) ALLOW Anywhere (v6)
80/tcp (v6) ALLOW Anywhere (v6)
8080/tcp (v6) ALLOW Anywhere (v6)
2377/tcp (v6) ALLOW Anywhere (v6)
7946/tcp (v6) ALLOW Anywhere (v6)
7946/udp (v6) ALLOW Anywhere (v6)
4789/udp (v6) ALLOW Anywhere (v6)
500/udp (v6) ALLOW Anywhere (v6)
4500/udp (v6) ALLOW Anywhere (v6)
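The rules above cover the ports that Docker's swarm documentation lists for node-to-node traffic: 2377/tcp (cluster management), 7946/tcp and 7946/udp (node communication) and 4789/udp (VXLAN overlay traffic). A small sketch that checks a `ufw status` listing for those rules; `check_swarm_ports` is a hypothetical helper, it only looks at IPv4 rules in the plain layout shown above, and it only inspects the local ufw configuration.

```shell
# Hypothetical helper: report swarm ports missing from `ufw status` output (stdin).
# Returns non-zero if any required rule is absent.
check_swarm_ports() {
  ufw_out=$(cat)
  missing=0
  for rule in 2377/tcp 7946/tcp 7946/udp 4789/udp; do
    # Match IPv4 rules of the form "<port>/<proto>   ALLOW ..."
    if ! printf '%s\n' "$ufw_out" | grep -Eq "^${rule}[[:space:]]+ALLOW"; then
      echo "missing: ${rule}"
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   sudo ufw status | check_swarm_ports && echo "swarm ports allowed"
```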
I have also tried running the nsenter command and got the output below:
$ nsenter --net=/var/run/docker/netns/ingress_sbox ipvsadm -l -n
nsenter: failed to execute ipvsadm: No such file or directory
Not really sure why I am getting this error on ipvsadm; I'm still looking into the details.
Thanks!
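The message above names `ipvsadm` itself. `nsenter --net=...` only switches the network namespace and then executes the given program from the host's filesystem, so this error typically means the `ipvsadm` binary is not installed on the host (on Ubuntu it is provided by the `ipvsadm` package); a missing namespace would instead be reported against the `/var/run/docker/netns/ingress_sbox` path. A hedged sketch that checks both prerequisites; `check_ipvs_prereqs` is a hypothetical name, and it should be run as root, like the `nsenter` command itself.

```shell
# Hypothetical helper: check what `nsenter ... ipvsadm -l -n` needs on this host.
# $1: path of the ingress network namespace (defaults to Docker's usual location).
check_ipvs_prereqs() {
  netns=${1:-/var/run/docker/netns/ingress_sbox}
  ok=0
  if ! command -v ipvsadm >/dev/null 2>&1; then
    echo "ipvsadm not found on the host PATH (Ubuntu: apt-get install ipvsadm)"
    ok=1
  fi
  if [ ! -e "$netns" ]; then
    echo "network namespace $netns not found (run as root; check dockerd logs)"
    ok=1
  fi
  return "$ok"
}

# Usage, in a root shell (e.g. after `sudo -i`):
#   check_ipvs_prereqs && nsenter --net=/var/run/docker/netns/ingress_sbox ipvsadm -l -n
```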
@dejanstamenov are you executing that command on the master node?
@arkodg - Yes, although I failed to mention that. All the snippets from yesterday are from the master node.
@dejanstamenov from the first comment, it looks like you might have been behind a NAT, so the primary interface IP was not reachable, which is why you needed to manually specify --advertise-addr.
From the last comment, it looks like your master node does not have the ingress sbox (dockerd logs and repro steps would help in this case).
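One way to test the NAT explanation is to look at the address each node advertised to the swarm, which `docker node inspect` exposes as `.Status.Addr`. If a node in another cloud or region advertises a private (RFC 1918) address, the other nodes most likely cannot reach it. The filter below is a heuristic sketch; `flag_private_addrs` is a hypothetical helper, and private addresses are perfectly fine when all nodes share one private network.

```shell
# Hypothetical helper: flag nodes whose advertised address is private (RFC 1918).
# Reads "<hostname> <address>" lines from stdin.
flag_private_addrs() {
  while read -r host addr; do
    case "$addr" in
      10.*|192.168.*|172.1[6-9].*|172.2[0-9].*|172.3[01].*)
        echo "$host advertises private address $addr" ;;
    esac
  done
}

# Usage on a manager node:
#   docker node ls -q | xargs docker node inspect \
#     --format '{{ .Description.Hostname }} {{ .Status.Addr }}' | flag_private_addrs
```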
Expected behavior
Using Docker Swarm on multiple virtual machines that are part of the same swarm, located in geographically distinct areas and using different public IP addresses, should work fine even when the worker nodes join the swarm without specifying their public IP address as the `--advertise-addr value` parameter of the `docker swarm join` command (documentation here).
Actual behavior
When the swarm is initialized on the swarm manager, it outputs a join command like the one below:
docker swarm join --token SWMTKN-1-TOKEN-ID x.x.x.x:2377
When this command is used on the other virtual machines that I want to be part of the same swarm, it turns out that the built-in load balancer does not route requests to the Docker containers deployed on the worker nodes. All the load goes to the manager node, unless the `--advertise-addr value` parameter is added to the command above, as in the example below:
docker swarm join --token SWMTKN-1-TOKEN-ID x.x.x.x:2377 --advertise-addr y.y.y.y
where y.y.y.y is the IPv4 address of the worker node that is joining the swarm manager, identified by the x.x.x.x IPv4 address.
Steps to reproduce the behavior
Output of `docker version`:
Output of `docker info`:
Additional environment details (AWS, VirtualBox, physical, etc.)
The virtual machines that I am currently using are spread around the world on the OpenNebula and Microsoft Azure cloud providers. I've got machines on all continents, which should all be part of the same Docker swarm. All of the virtual machines run Ubuntu 18.04 LTS (Bionic). Each machine has a single network interface (eth0) with a single public IPv4 address.
The `docker swarm join` command documentation (here) does not state that the `--advertise-addr value` parameter is mandatory when using a single network interface with a single IPv4 address. In fact, when I run the `docker swarm join` command without this parameter, the virtual machines join the swarm with no errors and the containers are deployed successfully. But when I simulate load on my containers with Apache JMeter, using the public IP address of the swarm manager node as the destination of my test cases, all the requests hit only the containers running on the manager node. I wondered why this would be the case, so I tried joining the worker nodes with each one advertising its own public IP address: the swarm load balancing then works like a charm, and the load is distributed among all the Docker containers in the swarm (not only the ones running on the swarm manager).
I find this really confusing, because I would expect to need to advertise an address only when using multiple network interfaces or multiple public IP addresses (for example, IPv4 and IPv6). In such scenarios, I would have expected Docker Swarm to be unsure which IP address to use.
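Given the behaviour described above, one option is a join wrapper that always passes an explicit advertise address, so it is never left to auto-detection. A minimal sketch; `swarm_join`, its argument order and the `DRY_RUN` variable are hypothetical, while `--token` and `--advertise-addr` are the real `docker swarm join` flags used in this issue.

```shell
# Hypothetical wrapper around `docker swarm join` that refuses to run
# without an explicit advertise address.
# Usage: swarm_join <token> <manager-ip> <this-node-public-ip>
swarm_join() {
  token=${1:-} manager=${2:-} advertise=${3:-}
  if [ -z "$token" ] || [ -z "$manager" ] || [ -z "$advertise" ]; then
    echo "usage: swarm_join <token> <manager-ip> <advertise-ip>" >&2
    return 2
  fi
  set -- docker swarm join --token "$token" --advertise-addr "$advertise" "$manager:2377"
  # DRY_RUN=1 only prints the command instead of running it.
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$@"
  else
    "$@"
  fi
}
```

For example, `DRY_RUN=1 swarm_join SWMTKN-1-TOKEN-ID x.x.x.x y.y.y.y` prints the same command as the `--advertise-addr` example earlier in this issue.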
Below is the complete Docker Compose file I have been using:
Hope that I've not missed any important information. If something is confusing or needs further clarification, please do let me know; happy to provide any feedback.
Thank you.