docker / for-linux

Docker Engine for Linux
https://docs.docker.com/engine/installation/

Load balancing under Docker Swarm not working if worker nodes do not advertise their IP address #624

Open dejanstamenov opened 5 years ago

dejanstamenov commented 5 years ago

Expected behavior

Using Docker Swarm across multiple virtual machines that are part of the same swarm, located in geographically distinct areas and using different public IP addresses, should work fine even when the worker nodes join the swarm without specifying their public IP address via the --advertise-addr parameter of the docker swarm join command (documentation here).

Actual behavior

When the swarm is initiated from the swarm manager, given the example output command below:

docker swarm join --token SWMTKN-1-TOKEN-ID x.x.x.x:2377

which is then run from the other virtual machines that I want to be part of the same swarm, it turns out that the built-in load balancer will not route requests to the Docker containers deployed on the worker nodes. All the load goes to the manager node, unless the --advertise-addr parameter is added to the command above, as in the example below:

docker swarm join --token SWMTKN-1-TOKEN-ID x.x.x.x:2377 --advertise-addr y.y.y.y

where y.y.y.y is the IPv4 address of the worker node that is joining the swarm manager, identified by the x.x.x.x IPv4 address.
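When --advertise-addr is omitted, the daemon auto-detects an address to advertise. What it actually picked can be checked on any joined node (a sketch; assumes the Docker CLI is available on the node):

```shell
# Sketch: print the address this daemon advertises to the rest of the swarm.
# Empty output means the node is not currently part of a swarm.
docker info --format '{{ .Swarm.NodeAddr }}' 2>/dev/null || true
```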

Steps to reproduce the behavior

Output of docker version:

Client:
 Version:           18.09.3
 API version:       1.39
 Go version:        go1.10.8
 Git commit:        774a1f4
 Built:             Thu Feb 28 06:53:11 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.3
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.8
  Git commit:       774a1f4
  Built:            Thu Feb 28 05:59:55 2019
  OS/Arch:          linux/amd64
  Experimental:     false

Output of docker info:

Containers: 30
 Running: 30
 Paused: 0
 Stopped: 0
Images: 2
Server Version: 18.09.3
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: active
 NodeID: rusnx785ap13vl5m747w5hfe1
 Is Manager: true
 ClusterID: rd1y0gvy8ga1k8mzt2eqfaukk
 Managers: 1
 Nodes: 2
 Default Address Pool: 10.0.0.0/8
 SubnetSize: 24
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 10
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
  Force Rotate: 0
 Autolock Managers: false
 Root Rotation In Progress: false
 Node Address: 194.149.138.196
 Manager Addresses:
  194.149.138.196:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: e6b3f5632f50dbc4e9cb6288d911bf4f5e95b18e
runc version: 6635b4f0c6af3810594d2770f662f34ddc15b40d
init version: fec3683
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.15.0-46-generic
Operating System: Ubuntu 18.04.2 LTS
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 3.852GiB
Name: vlab-docker-01
ID: 6UVS:MZKH:ILOG:XPOF:ECKI:24WQ:7YPP:KYFH:FRYM:NNHI:XUKI:CV6Z
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine

WARNING: No swap limit support

Additional environment details (AWS, VirtualBox, physical, etc.) The virtual machines that I am currently using are spread around the world on the OpenNebula and Microsoft Azure cloud providers. I've got machines on all the continents which should be part of the same Docker swarm. All of the virtual machines are running Ubuntu 18.04 LTS (Bionic). These machines are running with a single network interface (eth0) and a single public IPv4 address per machine.

As per the docker swarm join command documentation here, it is not stated that the --advertise-addr parameter is mandatory when using a single network interface with a single IPv4 address. As a matter of fact, when I run the docker swarm join command without this parameter, the virtual machines join the swarm with no errors displayed and the containers also get deployed successfully. But then, when I try to simulate load on my containers through Apache JMeter, with the destination IP address of my test cases set to the public IP address of the swarm manager node, all the requests only hit the containers running on the manager node. I was wondering why this would be the case, so I tried joining the swarm with each worker node advertising its own public IP address: the swarm load balancing then works like a charm and the load is distributed among all the Docker containers that are part of the swarm (not only the ones running on the swarm manager).
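A quick way to verify what each node actually registered (a sketch; run on the manager, and check_node_addrs is a hypothetical helper name) is to inspect the Status.Addr field of every node:

```shell
# Sketch (run on the swarm manager): list the address each node advertised.
# A worker showing an address that is not reachable from the other nodes is
# exactly the situation --advertise-addr is meant to fix.
check_node_addrs() {
  docker node ls -q | while read -r id; do
    docker node inspect \
      --format '{{ .Description.Hostname }} -> {{ .Status.Addr }}' "$id"
  done
}
check_node_addrs
```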

I find this really confusing, because I would expect to need to advertise my address only when using multiple network interfaces or multiple public IP addresses (for example, IPv4 and IPv6). In such scenarios, I would have expected Docker Swarm to be unsure which IP address to use.

Below is the complete Docker Compose file I have been using:

version: '3.3'
services:
  vlab-web-service:
    image: 'dejanstamenov/vlab-docker:experimental'
    ports:
      - target: 8080
        published: 8080
        protocol: tcp
        mode: ingress
    command: nohup /app/vlab-docker-dotnet-core-stream-app 8080 &
    deploy:
      mode: replicated
      replicas: 30
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 3
        window: 30s
      endpoint_mode: vip
    networks:
      - vlab-swarm-network
networks:
  vlab-swarm-network:
    driver: overlay
    ipam:
      driver: default
      config:
        - subnet: 10.0.37.0/24
    driver_opts:
      encrypted: "false"
      "com.docker.network.driver.mtu" : "9216"
      "com.docker.network.bridge.enable_ip_masquerade" : "false"

Hope that I've not missed any important information. If something is confusing or need further clarification, please do let me know, happy to provide any feedback.

Thank you.

iwalucas commented 5 years ago

Any update on this?

arkodg commented 5 years ago

Hi @iwalucas and @dejanstamenov

I'm not seeing this issue; here is what I did on my AWS VMs:

  1. Created one manager

     ~$ docker swarm init

     and one worker without --advertise-addr

     ~$ docker swarm join --token SWMTKN x.x.x.x:2377
     This node joined a swarm as a worker.

  2. Created a service with 2 replicas using docker stack

     ~$ cat compose.yaml
     version: "3.7"
     services:
       swarm-app:
         environment:
           - VERSION=v4
           - METADATA=swarm-dev
         image: ehazlett/docker-demo
         deploy:
           mode: replicated
           replicas: 2
           endpoint_mode: vip
         ports:
           - "5050:8080"

     and

     `docker stack deploy -c compose.yaml demo`

~$ docker service ls
ID             NAME             MODE         REPLICAS   IMAGE                         PORTS
tkxhdqkimy8i   demo_swarm-app   replicated   2/2        ehazlett/docker-demo:latest   *:5050->8080/tcp


Each container is running on a different node and can be verified via 

docker service ps demo_swarm-app


3. Tried to `curl` multiple times and I am able to hit every backend container

~$ for i in range{1..5}; do curl http://0.0.0.0:5050/ping; done
{"instance":"7e7002e2f20c","version":"v4","metadata":"swarm-dev"}
{"instance":"c0bbff2b1b1e","version":"v4","metadata":"swarm-dev"}
{"instance":"7e7002e2f20c","version":"v4","metadata":"swarm-dev"}
{"instance":"c0bbff2b1b1e","version":"v4","metadata":"swarm-dev"}
{"instance":"7e7002e2f20c","version":"v4","metadata":"swarm-dev"}


A good place to start would be to check the entries in the load-balancer on the host (part of the `ingress` network) where you are sending your requests to via

sudo nsenter --net=/var/run/docker/netns/ingress_sbox ipvsadm -l -n
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
FWM  259 rr
  -> 10.255.0.5:0                 Masq    1      0          0
  -> 10.255.0.6:0                 Masq    1      0          0



and each IP should point to the load-balancer endpoint on each node connected to the custom overlay network.

xperjon commented 4 years ago

I had the same experience as @dejanstamenov on DigitalOcean with 3 Ubuntu 18.04.3 machines running Docker 19.03.4: one manager and two workers, running a service with 2 replicas. Tasks were deployed to node1 (manager) and node2 (worker). Node3 (worker) wasn't responding until it left the swarm and rejoined using the --advertise-addr flag.

arkodg commented 4 years ago

@xperjon that would make sense if Node3 is behind a NAT
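A quick local check for that (a sketch; is_private_ip is a hypothetical helper, and the patterns below are the standard RFC 1918 private ranges) would be:

```shell
# Sketch: classify the node's primary IPv4 address. A private (RFC 1918)
# address suggests the host sits behind NAT and should join the swarm with
# an explicit --advertise-addr pointing at its public IP.
is_private_ip() {
  case "$1" in
    10.*|172.1[6-9].*|172.2[0-9].*|172.3[0-1].*|192.168.*) echo private ;;
    *) echo public ;;
  esac
}

addr=$(hostname -I 2>/dev/null | awk '{print $1}')  # first address on the primary interface
if [ -n "$addr" ]; then
  echo "$addr is $(is_private_ip "$addr")"
fi
```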

dejanstamenov commented 4 years ago

Hi @arkodg,

Apologies for the delayed response on this one. I've just gone ahead and created a new service without using the --advertise-addr parameter and have received the errors below.

This is my service:

ID                  NAME                          MODE                REPLICAS            IMAGE                                    PORTS
w90yhr0ef3k3        vlab-stack_vlab-web-service   replicated          2/2                 dejanstamenov/vlab-docker:experimental   *:8080->8080/tcp

and when running curl just like in your example above, I got the output below.

$ for i in range{1..5}; do curl http://0.0.0.0:8080/ping;done
curl: (7) Failed to connect to 0.0.0.0 port 8080: Connection refused
curl: (7) Failed to connect to 0.0.0.0 port 8080: Connection refused
curl: (7) Failed to connect to 0.0.0.0 port 8080: Connection refused
curl: (7) Failed to connect to 0.0.0.0 port 8080: Connection refused
curl: (7) Failed to connect to 0.0.0.0 port 8080: Connection refused

The firewall has the port enabled.

$ ufw status
Status: active

To                         Action      From
--                         ------      ----
22/tcp                     ALLOW       Anywhere
4445/tcp                   ALLOW       Anywhere
80/tcp                     ALLOW       Anywhere
8080/tcp                   ALLOW       Anywhere
2377/tcp                   ALLOW       Anywhere
7946/tcp                   ALLOW       Anywhere
7946/udp                   ALLOW       Anywhere
4789/udp                   ALLOW       Anywhere
500/udp                    ALLOW       Anywhere
4500/udp                   ALLOW       Anywhere
22/tcp (v6)                ALLOW       Anywhere (v6)
4445/tcp (v6)              ALLOW       Anywhere (v6)
80/tcp (v6)                ALLOW       Anywhere (v6)
8080/tcp (v6)              ALLOW       Anywhere (v6)
2377/tcp (v6)              ALLOW       Anywhere (v6)
7946/tcp (v6)              ALLOW       Anywhere (v6)
7946/udp (v6)              ALLOW       Anywhere (v6)
4789/udp (v6)              ALLOW       Anywhere (v6)
500/udp (v6)               ALLOW       Anywhere (v6)
4500/udp (v6)              ALLOW       Anywhere (v6)

I have also tried running the nsenter command and got the output below.

$ nsenter --net=/var/run/docker/netns/ingress_sbox ipvsadm -l -n
nsenter: failed to execute ipvsadm: No such file or directory

Not really sure why I am getting this error from ipvsadm; I'm looking into the details.
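For what it's worth, that error usually means nsenter entered the namespace fine but could not exec ipvsadm, because the binary is not installed on the host (it is not part of a default Ubuntu install). A guarded wrapper (a sketch; ingress_lb_dump is a hypothetical name) would be:

```shell
# Sketch: dump the IPVS table of the ingress load balancer, failing with a
# hint if the ipvsadm binary is missing on the host (assumption: the missing
# binary, not a missing namespace, causes "No such file or directory").
ingress_lb_dump() {
  if ! command -v ipvsadm >/dev/null 2>&1; then
    echo "ipvsadm not installed; try: sudo apt-get install ipvsadm" >&2
    return 127
  fi
  sudo -n nsenter --net=/var/run/docker/netns/ingress_sbox ipvsadm -l -n
}
ingress_lb_dump || true
```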

Thanks!

arkodg commented 4 years ago

@dejanstamenov are you executing that command on the master node?

dejanstamenov commented 4 years ago

@arkodg - Yes, although I failed to mention that. All the snippets from yesterday are from the master node.

arkodg commented 4 years ago

@dejanstamenov from the first comment it looks like you might have been behind a NAT, so the primary interface IP was not reachable, which is why you needed to manually specify --advertise-addr. From the last comment it looks like your master node does not have the ingress sbox (dockerd logs and repro steps would help for this case).