TritonDataCenter / containerpilot

A service for autodiscovery and configuration of applications running in containers
Mozilla Public License 2.0

Not the same subnet for Consul and the other services #502

lucj closed this issue 7 years ago

lucj commented 7 years ago

Hi, I have a microservices application in which each service uses ContainerPilot 3.3.0. Below is an excerpt of the Compose file I'm using (only consul and api are listed for readability):


version: '3.3'
services:
  consul:
    image: autopilotpattern/consul
    command: /usr/local/bin/containerpilot
    dns:
      - 127.0.0.1
    environment:
      - CONSUL_DEV=1
    networks:
      - mynet
    ports:
      - "8500:8500"

  api:
    image: myorg/api:develop
    command: ["containerpilot"]
    networks:
      - mynet

When I run the application (not on Triton in this case), I see that the Consul and API services do not use the same subnet.

Check within the Consul container (based on autopilotpattern/consul:latest):


docker exec -ti 6d0e090708b1 sh
/ # ip a
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
...
15001: eth0@if15002:  mtu 1450 qdisc noqueue state UP 
    link/ether 02:42:0a:ff:00:0e brd ff:ff:ff:ff:ff:ff
    inet 10.255.0.14/16 scope global eth0
       valid_lft forever preferred_lft forever
    inet 10.255.0.13/32 scope global eth0
       valid_lft forever preferred_lft forever
15003: eth1@if15004:  mtu 1500 qdisc noqueue state UP 
    link/ether 02:42:ac:12:00:05 brd ff:ff:ff:ff:ff:ff
    inet 172.18.0.5/16 scope global eth1
       valid_lft forever preferred_lft forever
15005: eth2@if15006:  mtu 1450 qdisc noqueue state UP 
    link/ether 02:42:0a:00:01:10 brd ff:ff:ff:ff:ff:ff
    inet 10.0.1.16/24 scope global eth2
       valid_lft forever preferred_lft forever
    inet 10.0.1.15/32 scope global eth2
       valid_lft forever preferred_lft forever

I'm not sure where the 10.255.x.x address on eth0 comes from. eth1 holds the IP on the Docker0 bridge, and eth2 holds the IP on the subnet of an overlay network (mynet) that I created beforehand.

Looking at the env set by ContainerPilot:


/ # ps aux
PID   USER     TIME   COMMAND
    1 root       0:00 /usr/local/bin/containerpilot
   18 root       0:00 /bin/consul agent -dev -config-dir=/etc/consul
  436 root       0:00 sh
  611 root       0:00 ps aux
/ # cat /proc/18/environ | tr \\0 \\n -
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
HOSTNAME=6d0e090708b1
CONSUL_DEV=1
CONSUL_VERSION=0.7.3
CONTAINERPILOT_VER=3.0.0
CONTAINERPILOT=/etc/containerpilot.json5
SHELL=/bin/bash
HOME=/root
CONTAINERPILOT_PID=1
CONTAINERPILOT_CONSUL_IP=10.255.0.13

=> The IP is 10.255.0.13 (selected because it is the first private IP?). I would have preferred an IP on eth2 :)

Check within the API container (image using ContainerPilot):


~ $ docker exec -ti ff615bfa619f sh
/app # ip a
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
...
14981: eth0@if14982:  mtu 1450 qdisc noqueue state UP 
    link/ether 02:42:0a:00:01:04 brd ff:ff:ff:ff:ff:ff
    inet 10.0.1.4/24 scope global eth0
       valid_lft forever preferred_lft forever
    inet 10.0.1.3/32 scope global eth0
       valid_lft forever preferred_lft forever
14983: eth1@if14984:  mtu 1500 qdisc noqueue state UP 
    link/ether 02:42:ac:12:00:04 brd ff:ff:ff:ff:ff:ff
    inet 172.18.0.4/16 scope global eth1
       valid_lft forever preferred_lft forever

I do not get why I have eth0, eth1 and eth2 in the Consul container where there are only eth0 and eth1 for the API container.

Looking at the env set by ContainerPilot:


/app # ps aux 
PID   USER     TIME   COMMAND
    1 root       0:00 containerpilot
    8 root       0:00 containerpilot
   13 root       0:00 {manage.sh} /bin/sh /app/manage.sh prestart
  387 root       0:00 sh
  396 root       0:00 /usr/local/bin/consul agent -data-dir=/data -config-dir=/config -log-level=err -rejoin -retry-join consul -retry-max 10 -retry-interval 10s
  465 root       0:00 sleep 5
  466 root       0:00 ps aux
/app # cat /proc/13/environ | tr \\0 \\n -
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
HOSTNAME=ff615bfa619f
VERSION=v6.9.4
NPM_VERSION=3
LAST_UPDATED=20170515T152500
CONTAINERPILOT_VER=3.3.0
CONTAINERPILOT=/etc/containerpilot.json5
HOME=/root
CONTAINERPILOT_PID=8
CONTAINERPILOT_API_IP=10.0.1.3
CONTAINERPILOT_CONTAINERPILOT_IP=10.0.1.3

=> the IP is 10.0.1.3, so it cannot contact Consul.

Because the network interfaces differ between the consul and api services, the first private IP picked in each container is not in the same range. Any idea what I'm missing here?
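To make the mismatch concrete, here is a small Python sketch that computes which subnets the two containers have in common, using only the addresses from the two `ip a` outputs above. It is an illustration and not part of ContainerPilot; the names `CONSUL_ADDRS`, `API_ADDRS`, and `subnets` are hypothetical.

```python
import ipaddress

# Addresses copied from the `ip a` output of each container.
CONSUL_ADDRS = ["10.255.0.14/16", "10.255.0.13/32", "172.18.0.5/16",
                "10.0.1.16/24", "10.0.1.15/32"]
API_ADDRS = ["10.0.1.4/24", "10.0.1.3/32", "172.18.0.4/16"]


def subnets(addrs):
    """Return the networks these interface addresses belong to.

    /32 entries are skipped because a single-host prefix does not
    identify a subnet that another container could share.
    """
    nets = set()
    for addr in addrs:
        iface = ipaddress.ip_interface(addr)
        if iface.network.prefixlen < 32:
            nets.add(iface.network)
    return nets


shared = sorted(subnets(CONSUL_ADDRS) & subnets(API_ADDRS))
consul_only = sorted(subnets(CONSUL_ADDRS) - subnets(API_ADDRS))

print("shared:", [str(n) for n in shared])
print("consul only:", [str(n) for n in consul_only])
```

Only 10.0.1.0/24 and 172.18.0.0/16 are common to both containers; 10.255.0.0/16, which contains the address advertised for Consul, exists only in the Consul container.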

Should I add configuration to each service so that it first looks for an IP in 192.168.0.0/16 (for deployment on Triton) and then falls back to the 10.0.1.0/24 network (for deployment outside of Triton, for instance on an external overlay network)?
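The fallback order I have in mind could be sketched as follows: try each CIDR in order and take the first local address that falls inside it. This is a hypothetical Python illustration of the idea, not ContainerPilot's implementation; `pick_ip` and `PREFERRED_CIDRS` are names made up for the example.

```python
import ipaddress

# Preference order: the Triton range first, then the external overlay network.
PREFERRED_CIDRS = ["192.168.0.0/16", "10.0.1.0/24"]


def pick_ip(addrs, cidrs):
    """Return the first address that lies in the earliest matching CIDR,
    or None if no address matches any CIDR."""
    for cidr in cidrs:
        net = ipaddress.ip_network(cidr)
        for addr in addrs:
            if ipaddress.ip_address(addr) in net:
                return addr
    return None


# Addresses in the order they appear in each container's `ip a` output.
consul_ip = pick_ip(
    ["10.255.0.14", "10.255.0.13", "172.18.0.5", "10.0.1.16", "10.0.1.15"],
    PREFERRED_CIDRS,
)
api_ip = pick_ip(["10.0.1.4", "10.0.1.3", "172.18.0.4"], PREFERRED_CIDRS)

print(consul_ip, api_ip)
```

With this ordering, neither container has an address in 192.168.0.0/16, so both fall through to 10.0.1.0/24 and end up on the same overlay subnet.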

tgross commented 7 years ago

> I do not get why I have eth0, eth1 and eth2 in the Consul container where there are only eth0 and eth1 for the API container.

I'm not sure either, but that most likely is related to the networking configuration of Docker wherever you're deploying. Normally I'd recommend that you use a more specific interfaces configuration for each job, so that you're picking eth0 from one and eth2 from the other, for example, so they both bind to the IPs on the same subnet.

But if you're deploying both containers onto the same Docker host and getting different network interfaces, I suspect we're missing some information we need here. What does the network environment on the host look like? Does one of the elided Compose services have something different about its network?

lucj commented 7 years ago

@tgross, the difference in interfaces is due to the fact that the consul service publishes a port, so it is also attached to the Swarm ingress network. For the Consul server, I specify the interface in the -bind option so that it gets the correct one.


consul:
    image: consul:0.9.2
    command: agent -server -client=0.0.0.0 -bootstrap -ui -bind '{{ GetInterfaceIP "eth2"  }}'
    ...