Open mRoca opened 6 years ago
Hi @mRoca, thanks for the detailed report and analysis. I think an alternative to "burning" the address may be to configure the docker_gwbridge to a certain subnet, as mentioned in the docs: https://docs.docker.com/engine/swarm/networking/#customize-the-docker_gwbridge
Will that help with your configuration?
We can bubble up the bip and docker_gwbridge subnets as parameters in the template.
If I create the docker_gwbridge before the swarm init by setting my PRE_INIT_SCRIPT parameter to
docker network create --subnet 172.19.0.0/16 --opt com.docker.network.bridge.name=docker_gwbridge --opt com.docker.network.bridge.enable_icc=false --opt com.docker.network.bridge.enable_ip_masquerade=true docker_gwbridge
it will do the job until we create another network.
At the first docker network create foo, the docker daemon will take the first available /16 IP range as the subnet. Here, the foo network will have the 172.17.0.0/16 subnet. When a network subnet is declared, all packets to the corresponding IP range are routed internally. So it's the same situation (unless we specify ALL new network subnets) :(
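To make the allocation behaviour described in this comment concrete, here is a minimal shell sketch. It is a simplified model and not Docker's actual allocator: it assumes the daemon walks the 172.17.0.0/16 to 172.31.0.0/16 pools in order and skips any /16 that is already taken. The function name next_free_subnet is hypothetical.

```shell
# Simplified model (an assumption, not Docker's real allocator): return the
# first 172.N.0.0/16 pool, N = 17..31, that is not in the list of used subnets.
next_free_subnet() {
  local used n
  used=" $* "                 # space-delimited list of subnets already in use
  n=17
  while [ "$n" -le 31 ]; do
    case "$used" in
      *" 172.$n.0.0/16 "*) ;;                 # taken, try the next pool
      *) echo "172.$n.0.0/16"; return 0 ;;    # first free pool wins
    esac
    n=$((n + 1))
  done
  return 1                    # every pool in this simplified model is taken
}

# Only docker_gwbridge is pinned (172.19.0.0/16); 172.17.0.0/16 is assumed free.
next_free_subnet 172.19.0.0/16
```

With only 172.19.0.0/16 marked as used, the sketch picks 172.17.0.0/16 for the next network, which is the conflict reported for the foo network.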
For the subsequent docker network create commands, you can specify the subnet and various other options (if the defaults do not work for your environment/scenario), as documented here: https://docs.docker.com/engine/reference/commandline/network_create/#specify-advanced-options.
Does that help with avoiding the conflict?
That means the application/stack (described by the docker-compose.yaml configuration, for instance) has to know the infrastructure (to avoid conflicting IP ranges), which IMHO should not be its responsibility. Moreover, 2 versions of the same stack can't be deployed in the same swarm without keeping a registry of which IP ranges are burned.
The solution may help to avoid the conflict, but it sounds like a big hack and will be really hard to maintain. The same goes for IPs in containers: most of the time we shouldn't have to define a fixed IP, we just let the infrastructure/docker pick one for us.
Expected behavior
Be able to communicate between an AWS Docker Swarm stack and a 172.17.0.0/16 VPC.
Actual behavior
Situation:
One VPC (vpc_a) has a 172.17.0.0/16 CIDR
Another VPC (vpc_d), the Docker Swarm one, has a 10.3.0.0/16 CIDR
The vpc_a and the vpc_d VPCs are connected, and the vpc_d instances can access the vpc_a ones
The swarm runs in vpc_d with the default CloudFormation configuration for existing vpc
The problem: it's impossible to access a
vpc_a (172.17.xxx.xxx) IP from a swarm node.
By default, a new docker network has a 172.xxx.0.1/16 subnet, where xxx is the first available range after 17. When a swarm node (or a manager) is created, the docker engine creates a default bridge network. By default, this network's subnet is 172.17.0.1/16. During the swarm install, a docker_gwbridge bridge network is created by the docker4x/init-aws container. By default, this network's subnet is 172.18.0.1/16. When we create a new docker network, its subnet is then 172.19.0.1/16.
When a docker container tries to access a 172.17.xxx.xxx IP, as the host has the 172.17.0.0/16 dev 17 src 172.17.0.1 IP route, the packets will never leave the docker network.
Some solutions
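Before going through the workarounds, the routing conflict itself can be sketched in a few lines of shell. This only illustrates the CIDR match the kernel performs and is not taken from the Docker or AWS tooling. The helper names ip_to_int and ip_in_cidr, and the address 172.17.5.10 (standing for some vpc_a host), are hypothetical.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=.
  set -- $1                   # split the address on dots into $1..$4
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeed when address $1 falls inside CIDR block $2 (e.g. 172.17.0.0/16).
ip_in_cidr() {
  local ip net bits mask
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")  # network part, before the slash
  bits=${2#*/}                # prefix length, after the slash
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( ip & mask )) -eq $(( net & mask )) ]
}

# 172.17.5.10 stands for a hypothetical vpc_a host.
if ip_in_cidr 172.17.5.10 172.17.0.0/16; then
  echo "172.17.5.10 matches the local bridge route: the packet stays on the host"
fi
```

On a real swarm node, running ip route get with a vpc_a address should show the same thing from the kernel's point of view: the route resolves to the local bridge device instead of the VPC gateway.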
The first and ugly solution: add a new proxy instance
In order to avoid updating the CloudFormation template or the Docker Swarm AMI, it's possible to create a new "proxy" instance in the vpc_d VPC and to use its 10.3.xxx.xxx IP address instead of the true vpc_a one. The proxy can be created with a simple iptables rule, for example:
iptables -t nat -A PREROUTING -p tcp -i eth0 --dport ${port} -j DNAT --to ${vpc_a_ip}:${port}
This solution is very heavy and not easy to manage.
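For reference, a slightly fuller sketch of what such a proxy instance might need. The DNAT rule is the one quoted above; enabling IP forwarding and adding a MASQUERADE rule so that replies come back through the proxy are my assumptions about a typical setup, not something stated in this report. The function name proxy_rules, the target 172.17.1.10 and the port 5432 are hypothetical. The function only prints the commands, so they can be reviewed before being run as root.

```shell
# Print (rather than run) the NAT setup for one forwarded TCP port.
# Usage: proxy_rules <vpc_a_ip> <port>
proxy_rules() {
  local vpc_a_ip port
  vpc_a_ip=$1
  port=$2
  # Assumption: the proxy must forward packets between its interfaces.
  echo "sysctl -w net.ipv4.ip_forward=1"
  # Rewrite the destination of incoming connections to the real vpc_a host.
  echo "iptables -t nat -A PREROUTING -p tcp -i eth0 --dport ${port} -j DNAT --to-destination ${vpc_a_ip}:${port}"
  # Assumption: source-NAT the forwarded traffic so replies return via the proxy.
  echo "iptables -t nat -A POSTROUTING -p tcp -d ${vpc_a_ip} --dport ${port} -j MASQUERADE"
}

proxy_rules 172.17.1.10 5432
```

One rule set is needed per target and port, which is part of why this approach gets heavy quickly.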
A better solution: update the CloudFormation template
It's easy to configure the bridge docker network by adding the bip value in the /etc/docker/daemon.json file. This allows changing its default 172.17.0.1/16 subnet value. But the problem remains the same, as the first created network (the docker_gwbridge one, in our case) will take this available subnet.
The solution we have found is to "burn" one IP address per reserved range in the IP route table in order to prevent a docker network subnet from being created on it. For example, by running the command
ip route add 172.18.255.254 dev lo;
before the swarm init, the docker_gwbridge network will have the 172.19.0.1/16 subnet, as the 172.18.0.1/16 one is considered already used.
This is a working version of the UserData CloudFormation template's script value:
With the following new CloudFormation template parameters:
Here, we must add the IP route AND specify the bip value because at this time the docker network has already been created.
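As a rough illustration of those two steps (this is not the UserData script referred to above, only a sketch under assumptions), the following prints one burn route per reserved range and renders a daemon.json body carrying the bip value. The function names burn_routes and daemon_json and the bip value 172.20.0.1/16 are hypothetical.

```shell
# Print the "burn" route for each reserved 172.N.0.0/16 range, given N.
burn_routes() {
  local n
  for n in "$@"; do
    echo "ip route add 172.${n}.255.254 dev lo"
  done
}

# Render a daemon.json body that sets the default bridge address (bip).
daemon_json() {
  printf '{\n  "bip": "%s"\n}\n' "$1"
}

burn_routes 17                # reserve 172.17.0.0/16, as in this report
daemon_json 172.20.0.1/16     # hypothetical bip outside the reserved range
```

On a node, the printed route command would be run as root before the swarm init, and the JSON written to /etc/docker/daemon.json before restarting the docker service.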
With this solution, as the 172.17.0.0/16 range is burned by the 172.17.255.254 dev lo; route, this range is no longer available for docker networks. The bip value allows changing the default bridge subnet value after the docker service restart.
The best solution: update the template and the docker4x images
It would be really useful to be able to choose a global docker networks addressing range (such as 10.128.0.0/9, for example), or to reserve some subnets in the CloudFormation template.
Do you have another way to fix the problem?