brooklyncentral / clocker

Apache Brooklyn cloud native infrastructure blueprints
Apache License 2.0

Docker fails to create new bridge until unused old bridges are removed (calico) #295

Closed johnmccabe closed 8 years ago

johnmccabe commented 8 years ago

Looks like some sort of hard limit on the number of bridges that can be created?

root@brooklyn-o6lhzh-johnmcca-slams-m4ew-docker-ho-oedc-80e:~# docker network create --driver bridge -o com.docker.network.bridge.enable_ip_masquerade=true -o com.docker.network.bridge.host_binding_ipv4=0.0.0.0 PHG8z0ba_bridge
Error response from daemon: failed to parse pool request for address space "LocalDefault" pool "" subpool "": could not find an available predefined network

At the point this failed I had 17 calico, 31 bridge, 1 host and 1 null network.

root@brooklyn-o6lhzh-johnmcca-slams-m4ew-docker-ho-oedc-80e:~# docker network ls
NETWORK ID          NAME                DRIVER
0ec117270795        EISpVUGk            calico
10c51951b6dd        oOikC873            calico
137f4790b79e        yA5QvrdK            calico
4a06d229a776        egYsnXkJ            calico
4f41c8ea950c        aReBMbDX            calico
5ba8f0532017        pXL4H0c0            calico
691319b26df7        VcFGyRJr            calico
6da7d641ef29        TocyqRot            calico
8424cf8097f2        q6VfIoQi            calico
8ba1579ae714        GsJXbHRD            calico
9a576e8d30b6        YIm1X53h            calico
ac22283962a1        kn14R1mi            calico
aef6b00b857f        UWHiFKi3            calico
c8ce7d3cb68b        fSHh68ej            calico
e6376e5a1512        vWmdZ5SJ            calico
f466bd90e050        rWwu0had            calico
ffd7cc8f7c4a        w4qPRiwD            calico
8b51808d8576        fSHh68ej_bridge     bridge
14a49a808d56        yA5QvrdK_bridge     bridge
90155e3294d8        q6VfIoQi_bridge     bridge
ff6e66552671        host                host
4a01dfe4d9f0        rWwu0had_bridge     bridge
1109e7d652b7        fSHh68ej_bridge     bridge
ebcc78f11827        EISpVUGk_bridge     bridge
2a0e084d874b        UWHiFKi3_bridge     bridge
df5d67a80d1b        UWHiFKi3_bridge     bridge
98df83dce083        yA5QvrdK_bridge     bridge
bd2fd04f971c        w4qPRiwD_bridge     bridge
56723f32309f        kn14R1mi_bridge     bridge
bb501bcad788        TocyqRot_bridge     bridge
3dfa64e29864        aReBMbDX_bridge     bridge
dd7cb0920222        YIm1X53h_bridge     bridge
3480ce6a2ecb        q6VfIoQi_bridge     bridge
32c147fb6b2c        w4qPRiwD_bridge     bridge
df700cd00eb6        GsJXbHRD_bridge     bridge
0483c1b12b6d        vWmdZ5SJ_bridge     bridge
7ecf2ed7ebe8        kn14R1mi_bridge     bridge
890f66c47774        TocyqRot_bridge     bridge
6ddd9ac228c6        egYsnXkJ_bridge     bridge
76d01c73b4bc        pXL4H0c0_bridge     bridge
9a7eb3b52ff8        ElnZNUcs_bridge     bridge
d28b49aed06b        VcFGyRJr_bridge     bridge
0ba4ebddcfb9        rWwu0had_bridge     bridge
ddffc00e969d        VcFGyRJr_bridge     bridge
60d226826e1b        GsJXbHRD_bridge     bridge
8eb688b4a4b4        EISpVUGk_bridge     bridge
249994d60691        oOikC873_bridge     bridge
b227bb5ce079        YIm1X53h_bridge     bridge
3d1012717388        bridge              bridge
f78463a1b091        none                null

Deleting some unused networks and repeating the command was successful.
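
To see how many networks of each kind a host currently holds, the listing can be tallied per driver. This is a minimal sketch, assuming the three-column `docker network ls` layout shown above (NETWORK ID, NAME, DRIVER); `count_by_driver` is a hypothetical helper name, not part of Docker or Clocker.

```shell
# Tally networks per driver from `docker network ls` output read on stdin.
# Assumes the column layout shown in this issue: ID, NAME, DRIVER.
count_by_driver() {
  awk 'NR > 1 && NF >= 3 { count[$3]++ }
       END { for (d in count) print d, count[d] }' | sort
}

# Usage on a Docker host:
#   docker network ls | count_by_driver
```

The header row is skipped by line number, so the space inside "NETWORK ID" does not affect the count.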

This is on SoftLayer, using a Docker cloud with Calico (1.1.0-SNAPSHOT) built a few days ago.

name: slams
location: softlayer
services:
  - type: 'docker-cloud-calico:1.1.0-SNAPSHOT'
    brooklyn.config:
      docker.host.cluster.initial.size: 1
      docker.registry.start: false
      docker.version: 1.10.3

Is there an upper limit on the number of networks that can be provisioned, or have I missed some setup by deploying the docker cloud via the template wizard?

johnmccabe commented 8 years ago

Note: before reattempting the command I had actually deleted the calico networks as well (unintentionally), by running docker network rm $(docker network ls -q).
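
Because `docker network ls -q` prints every network ID, passing its whole output to `docker network rm` targets the calico networks too. A narrower selection might look like the sketch below. It assumes the generated bridge networks can be recognised by the `_bridge` name suffix seen in the listing; `generated_bridge_ids` is a hypothetical helper name, and `xargs -r` assumes GNU xargs.

```shell
# Print the IDs of bridge-driver networks whose name ends in "_bridge",
# reading `docker network ls` output on stdin (columns: ID, NAME, DRIVER).
generated_bridge_ids() {
  awk 'NR > 1 && $3 == "bridge" && $2 ~ /_bridge$/ { print $1 }'
}

# Usage on a Docker host:
#   docker network ls | generated_bridge_ids | xargs -r docker network rm
```

Docker still refuses to remove a network that has active endpoints, so networks in use by running containers are left in place.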

The error shown in the Clocker UI was as follows (no streams were accessible).

2 of 2 parallel child tasks failed; 2 errors including:
Error invoking start at SameServerEntityImpl{id=br7mxuON}:
Error invoking start at DockerContainerImpl{id=Hauc4JM0}:
SSH task ended with exit code 1 when 0 was required, in Task[ssh: 
( if test "$UID" -eq 0; then ( docker network create --driver bridge -o 
com.docker.network.bridge.enable_ip_masquerade=true -o 
com.docker.network.bridge.host_binding_ipv4=0.0.0.0 PHG8z0ba_bridge ); else sudo -E -n -S -- docker 
network create --driver bridge -o com.docker.network.bridge.enable_ip_masquerade=true -o 
com.docker.network.bridge.host_binding_ipv4=0.0.0.0 PHG8z0ba_bridge; fi )]@W23BxmGF: ( if test 
"$UID" -eq 0; then ( docker network create --driver bridge -o 
com.docker.network.bridge.enable_ip_masquerade=true -o 
com.docker.network.bridge.host_binding_ipv4=0.0.0.0 PHG8z0ba_bridge ); else sudo -E -n -S -- docker 
network create --driver bridge -o com.docker.network.bridge.enable_ip_masquerade=true -o 
com.docker.network.bridge.host_binding_ipv4=0.0.0.0 PHG8z0ba_bridge; fi )

grkvlt commented 8 years ago

Clocker 1.1.0 release ought to have fixed this, I believe? Will do some further testing.

grkvlt commented 8 years ago

The fact that it was 31 bridge networks (i.e. 2^n - 1 where n = 5) makes me suspicious. I wonder if this is an underlying OS limit or a Docker limit? It would be disappointing not to be able to have more than 32 separate application containers (and therefore bridge networks, for isolation) per host.
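
One possible explanation, offered as an assumption and not confirmed anywhere in this thread: the error text mentions "predefined" networks, and Docker's built-in IPAM is understood to allocate local bridge subnets from a fixed default list, recalled here as 172.17.0.0/16 through 172.31.0.0/16 plus 192.168.0.0/20 through 192.168.240.0/20. The sketch below only enumerates that assumed list to count it; `default_local_pools` is a hypothetical helper, and the ranges should be checked against the Docker version in use.

```shell
# Enumerate the ASSUMED default local address pools (not taken from this thread).
default_local_pools() {
  # 172.17.0.0/16 .. 172.31.0.0/16
  for i in $(seq 17 31); do echo "172.$i.0.0/16"; done
  # 192.168.0.0/20 .. 192.168.240.0/20, in steps of 16
  for i in $(seq 0 16 240); do echo "192.168.$i.0/20"; done
}

# Count the pools in the assumed list.
default_local_pools | wc -l
```

If the assumed list is right, the 31 bridge networks in the listing (including the default `bridge`) would account for every pool. Passing an explicit `--subnet` to `docker network create` is one way to avoid drawing from the predefined list.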

grkvlt commented 8 years ago

@johnmccabe Better network removal has been implemented in #296, which will mitigate this issue.

johnmccabe commented 8 years ago

Thanks @grkvlt, I'm seeing the network deletion working as expected now.