Using clocker 1.2.0-SNAPSHOT (at commit 7c9346c, while testing a couple of unrelated fixes for issues #288 and #290)...
I successfully deployed a two-host clocker+calico cluster in BlueBox. I then deployed many entities that created containers (using Brooklyn's MachineEntity) to cause the cluster to auto-scale.
It created a third host, but that host hung on startup, waiting forever for post-start to finish. It is blocked at: SdnAgent agent = Entities.attributeSupplierWhenReady(this, SdnAgent.SDN_AGENT).get();
Looking at the CalicoNode for that host, its service.state is "ON_FIRE" and its service.isUp is "false". Looking in the debug log (grep -E "OKsRTXuY|10.101.1.162"), I see the following error:
2016-05-04 22:03:08,221 DEBUG brooklyn.SSH [Thread-24165]: [OKsRTXuY@10.101.1.162:stdout] Pulling Docker image calico/node:v0.19.0
2016-05-04 22:03:08,221 DEBUG brooklyn.SSH [Thread-24165]: [OKsRTXuY@10.101.1.162:stdout] Running Docker container with the following command:
2016-05-04 22:03:08,221 DEBUG brooklyn.SSH [Thread-24165]: [OKsRTXuY@10.101.1.162:stdout] docker run -d --restart=always --net=host --privileged --name=calico-node -e HOSTNAME=brooklyn-o6o7oy-aled-clocker-bl-fgdo-docker-host-hhfw-bb3 -e IP=10.101.1.162 -e IP6= -e CALICO_NETWORKING=true -e AS= -e NO_DEFAULT_POOLS= -e ETCD_AUTHORITY=10.101.1.162:2379 -e ETCD_SCHEME=http -v /var/log/calico:/var/log/calico -v /lib/modules:/lib/modules -v /var/run/calico:/var/run/calico calico/node:v0.19.0
2016-05-04 22:03:08,221 DEBUG brooklyn.SSH [Thread-24165]: [OKsRTXuY@10.101.1.162:stdout] Calico node is running with id: 06dc7cbec5c7241fbdf0dec2cecce312908f7ce90224e90844b5a494765b6b1c
2016-05-04 22:03:08,221 DEBUG brooklyn.SSH [Thread-24165]: [OKsRTXuY@10.101.1.162:stdout] Waiting for successful startup
2016-05-04 22:03:08,221 DEBUG brooklyn.SSH [Thread-24165]: [OKsRTXuY@10.101.1.162:stdout] Traceback (most recent call last):
2016-05-04 22:03:08,221 DEBUG brooklyn.SSH [Thread-24165]: [OKsRTXuY@10.101.1.162:stdout] File "startup.py", line 295, in <module>
2016-05-04 22:03:08,221 DEBUG brooklyn.SSH [Thread-24165]: [OKsRTXuY@10.101.1.162:stdout] main()
2016-05-04 22:03:08,221 DEBUG brooklyn.SSH [Thread-24165]: [OKsRTXuY@10.101.1.162:stdout] File "startup.py", line 285, in main
2016-05-04 22:03:08,221 DEBUG brooklyn.SSH [Thread-24165]: [OKsRTXuY@10.101.1.162:stdout] _ensure_host_tunnel_addr(ipv4_pools, ipip_pools)
2016-05-04 22:03:08,221 DEBUG brooklyn.SSH [Thread-24165]: [OKsRTXuY@10.101.1.162:stdout] File "startup.py", line 55, in _ensure_host_tunnel_addr
2016-05-04 22:03:08,221 DEBUG brooklyn.SSH [Thread-24165]: [OKsRTXuY@10.101.1.162:stdout] _assign_host_tunnel_addr(ipip_pools)
2016-05-04 22:03:08,221 DEBUG brooklyn.SSH [Thread-24165]: [OKsRTXuY@10.101.1.162:stdout] File "startup.py", line 74, in _assign_host_tunnel_addr
2016-05-04 22:03:08,221 DEBUG brooklyn.SSH [Thread-24165]: [OKsRTXuY@10.101.1.162:stdout] host=hostname
2016-05-04 22:03:08,221 DEBUG brooklyn.SSH [Thread-24165]: [OKsRTXuY@10.101.1.162:stdout] File "/usr/lib/python2.7/site-packages/pycalico/datastore.py", line 128, in wrapped
2016-05-04 22:03:08,221 DEBUG brooklyn.SSH [Thread-24165]: [OKsRTXuY@10.101.1.162:stdout] return fn(*args, **kwargs)
2016-05-04 22:03:08,222 DEBUG brooklyn.SSH [Thread-24165]: [OKsRTXuY@10.101.1.162:stdout] File "/usr/lib/python2.7/site-packages/pycalico/ipam.py", line 618, in auto_assign_ips
2016-05-04 22:03:08,222 DEBUG brooklyn.SSH [Thread-24165]: [OKsRTXuY@10.101.1.162:stdout] pool[0], host)
2016-05-04 22:03:08,222 DEBUG brooklyn.SSH [brooklyn-execmanager-FnS0lXyr-1063]: launching CalicoNodeImpl{id=OKsRTXuY}, on machine SshMachineLocation[10.101.1.162:aled@10.101.1.162/10.101.1.162:22(id=hKrZGyax)], completed: return status 0
2016-05-04 22:03:08,222 DEBUG brooklyn.SSH [Thread-24165]: [OKsRTXuY@10.101.1.162:stdout] File "/usr/lib/python2.7/site-packages/pycalico/ipam.py", line 723, in _auto_assign
2016-05-04 22:03:08,222 DEBUG brooklyn.SSH [Thread-24165]: [OKsRTXuY@10.101.1.162:stdout] ipam_config)
2016-05-04 22:03:08,222 DEBUG brooklyn.SSH [Thread-24165]: [OKsRTXuY@10.101.1.162:stdout] File "/usr/lib/python2.7/site-packages/pycalico/ipam.py", line 189, in _new_affine_block
2016-05-04 22:03:08,222 DEBUG brooklyn.SSH [Thread-24165]: [OKsRTXuY@10.101.1.162:stdout] "wrong attributes" % pool)
2016-05-04 22:03:08,222 DEBUG brooklyn.SSH [Thread-24165]: [OKsRTXuY@10.101.1.162:stdout] pycalico.datastore_errors.PoolNotFound: Requested pool 50.0.3.0/24 is not configured or haswrong attributes
2016-05-04 22:03:08,222 DEBUG brooklyn.SSH [Thread-24165]: [OKsRTXuY@10.101.1.162:stdout] Calico node failed to start
2016-05-04 22:03:08,222 DEBUG brooklyn.SSH [Thread-24165]: [OKsRTXuY@10.101.1.162:stdout] Pulling Docker image calico/node-libnetwork:v0.8.0
2016-05-04 22:03:08,222 DEBUG brooklyn.SSH [Thread-24165]: [OKsRTXuY@10.101.1.162:stdout] Calico libnetwork driver is running with id: dc8372dbd5e8e821dfc102f1d6e89c1384592870cd0766316d365bbae496ae1d
2016-05-04 22:03:08,222 DEBUG brooklyn.SSH [Thread-24165]: [OKsRTXuY@10.101.1.162:stdout] Executed /tmp/brooklyn-20160504-220243832-D1kk-launching_CalicoNodeImpl_id_OK.sh, result 0
The check-running task for CalicoNodeImpl{id=OKsRTXuY} then fails repeatedly:
2016-05-04 22:05:14,762 DEBUG brooklyn.SSH [brooklyn-execmanager-FnS0lXyr-1348]: check-running CalicoNodeImpl{id=OKsRTXuY}, on machine SshMachineLocation[10.101.1.162:aled@10.101.1.162/10.101.1.162:22(id=hKrZGyax)], completed: return status 1
2016-05-04 22:05:14,762 DEBUG brooklyn.SSH [Thread-33364]: [OKsRTXuY@10.101.1.162:stdout] calico-node container not running
2016-05-04 22:05:14,762 DEBUG brooklyn.SSH [Thread-33364]: [OKsRTXuY@10.101.1.162:stdout] Executed /tmp/brooklyn-20160504-220514011-ZJFA-check-running_CalicoNodeImpl_i.sh, result 1
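For anyone trying to reproduce the log triage above, here is a minimal Java sketch of the same filtering that the grep does, plus a rough heuristic for picking out the Python exception line. The class name EntityLogFilter, its method names, and the regular expressions are all assumptions made for illustration; none of this is part of Brooklyn or Clocker, and the exception heuristic is only tuned to lines shaped like the PoolNotFound one in this log.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Brooklyn or Clocker.
public class EntityLogFilter {
    // Matches the "[<entityId>@<host>:stdout] <message>" part of a brooklyn.SSH debug line.
    private static final Pattern SSH_LINE =
            Pattern.compile("\\[(\\w+)@([\\d.]+):(stdout|stderr)\\]\\s?(.*)$");

    /** Returns the stdout/stderr messages logged for the given entity id, in log order. */
    public static List<String> messagesFor(String entityId, List<String> logLines) {
        List<String> out = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = SSH_LINE.matcher(line);
            if (m.find() && m.group(1).equals(entityId)) {
                out.add(m.group(4).trim());
            }
        }
        return out;
    }

    /**
     * Returns the first message that looks like a Python exception line
     * ("package.SomeError: text"), or null if there is none.
     * The suffix list is a guess based on the traceback in this report.
     */
    public static String firstError(List<String> messages) {
        for (String msg : messages) {
            if (msg.matches("^[\\w.]+(Error|Exception|NotFound): .*")) {
                return msg;
            }
        }
        return null;
    }
}
```

Feeding it the lines of the debug log with the entity id OKsRTXuY should surface the pycalico.datastore_errors.PoolNotFound line without having to scroll through the whole traceback.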