Closed jon-shanks closed 9 years ago
So to add, it looks like flannel runs AFTER the target as oppose before anyway, and should then notify the early-docker target once it has run but the logs show that this isn't happening. Though this doesn't explain really to me why docker wouldn't start but the behavior still seems wrong to me?
would it not make sense just to make flannel a binary on coreos built into the image? This seems like a lot of bits of string to make things run in a kind of ordered way, (that isn't working) but i can see could quite easily break for a lot of people due to the flexibility of cloud-init and the cloud-config, people make do things or restart docker or whatever else breaking flannel very easily and preventing things working.
OK you can ignore the docker.service part. It isn't enabled, it works when this is in the cloud-config though as a unit start . However it's good to note that the order matters in the unit start i.e. it has to be AFTER flanneld start
-- Logs begin at Thu 2015-07-09 21:31:54 UTC, end at Fri 2015-07-10 15:15:32 UTC. --
Jul 10 03:15:59 cimgtcorshnode01 systemd[1]: Reached target Run Docker containers before main Docker starts up.
Jul 10 03:15:59 cimgtcorshnode01 systemd[1]: Starting Run Docker containers before main Docker starts up.
Jul 10 03:16:00 cimgtcorshnode01 systemd[1]: Started Early Docker Application Container Engine.
Jul 10 03:16:00 cimgtcorshnode01 systemd[1]: Starting Early Docker Application Container Engine...
Jul 10 03:16:00 cimgtcorshnode01 systemd[1]: Starting Network fabric for containers...
Jul 10 03:16:00 cimgtcorshnode01 dockerd[706]: time="2015-07-10T03:16:00Z" level=info msg="+job serveapi(fd://)"
Jul 10 03:16:00 cimgtcorshnode01 dockerd[706]: time="2015-07-10T03:16:00Z" level=info msg="Listening for HTTP on fd ()"
Jul 10 03:16:04 cimgtcorshnode01 dockerd[706]: time="2015-07-10T03:16:04Z" level=info msg="Loading containers: start."
Jul 10 03:16:04 cimgtcorshnode01 dockerd[706]: time="2015-07-10T03:16:04Z" level=info msg="Loading containers: done."
Jul 10 03:16:04 cimgtcorshnode01 dockerd[706]: time="2015-07-10T03:16:04Z" level=info msg="docker daemon: 1.6.2 7c8fca2-dirty; execdriver: native-0.2; graphdriver: overlay"
Jul 10 03:16:04 cimgtcorshnode01 dockerd[706]: time="2015-07-10T03:16:04Z" level=info msg="+job acceptconnections()"
Jul 10 03:16:04 cimgtcorshnode01 dockerd[706]: time="2015-07-10T03:16:04Z" level=info msg="-job acceptconnections() = OK (0)"
Jul 10 03:16:04 cimgtcorshnode01 dockerd[706]: time="2015-07-10T03:16:04Z" level=info msg="Daemon has completed initialization"
Jul 10 03:16:04 cimgtcorshnode01 etcdctl[729]: { "Network": "10.102.0.0/16", "Backend" : { "Type" : "vxlan" } }
Jul 10 03:16:04 cimgtcorshnode01 dockerd[706]: time="2015-07-10T03:16:04Z" level=info msg="POST /v1.18/containers/create"
Jul 10 03:16:04 cimgtcorshnode01 dockerd[706]: time="2015-07-10T03:16:04Z" level=info msg="+job create()"
Jul 10 03:16:04 cimgtcorshnode01 dockerd[706]: time="2015-07-10T03:16:04Z" level=info msg="+job log(create, da0218e2199f91f7694ddfc08284163796b1066cd18c8d2aceb07f10eb0dc8ef, quay.io/coreos/flannel:0.4.0)"
Jul 10 03:16:04 cimgtcorshnode01 dockerd[706]: time="2015-07-10T03:16:04Z" level=info msg="-job log(create, da0218e2199f91f7694ddfc08284163796b1066cd18c8d2aceb07f10eb0dc8ef, quay.io/coreos/flannel:0.4.0) = OK (0)"
Jul 10 03:16:04 cimgtcorshnode01 dockerd[706]: time="2015-07-10T03:16:04Z" level=info msg="-job create() = OK (0)"
Jul 10 03:16:04 cimgtcorshnode01 dockerd[706]: time="2015-07-10T03:16:04Z" level=info msg="POST /v1.18/containers/da0218e2199f91f7694ddfc08284163796b1066cd18c8d2aceb07f10eb0dc8ef/attach?stderr=1&stdout=1&stream=1"
Jul 10 03:16:04 cimgtcorshnode01 dockerd[706]: time="2015-07-10T03:16:04Z" level=info msg="+job container_inspect(da0218e2199f91f7694ddfc08284163796b1066cd18c8d2aceb07f10eb0dc8ef)"
Jul 10 03:16:04 cimgtcorshnode01 dockerd[706]: time="2015-07-10T03:16:04Z" level=info msg="-job container_inspect(da0218e2199f91f7694ddfc08284163796b1066cd18c8d2aceb07f10eb0dc8ef) = OK (0)"
Jul 10 03:16:04 cimgtcorshnode01 dockerd[706]: time="2015-07-10T03:16:04Z" level=info msg="+job attach(da0218e2199f91f7694ddfc08284163796b1066cd18c8d2aceb07f10eb0dc8ef)"
Jul 10 03:16:04 cimgtcorshnode01 dockerd[706]: time="2015-07-10T03:16:04Z" level=info msg="POST /v1.18/containers/da0218e2199f91f7694ddfc08284163796b1066cd18c8d2aceb07f10eb0dc8ef/start"
Jul 10 03:16:04 cimgtcorshnode01 dockerd[706]: time="2015-07-10T03:16:04Z" level=info msg="+job start(da0218e2199f91f7694ddfc08284163796b1066cd18c8d2aceb07f10eb0dc8ef)"
Jul 10 03:16:04 cimgtcorshnode01 dockerd[706]: time="2015-07-10T03:16:04Z" level=info msg="+job log(start, da0218e2199f91f7694ddfc08284163796b1066cd18c8d2aceb07f10eb0dc8ef, quay.io/coreos/flannel:0.4.0)"
Jul 10 03:16:04 cimgtcorshnode01 dockerd[706]: time="2015-07-10T03:16:04Z" level=info msg="-job log(start, da0218e2199f91f7694ddfc08284163796b1066cd18c8d2aceb07f10eb0dc8ef, quay.io/coreos/flannel:0.4.0) = OK (0)"
Jul 10 03:16:05 cimgtcorshnode01 dockerd[706]: time="2015-07-10T03:16:05Z" level=info msg="-job start(da0218e2199f91f7694ddfc08284163796b1066cd18c8d2aceb07f10eb0dc8ef) = OK (0)"
Jul 10 03:16:06 cimgtcorshnode01 sdnotify-proxy[789]: I0710 03:16:06.268817 00001 main.go:278] Installing signal handlers
Jul 10 03:16:06 cimgtcorshnode01 sdnotify-proxy[789]: I0710 03:16:06.268881 00001 main.go:138] Determining IP address of default interface
Jul 10 03:16:06 cimgtcorshnode01 sdnotify-proxy[789]: I0710 03:16:06.269352 00001 main.go:228] Using 10.70.2.2 as external interface
Jul 10 03:16:07 cimgtcorshnode01 sdnotify-proxy[789]: I0710 03:16:07.037061 00001 subnet.go:97] Subnet lease acquired: 10.102.23.0/24
Jul 10 03:16:07 cimgtcorshnode01 sdnotify-proxy[789]: I0710 03:16:07.332723 00001 ipmasq.go:47] Adding iptables rule: FLANNEL -d 10.102.0.0/16 -j ACCEPT
Jul 10 03:16:07 cimgtcorshnode01 sdnotify-proxy[789]: I0710 03:16:07.337708 00001 ipmasq.go:47] Adding iptables rule: FLANNEL ! -d 224.0.0.0/4 -j MASQUERADE
Jul 10 03:16:07 cimgtcorshnode01 sdnotify-proxy[789]: I0710 03:16:07.343025 00001 ipmasq.go:47] Adding iptables rule: POSTROUTING -s 10.102.0.0/16 -j FLANNEL
Jul 10 03:16:07 cimgtcorshnode01 sdnotify-proxy[789]: I0710 03:16:07.351752 00001 main.go:245] VXLAN mode initialized
Jul 10 03:16:07 cimgtcorshnode01 sdnotify-proxy[789]: I0710 03:16:07.351798 00001 vxlan.go:141] Watching for L2/L3 misses
Jul 10 03:16:07 cimgtcorshnode01 sdnotify-proxy[789]: I0710 03:16:07.351821 00001 vxlan.go:147] Watching for new subnet leases
Jul 10 03:16:07 cimgtcorshnode01 sdnotify-proxy[789]: I0710 03:16:07.371354 00001 vxlan.go:210] Subnet added: 10.102.42.0/24
Jul 10 03:16:07 cimgtcorshnode01 dockerd[706]: time="2015-07-10T03:16:07Z" level=info msg="POST /v1.18/containers/create"
Jul 10 03:16:07 cimgtcorshnode01 dockerd[706]: time="2015-07-10T03:16:07Z" level=info msg="+job create()"
Jul 10 03:16:07 cimgtcorshnode01 dockerd[706]: time="2015-07-10T03:16:07Z" level=info msg="+job log(create, 254d6d3ce23a8f143c89a818d8028b9bd2af79dd48879c222c9505e84b211152, quay.io/coreos/flannel:0.4.0)"
Jul 10 03:16:07 cimgtcorshnode01 dockerd[706]: time="2015-07-10T03:16:07Z" level=info msg="-job log(create, 254d6d3ce23a8f143c89a818d8028b9bd2af79dd48879c222c9505e84b211152, quay.io/coreos/flannel:0.4.0) = OK (0)"
Jul 10 03:16:07 cimgtcorshnode01 dockerd[706]: time="2015-07-10T03:16:07Z" level=info msg="-job create() = OK (0)"
Jul 10 03:16:07 cimgtcorshnode01 dockerd[706]: time="2015-07-10T03:16:07Z" level=info msg="POST /v1.18/containers/254d6d3ce23a8f143c89a818d8028b9bd2af79dd48879c222c9505e84b211152/attach?stderr=1&stdout=1&stream=1"
Jul 10 03:16:07 cimgtcorshnode01 dockerd[706]: time="2015-07-10T03:16:07Z" level=info msg="+job container_inspect(254d6d3ce23a8f143c89a818d8028b9bd2af79dd48879c222c9505e84b211152)"
Jul 10 03:16:07 cimgtcorshnode01 dockerd[706]: time="2015-07-10T03:16:07Z" level=info msg="-job container_inspect(254d6d3ce23a8f143c89a818d8028b9bd2af79dd48879c222c9505e84b211152) = OK (0)"
Jul 10 03:16:07 cimgtcorshnode01 dockerd[706]: time="2015-07-10T03:16:07Z" level=info msg="+job attach(254d6d3ce23a8f143c89a818d8028b9bd2af79dd48879c222c9505e84b211152)"
Jul 10 03:16:07 cimgtcorshnode01 dockerd[706]: time="2015-07-10T03:16:07Z" level=info msg="POST /v1.18/containers/254d6d3ce23a8f143c89a818d8028b9bd2af79dd48879c222c9505e84b211152/start"
Jul 10 03:16:07 cimgtcorshnode01 dockerd[706]: time="2015-07-10T03:16:07Z" level=info msg="+job start(254d6d3ce23a8f143c89a818d8028b9bd2af79dd48879c222c9505e84b211152)"
Jul 10 03:16:07 cimgtcorshnode01 dockerd[706]: time="2015-07-10T03:16:07Z" level=info msg="+job log(start, 254d6d3ce23a8f143c89a818d8028b9bd2af79dd48879c222c9505e84b211152, quay.io/coreos/flannel:0.4.0)"
Jul 10 03:16:07 cimgtcorshnode01 dockerd[706]: time="2015-07-10T03:16:07Z" level=info msg="-job log(start, 254d6d3ce23a8f143c89a818d8028b9bd2af79dd48879c222c9505e84b211152, quay.io/coreos/flannel:0.4.0) = OK (0)"
Jul 10 03:16:07 cimgtcorshnode01 dockerd[706]: time="2015-07-10T03:16:07Z" level=info msg="-job start(254d6d3ce23a8f143c89a818d8028b9bd2af79dd48879c222c9505e84b211152) = OK (0)"
Jul 10 03:16:07 cimgtcorshnode01 dockerd[706]: time="2015-07-10T03:16:07Z" level=info msg="+job log(die, 254d6d3ce23a8f143c89a818d8028b9bd2af79dd48879c222c9505e84b211152, quay.io/coreos/flannel:0.4.0)"
Jul 10 03:16:07 cimgtcorshnode01 dockerd[706]: time="2015-07-10T03:16:07Z" level=info msg="-job log(die, 254d6d3ce23a8f143c89a818d8028b9bd2af79dd48879c222c9505e84b211152, quay.io/coreos/flannel:0.4.0) = OK (0)"
Jul 10 03:16:07 cimgtcorshnode01 dockerd[706]: time="2015-07-10T03:16:07Z" level=info msg="-job attach(254d6d3ce23a8f143c89a818d8028b9bd2af79dd48879c222c9505e84b211152) = OK (0)"
Jul 10 03:16:07 cimgtcorshnode01 dockerd[706]: time="2015-07-10T03:16:07Z" level=info msg="POST /v1.18/containers/254d6d3ce23a8f143c89a818d8028b9bd2af79dd48879c222c9505e84b211152/wait"
Jul 10 03:16:07 cimgtcorshnode01 dockerd[706]: time="2015-07-10T03:16:07Z" level=info msg="+job wait(254d6d3ce23a8f143c89a818d8028b9bd2af79dd48879c222c9505e84b211152)"
Jul 10 03:16:07 cimgtcorshnode01 dockerd[706]: time="2015-07-10T03:16:07Z" level=info msg="-job wait(254d6d3ce23a8f143c89a818d8028b9bd2af79dd48879c222c9505e84b211152) = OK (0)"
Jul 10 03:16:07 cimgtcorshnode01 dockerd[706]: time="2015-07-10T03:16:07Z" level=info msg="GET /v1.18/containers/254d6d3ce23a8f143c89a818d8028b9bd2af79dd48879c222c9505e84b211152/json"
Jul 10 03:16:07 cimgtcorshnode01 dockerd[706]: time="2015-07-10T03:16:07Z" level=info msg="+job container_inspect(254d6d3ce23a8f143c89a818d8028b9bd2af79dd48879c222c9505e84b211152)"
Jul 10 03:16:07 cimgtcorshnode01 dockerd[706]: time="2015-07-10T03:16:07Z" level=info msg="-job container_inspect(254d6d3ce23a8f143c89a818d8028b9bd2af79dd48879c222c9505e84b211152) = OK (0)"
Jul 10 03:16:07 cimgtcorshnode01 dockerd[706]: time="2015-07-10T03:16:07Z" level=info msg="DELETE /v1.18/containers/254d6d3ce23a8f143c89a818d8028b9bd2af79dd48879c222c9505e84b211152?v=1"
Jul 10 03:16:07 cimgtcorshnode01 dockerd[706]: time="2015-07-10T03:16:07Z" level=info msg="+job rm(254d6d3ce23a8f143c89a818d8028b9bd2af79dd48879c222c9505e84b211152)"
Jul 10 03:16:07 cimgtcorshnode01 dockerd[706]: time="2015-07-10T03:16:07Z" level=info msg="+job log(destroy, 254d6d3ce23a8f143c89a818d8028b9bd2af79dd48879c222c9505e84b211152, quay.io/coreos/flannel:0.4.0)"
Jul 10 03:16:07 cimgtcorshnode01 dockerd[706]: time="2015-07-10T03:16:07Z" level=info msg="-job log(destroy, 254d6d3ce23a8f143c89a818d8028b9bd2af79dd48879c222c9505e84b211152, quay.io/coreos/flannel:0.4.0) = OK (0)"
Jul 10 03:16:07 cimgtcorshnode01 dockerd[706]: time="2015-07-10T03:16:07Z" level=info msg="-job rm(254d6d3ce23a8f143c89a818d8028b9bd2af79dd48879c222c9505e84b211152) = OK (0)"
Jul 10 03:16:07 cimgtcorshnode01 systemd[1]: Started Network fabric for containers.
Jul 10 04:55:36 cimgtcorshnode01 sdnotify-proxy[789]: I0710 04:55:36.341713 00001 vxlan.go:210] Subnet added: 10.102.74.0/24
the behavior is still odd however with early-docker.target and flanneld.service, more logs requested, so please see above.
So closing this too:
Appears a) docker service needs a unit as it's not enabled by default in CoreOS b) systemd notify is quirky in the sense that it outputs the description of the target in journalctl saying it's "starting" but yet it isn't "started" .. making things a bit confusing and misleading when you're troubleshooting.
There is an issue with docker not coming up despite reboots etc. There seems to be some constraint with all the early-docker implementation, maybe flannel isnt' notifying as it should, so that docker starts, however, on a journactl -u docker.service it's empty and no docker is running.
I'd have presumed we wouldn't need to drop a docker start unit in? As the docker.service is already in there natively to CoreOS and requires to run after the early-docker.target . It seems to notify however from the logs so it can' tbe the notify element as the target does get instantiated, just for some reason docker.service doesn't seem to run despite that being the case?
Logs above:
[Unit] Description=Network fabric for containers Documentation=https://github.com/coreos/flannel Requires=early-docker.service After=etcd.service etcd2.service early-docker.service Before=early-docker.target
[Service] Type=notify Restart=always RestartSec=5 Environment="TMPDIR=/var/tmp/" Environment="DOCKER_HOST=unix:///var/run/early-docker.sock" Environment="FLANNEL_VER=0.4.0" Environment="ETCD_SSL_DIR=/etc/ssl/etcd" LimitNOFILE=40000 LimitNPROC=1048576 ExecStartPre=/sbin/modprobe ip_tables ExecStartPre=/usr/bin/mkdir -p /run/flannel ExecStartPre=/usr/bin/mkdir -p ${ETCD_SSL_DIR} ExecStartPre=/usr/bin/touch /run/flannel/options.env
ExecStart=/usr/libexec/sdnotify-proxy /run/flannel/sd.sock \ /usr/bin/docker run --net=host --privileged=true --rm \ --volume=/run/flannel:/run/flannel \ --env=NOTIFY_SOCKET=/run/flannel/sd.sock \ --env=AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID} \ --env=AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY} \ --env-file=/run/flannel/options.env \ --volume=/usr/share/ca-certificates:/etc/ssl/certs:ro \ --volume=${ETCD_SSL_DIR}:/etc/ssl/etcd:ro \ quay.io/coreos/flannel:${FLANNEL_VER} /opt/bin/flanneld --ip-masq=true
Update docker options
ExecStartPost=/usr/bin/docker run --net=host --rm -v /run:/run \ quay.io/coreos/flannel:${FLANNEL_VER} \ /opt/bin/mk-docker-opts.sh -d /run/flannel_docker_opts.env -i
cat early-docker.* [Unit] Description=Early Docker Application Container Engine Documentation=http://docs.docker.com After=early-docker.socket Requires=early-docker.socket
[Service] Environment=TMPDIR=/var/tmp MountFlags=slave LimitNOFILE=1048576 LimitNPROC=1048576 ExecStart=/usr/lib/coreos/dockerd --daemon --host=fd:// --bridge=none --iptables=false --ip-masq=false --graph=/var/lib/early-docker --pidfile=/var/run/early-docker.pid
[Install] WantedBy=early-docker.target [Unit] Description=Early Docker Socket for the API
[Socket] ListenStream=/var/run/early-docker.sock
[Install] WantedBy=sockets.target [Unit] Description=Run Docker containers before main Docker starts up
cat docker.service [Unit] Description=Docker Application Container Engine Documentation=http://docs.docker.com After=docker.socket early-docker.target network.target Requires=docker.socket early-docker.target
[Service] Environment=TMPDIR=/var/tmp EnvironmentFile=-/run/flannel_docker_opts.env MountFlags=slave LimitNOFILE=1048576 LimitNPROC=1048576 ExecStart=/usr/lib/coreos/dockerd --daemon --host=fd:// $DOCKER_OPTS $DOCKER_OPT_BIP $DOCKER_OPT_MTU $DOCKER_OPT_IPMASQ
[Install] WantedBy=multi-user.target