Open owenmorgan opened 7 years ago
It might be worth noting that when I SSH into one of the masters, I receive this message:
Failed Units: 3
docker.service
setup-network-environment.service
docker.socket
Hey @owenmorgan, what do the systemd docker logs say?
I see the same thing as well. Workers look like:
Failed Units: 2
docker.service
setup-network-environment.service
@enxebre here are the systemd docker.service logs for a master
core@ip-10-0-1-61 ~ $ journalctl --unit=docker.service | cat
[snip until reboot]
-- Reboot --
Nov 17 20:45:22 ip-10-0-1-61.us-west-2.compute.internal systemd[1]: [/etc/systemd/system/docker.service.d/60-wait-for-flannel-config.conf:4] Unknown lvalue 'Restart' in section 'Unit'
Nov 17 20:45:22 ip-10-0-1-61.us-west-2.compute.internal systemd[1]: [/etc/systemd/system/docker.service.d/60-wait-for-flannel-config.conf:5] Unknown lvalue 'Restart' in section 'Unit'
Nov 17 20:45:29 ip-10-0-1-61.us-west-2.compute.internal systemd[1]: [/etc/systemd/system/docker.service.d/60-wait-for-flannel-config.conf:4] Unknown lvalue 'Restart' in section 'Unit'
Nov 17 20:45:29 ip-10-0-1-61.us-west-2.compute.internal systemd[1]: [/etc/systemd/system/docker.service.d/60-wait-for-flannel-config.conf:5] Unknown lvalue 'Restart' in section 'Unit'
Nov 17 20:45:30 ip-10-0-1-61.us-west-2.compute.internal systemd[1]: [/etc/systemd/system/docker.service.d/60-wait-for-flannel-config.conf:4] Unknown lvalue 'Restart' in section 'Unit'
Nov 17 20:45:30 ip-10-0-1-61.us-west-2.compute.internal systemd[1]: [/etc/systemd/system/docker.service.d/60-wait-for-flannel-config.conf:5] Unknown lvalue 'Restart' in section 'Unit'
Nov 17 20:45:35 ip-10-0-1-61.us-west-2.compute.internal systemd[1]: Started Docker Application Container Engine.
Nov 17 20:45:37 ip-10-0-1-61.us-west-2.compute.internal dockerd[1028]: dockerd: "dockerd" requires 0 arguments.
Nov 17 20:45:37 ip-10-0-1-61.us-west-2.compute.internal dockerd[1028]: Usage: dockerd [OPTIONS]
Nov 17 20:45:37 ip-10-0-1-61.us-west-2.compute.internal systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
Nov 17 20:45:37 ip-10-0-1-61.us-west-2.compute.internal systemd[1]: docker.service: Unit entered failed state.
Nov 17 20:45:37 ip-10-0-1-61.us-west-2.compute.internal systemd[1]: docker.service: Failed with result 'exit-code'.
[snip]
https://github.com/Capgemini/kubeform/blob/master/terraform/aws/public-cloud/master-cloud-config.yml.tpl#L46 seems wrong and duplicated: Restart=always should be inside [Service], not [Unit]. The same probably applies to the other cloud-configs.
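For illustration, a corrected drop-in could look roughly like the sketch below. Only the placement of Restart= under [Service] follows from the journal output above; the After= and Requires= lines and the flanneld.service name are assumptions about what this drop-in is meant to do, not copied from the kubeform template.

```ini
# /etc/systemd/system/docker.service.d/60-wait-for-flannel-config.conf
# Sketch only: the [Unit] entries below are assumed, not taken from the template.

[Unit]
# Ordering and dependency settings belong in [Unit].
After=flanneld.service
Requires=flanneld.service

[Service]
# Restart= is a [Service] option. Under [Unit], systemd ignores it and logs
# "Unknown lvalue 'Restart' in section 'Unit'".
# Set it once rather than on two consecutive lines.
Restart=always
```

After changing a drop-in on a running host, systemctl daemon-reload followed by systemctl restart docker.service picks up the new configuration. This would only silence the "Unknown lvalue" warnings; the "dockerd" requires 0 arguments failure in the same log looks like a separate problem.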
I had similar issues on DO.
I get fewer SSH timeout issues after adding the following to my .ssh/config file:
Host kube-*
    Port 22
    User core
    StrictHostKeyChecking=no
    UserKnownHostsFile=/dev/null
Later on I had some more ansible-playbook errors caused by out-of-memory conditions in some steps; the candidates to be killed included the kube-apiserver, among others. I solved this by provisioning 4 GB machines on DO.
When I get to running
It runs through fine until it reaches the task that waits for the kube-apiserver, which then times out.
I have checked on one of the masters and see this:
Any ideas?
Thanks