eparis / kubernetes-ansible

Ansible playbooks to build a kubernetes cluster from scratch

Load minion definition into masters failure #25

Open vnugent opened 9 years ago

vnugent commented 9 years ago

Master: Fedora 21 Minion: Fedora 21 Atomic

TASK: [master | Enable scheduler] *********************************************
ok: [172.18.17.3]

TASK: [master | Copy v1beta3 style minion definitions to master] **************
ok: [172.18.17.3] => (item=172.18.17.18)

TASK: [master | Copy old v1beta1 style minion definitions to master] **********
skipping: [172.18.17.3] => (item=172.18.17.18)

TASK: [master | Load minion definition into masters] **************************
failed: [172.18.17.3] => (item=172.18.17.18) => {"changed": false, "cmd": ["/usr/bin/kubectl", "create", "-f", "/tmp/node-172.18.17.18.json"], "delta": "0:00:12.262144", "end": "2015-05-21 19:27:51.591692", "failed": true, "failed_when_result": true, "item": "172.18.17.18", "rc": 1, "start": "2015-05-21 19:27:39.329548", "stdout_lines": [], "warnings": []}
stderr: Error: 501: All the given peers are not reachable (failed to propose on members [http://172.18.17.3:4001] twice [last error: Unexpected HTTP status code]) [0]

FATAL: all hosts have already failed -- aborting

PLAY RECAP ********************************************************************
           to retry, use: --limit @/root/setup.retry

172.18.17.18               : ok=9    changed=0    unreachable=0    failed=0
172.18.17.3                : ok=26   changed=0    unreachable=0    failed=1

retried manually

[root@kmaster kubernetes-ansible]# /usr/bin/kubectl create -f /tmp/node-172.18.17.18.json
Error: 501: All the given peers are not reachable (failed to propose on members [http://172.18.17.3:4001] twice [last error: Unexpected HTTP status code]) [0]
[root@kmaster kubernetes-ansible]# curl http://172.18.17.3:4001
404 page not found
[root@kmaster kubernetes-ansible]# etcd -version
etcd version 2.0.9

gouyang commented 9 years ago

I used the same setup (master on Fedora 21, minion on Fedora 21 Atomic) and could not reproduce this problem. I suspect it's an issue with your environment.

peterlamar commented 9 years ago

Host is Fedora 21, minion is Fedora 21. I'm encountering the same issue on an OpenStack install. When I run the script that installs flannel, it also hangs when trying to run it.

TASK: [master | Copy v1beta3 style minion definitions to master] ******
ok: [173.39.214.135] => (item=173.39.214.146)
ok: [173.39.214.135] => (item=173.39.214.150)

TASK: [master | Copy old v1beta1 style minion definitions to master] ******
skipping: [173.39.214.135] => (item=173.39.214.146)
skipping: [173.39.214.135] => (item=173.39.214.150)

TASK: [master | Load minion definition into masters] ******
failed: [173.39.214.135] => (item=173.39.214.146) => {"changed": false, "cmd": ["/usr/bin/kubectl", "create", "-f", "/tmp/node-173.39.214.146.json"], "delta": "0:00:12.243381", "end": "2015-05-29 23:42:00.461505", "failed": true, "failed_when_result": true, "item": "173.39.214.146", "rc": 1, "start": "2015-05-29 23:41:48.218124", "stdout_lines": [], "warnings": []}
stderr: Error: 501: All the given peers are not reachable (failed to propose on members [http://173.39.214.135:4001] twice [last error: Unexpected HTTP status code]) [0]
failed: [173.39.214.135] => (item=173.39.214.150) => {"changed": false, "cmd": ["/usr/bin/kubectl", "create", "-f", "/tmp/node-173.39.214.150.json"], "delta": "0:00:12.243127", "end": "2015-05-29 23:42:13.004583", "failed": true, "failed_when_result": true, "item": "173.39.214.150", "rc": 1, "start": "2015-05-29 23:42:00.761456", "stdout_lines": [], "warnings": []}
stderr: Error: 501: All the given peers are not reachable (failed to propose on members [http://173.39.214.135:4001] twice [last error: Unexpected HTTP status code]) [0]

FATAL: all hosts have already failed -- aborting

PLAY RECAP ****
           to retry, use: --limit @/root/setup.retry

173.39.214.135 : ok=26 changed=0 unreachable=0 failed=1
173.39.214.146 : ok=8 changed=0 unreachable=0 failed=0
173.39.214.150 : ok=8 changed=0 unreachable=0 failed=0

eparis commented 9 years ago

Is etcd running? I just pushed an update to fix the problem where etcd 2.0.11 refused to start with the config we were supplying. Hopefully you can just update your git repo and rerun the setup.

peterlamar commented 9 years ago

It's not running, and it still isn't with the new changes. What would you suggest? Should I start over and install everything manually? It could be an odd issue with the OpenStack environment I am using, and I am open to any advice on tracking it down.

eparis commented 9 years ago

Try to start etcd:
systemctl start etcd

See if it is running:
ps -ef | grep etcd

Collect the log:
journalctl -b -u etcd

Or try running etcd by hand and see what it says:
/usr/bin/etcd

peterlamar commented 9 years ago

I get a bunch of these. Something odd is going on.

May 31 02:26:23 kmaster.novalocal etcd[600]: 2015/05/31 02:26:23 etcdserver: publish error: etcdserver: request timed out
May 31 02:26:28 kmaster.novalocal etcd[600]: 2015/05/31 02:26:28 etcdserver: publish error: etcdserver: request timed out
May 31 02:26:33 kmaster.novalocal etcd[600]: 2015/05/31 02:26:33 etcdserver: publish error: etcdserver: request timed out
May 31 02:26:38 kmaster.novalocal etcd[600]: 2015/05/31 02:26:38 etcdserver: publish error: etcdserver: request timed out
May 31 02:26:43 kmaster.novalocal etcd[600]: 2015/05/31 02:26:43 etcdserver: publish error: etcdserver: request timed out
May 31 02:26:49 kmaster.novalocal etcd[600]: 2015/05/31 02:26:48 etcdserver: publish error: etcdserver: request timed out
May 31 02:26:53 kmaster.novalocal etcd[600]: 2015/05/31 02:26:53 etcdserver: publish error: etcdserver: request timed out
May 31 02:26:58 kmaster.novalocal etcd[600]: 2015/05/31 02:26:58 etcdserver: publish error: etcdserver: request timed out
May 31 02:27:03 kmaster.novalocal etcd[600]: 2015/05/31 02:27:03 etcdserver: publish error: etcdserver: request timed out
May 31 02:27:08 kmaster.novalocal etcd[600]: 2015/05/31 02:27:08 etcdserver: publish error: etcdserver: request timed out
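
To see how many of these lines the journal holds without scrolling through it, the log can be piped through a small filter. This is only a sketch: the function name `count_publish_errors` is made up for illustration, and the match string is taken from the log lines above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the playbooks): count etcd
# "publish error" lines in journal text read from stdin.
count_publish_errors() {
    # grep -c prints the number of matching lines; it exits non-zero
    # when the count is 0, so mask that to keep callers simple.
    grep -c 'etcdserver: publish error' || true
}
```

Possible use on the master: `journalctl -b -u etcd | count_publish_errors`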

eparis commented 9 years ago

It might be worth asking what those mean over in the http://github.com/coreos/etcd project. I've never seen them...

gouyang commented 9 years ago

@PeterLamar I suspect etcd was already initialized as a cluster member earlier. If so, please run sudo rm -fr /var/lib/etcd/default.etcd and restart your etcd service. I think that should solve your problem.
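
A more cautious variant of this suggestion, as a sketch: move the data directory aside instead of deleting it, so the old state can be restored if needed. The function name `backup_etcd_data` and the `.bak.<timestamp>` suffix are illustrative assumptions and not part of the playbooks; the default path is the one mentioned in this comment.

```shell
#!/bin/sh
# Hypothetical helper: move an etcd data directory aside rather than
# removing it. Prints the backup path on success.
backup_etcd_data() {
    data_dir="${1:-/var/lib/etcd/default.etcd}"
    if [ ! -d "$data_dir" ]; then
        echo "no data directory at $data_dir" >&2
        return 1
    fi
    # Timestamp suffix is an assumption chosen for this example.
    backup="${data_dir}.bak.$(date +%Y%m%d%H%M%S)"
    mv "$data_dir" "$backup" && echo "$backup"
}

# Intended order on the master, run as root:
#   systemctl stop etcd
#   backup_etcd_data
#   systemctl start etcd
```

Stopping etcd before moving the directory avoids changing files underneath a running process.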

vnugent commented 9 years ago

OP here. I'm on OpenStack as well. @gouyang rm -fr /var/lib/etcd/default.etcd didn't help

# curl http://172.18.17.3:4001/version
etcd 2.0.9
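
A /version reply shows that the HTTP listener is up, but as far as I understand it does not exercise the write path, and the kubectl error is about a failed proposal. A rough sketch of a write check through the v2 keys API follows. The function name `etcd_write_probe`, the key name `ansible-probe`, and the 5 second timeout are assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical helper: attempt a throwaway write via etcd's v2 keys API.
# Returns 0 if the PUT succeeds, non-zero on HTTP errors, timeouts,
# or connection failures.
etcd_write_probe() {
    endpoint="${1:-http://127.0.0.1:4001}"
    # --fail turns HTTP status >= 400 into a non-zero exit code.
    curl --silent --fail --max-time 5 \
        -X PUT "$endpoint/v2/keys/ansible-probe" -d value=ok >/dev/null
}
```

Possible use: `etcd_write_probe http://172.18.17.3:4001 && echo "writes OK" || echo "writes failing"`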