att-comdev / halcyon-vagrant-kubernetes

Vagrant deployment mechanism for halcyon-kubernetes.
Apache License 2.0

Failing on bootstrap check #48

Closed: sercanacar closed this issue 7 years ago

sercanacar commented 7 years ago

Hi,

Thank you for the repo; however, deployment using libvirt as the provider is failing on:

TASK [kube-init : check for bootstrap] *****************************************
fatal: [kube1]: FAILED! => {"changed": true, "cmd": "kubectl get nodes", "delta": "0:00:00.084344", "end": "2017-01-20 12:54:40.970560", "failed": true, "rc": 1, "start": "2017-01-20 12:54:40.886216", "stderr": "The connection to the server localhost:8080 was refused - did you specify the right host or port?", "stdout": "", "stdout_lines": [], "warnings": []}
...ignoring

config.rb looks like this:

$proxy_enable = true
$proxy_no = "localhost,127.0.0.1,172.16.35.11,172.16.35.12,172.16.35.13,172.16.35.14"

What have I missed?

Regards

v1k0d3n commented 7 years ago

@sercanacar sorry for the late response! Do you require a proxy because you're behind a firewall? It appears that you're proxying for your Vagrant hosts, which is not required by default; the proxy plugin handles this for you when you enable it.

v1k0d3n commented 7 years ago

@sercanacar any updates on this, so we can either resolve the issue for you or close it?

vhosakot commented 7 years ago

I see this error too.

TASK [kube-init : check for bootstrap] *****************************************
fatal: [kube1]: FAILED! => {"changed": true, "cmd": "kubectl get nodes", "delta": "0:00:00.255915", "end": "2017-02-11 06:30:51.426674", "failed": true, "rc": 1, "start": "2017-02-11 06:30:51.170759", "stderr": "The connection to the server localhost:8080 was refused - did you specify the right host or port?", "stdout": "", "stdout_lines": [], "warnings": []}
...ignoring

But, when I SSH into kube1, I can run kubectl get nodes successfully.

# vagrant ssh kube1
[vagrant@kube1 ~]$ kubectl get nodes
NAME           STATUS         AGE
172.16.35.11   Ready,master   6m
172.16.35.12   Ready          4m
172.16.35.13   Ready          4m
172.16.35.14   Ready          4m

Should https://github.com/att-comdev/halcyon-kubernetes/blob/master/kube-deploy/roles/kube-init/tasks/ubuntu-masters.yml#L18 wait until the master (kube1) is fully up?
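For reference, here is a minimal sketch (not the repo's actual task; the task name and command are taken from the log above) of how the bootstrap check could poll with Ansible's until/retries instead of failing on the first attempt:

- name: check for bootstrap
  command: kubectl get nodes
  register: bootstrap_check
  # keep polling for up to 5 minutes while kube-apiserver comes up
  until: bootstrap_check.rc == 0
  retries: 30
  delay: 10
  ignore_errors: true

With a loop like this, the task would keep retrying while the master finishes bootstrapping rather than reporting a failure on the first connection refusal.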

I also see these errors:

TASK [kube-prep : forcing all k8s containers to be recreated with correct dns settings] ***
fatal: [kube4]: FAILED! => {"changed": true, "cmd": "docker ps | awk '$NF ~ /^k8s_/ { print $1}' | xargs -l1 docker rm -f", "delta": "0:00:01.419671", "end": "2017-02-11 06:36:11.164915", "failed": true, "rc": 123, "start": "2017-02-11 06:36:09.745244", "stderr": "\"docker rm\" requires at least 1 argument(s).\nSee 'docker rm --help'.\n\nUsage:  docker rm [OPTIONS] CONTAINER [CONTAINER...]\n\nRemove one or more containers", "stdout": "", "stdout_lines": [], "warnings": []}
...ignoring
fatal: [kube2]: FAILED! => {"changed": true, "cmd": "docker ps | awk '$NF ~ /^k8s_/ { print $1}' | xargs -l1 docker rm -f", "delta": "0:00:01.157304", "end": "2017-02-11 06:36:12.779749", "failed": true, "rc": 123, "start": "2017-02-11 06:36:11.622445", "stderr": "\"docker rm\" requires at least 1 argument(s).\nSee 'docker rm --help'.\n\nUsage:  docker rm [OPTIONS] CONTAINER [CONTAINER...]\n\nRemove one or more containers", "stdout": "", "stdout_lines": [], "warnings": []}
...ignoring
fatal: [kube3]: FAILED! => {"changed": true, "cmd": "docker ps | awk '$NF ~ /^k8s_/ { print $1}' | xargs -l1 docker rm -f", "delta": "0:00:01.207939", "end": "2017-02-11 06:36:13.106148", "failed": true, "rc": 123, "start": "2017-02-11 06:36:11.898209", "stderr": "\"docker rm\" requires at least 1 argument(s).\nSee 'docker rm --help'.\n\nUsage:  docker rm [OPTIONS] CONTAINER [CONTAINER...]\n\nRemove one or more containers", "stdout": "", "stdout_lines": [], "warnings": []}
...ignoring

When I SSH into kube2, kube3, and kube4 and run docker ps, I see the error "Cannot connect to the Docker daemon. Is the docker daemon running on this host?". Is this expected?

# vagrant ssh kube2
[vagrant@kube2 ~]$ docker ps
Cannot connect to the Docker daemon. Is the docker daemon running on this host?

# vagrant ssh kube3
[vagrant@kube3 ~]$ docker ps
Cannot connect to the Docker daemon. Is the docker daemon running on this host?

# vagrant ssh kube4
[vagrant@kube4 ~]$ docker ps
Cannot connect to the Docker daemon. Is the docker daemon running on this host?

But, when I use sudo, I can run docker ps successfully in kube{2,3,4}.

# vagrant ssh kube2
[vagrant@kube2 ~]$ sudo docker ps | awk '$NF ~ /^k8s_/ { print $1}'
22ea3dd41b44
d713220f448c
8e8d82bf33ea
37900a54ff94
dbd09eca8f50

# vagrant ssh kube3
[vagrant@kube3 ~]$ sudo docker ps | awk '$NF ~ /^k8s_/ { print $1}'
045de20f84a7
eeff0bf9a064
016013546c45
a9e4e022b17c

# vagrant ssh kube4
[vagrant@kube4 ~]$ sudo docker ps | awk '$NF ~ /^k8s_/ { print $1}'
e8fbf2972aa9
c312e031c20a
b5d62523dbbd
e355de2383ff
eab1ab62ebf8
c4a1e2c9cb25
3fce89550f5a

Should https://github.com/att-comdev/halcyon-kubernetes/blob/master/kube-deploy/roles/kube-prep/tasks/prep-host-dns.yml#L62 use sudo or Ansible's become (http://docs.ansible.com/ansible/become.html#directives) for privilege escalation?
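For illustration, here is a minimal sketch (not the repo's actual task definition; the task name and command come from the log above) of the same cleanup step using Ansible's become for privilege escalation:

- name: forcing all k8s containers to be recreated with correct dns settings
  # become runs the pipeline as root, so docker ps works without the
  # vagrant user being in the docker group
  shell: "docker ps | awk '$NF ~ /^k8s_/ { print $1}' | xargs -r -l1 docker rm -f"
  become: true

Adding GNU xargs's -r (--no-run-if-empty) flag would also avoid the "docker rm" requires at least 1 argument(s) failure shown above, since docker rm is then skipped when no k8s_ containers match.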

Are these errors harmless? I will dig more and will contact you and/or @intlabs (portdirect) in the #openstack-kolla IRC channel if I need more help. My IRC nickname is vhosakot.

sercanacar commented 7 years ago

@v1k0d3n you can close this thread if you wish. I'm not experiencing any more issues.

vhosakot commented 7 years ago

@sercanacar how did you resolve this issue? I still see the errors.

v1k0d3n commented 7 years ago

Nice, @sercanacar. Can you share what resolved the issue for @vhosakot? Perhaps if there are any known issues, we can either work around them or document them a bit better for our users.

sercanacar commented 7 years ago

Hi,

I didn't actually resolve it. My setup is behind a proxy. I continued with the setup guide as normal.

I understand this isn't a solution.

Regards, Sercan

v1k0d3n commented 7 years ago

So the proxy configuration is performed via a Vagrant plugin, meaning that you do not need to add each of the hosts manually. You should be able to just add "localhost,127.0.0.1" as shown in the example. In the most basic case, you should really just have to set $proxy_enable = true and point $proxy_http/s to your proxy URL. The Vagrant proxy plugin is really nice, but has a bit of magic in which each of the hosts will proxy/redirect through a single source. For instance, you can configure proxyconf to proxy apt, docker, yum, etc.

You may need some special considerations depending on what your proxy requires. This takes care of the vanilla case where normal internet traffic fails unless it goes through a corporate proxy (like we had in our internal case). If you have special needs, perhaps we can get them added to improve the project.

Here's the project we're using (although you're probably already aware): https://github.com/tmatilai/vagrant-proxyconf

Configuring the 172.16.35.x,172.16.35.y addresses will only cause you issues. Remove them.
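As a rough sketch (the proxy URL below is a placeholder, and $proxy_http/$proxy_https are the variables referred to above as $proxy_http/s, so check your config.rb for the exact names), a minimal config.rb for a corporate proxy would look something like:

$proxy_enable = true
$proxy_http = "http://proxy.example.com:8080"
$proxy_https = "http://proxy.example.com:8080"
$proxy_no = "localhost,127.0.0.1"

vagrant-proxyconf then injects these settings into each guest for apt, docker, etc., so the individual 172.16.35.x host addresses don't need to be listed in $proxy_no.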

@aric49 do you have anything to add, or do you see opportunity to improve this feature in our case?

vhosakot commented 7 years ago

Thanks for the replies!

@v1k0d3n Firstly, thanks for this fantastic repo! 😄 I'm a big fan of this repo and https://github.com/att-comdev/halcyon-kubernetes. Just wanted to let you know that the entire kolla-kubernetes community is using this repo to set up the dev environment (these repos are used in http://docs.openstack.org/developer/kolla-kubernetes/development-environment.html). I will watch this repo and halcyon-kubernetes closely and try to fix/contribute as much as I can 😄. I'll use these repos to deploy multi-node k8s and play with it, and also to install kolla-kubernetes on it (OpenStack on k8s). Great work really!

As for the errors in this bug, I see that both are harmless and expected; Ansible just shows them as failures, which might confuse users. I've submitted the PR https://github.com/att-comdev/halcyon-kubernetes/pull/52 to fix it. The test results are in https://gist.github.com/vhosakot/65095b5904fc07ff1756aa09152c7bfc.