att-comdev / promenade

This project has moved to OpenStack.
https://www.airshipit.org/
Apache License 2.0

K8 Api server fails to start.. #59

Closed pratapaprasanna closed 6 years ago

pratapaprasanna commented 6 years ago

Hi all,

I have been trying to deploy Promenade and have been facing issues while executing the genesis.sh script.

The connection to the server 127.0.0.1:6553 was refused - did you specify the right host or port? .^C

I tried debugging using

But everything looks OK; the hostname has been changed to 'n0'.

Genesis.yaml


schema: promenade/Genesis/v1
metadata:
  schema: metadata/Document/v1
  name: genesis
  layeringDefinition:
    abstract: false
    layer: site
data:
  hostname: n0
  ip: 192.168.3.119
  armada:
    target_manifest: cluster-bootstrap
  labels:

IP is:

$ ip r
default via 192.168.3.1 dev enp3s0 onlink
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
192.168.3.0/24 dev enp3s0 proto kernel scope link src 192.168.3.119

Host name is:

cat /etc/hostname
n0

Any help would be of great use. Thanks in advance.

pratapaprasanna commented 6 years ago

Links which might be useful:

http://paste.openstack.org/show/718020/ --- k8-api server logs
http://paste.openstack.org/show/718021/ --- etcd container logs ...

pratapaprasanna commented 6 years ago

http://paste.openstack.org/show/718022/ --- k8-api better logs

mark-burnett commented 6 years ago

Hey @pratapaprasanna, thanks for giving this a shot :)

One thing that I suspect is hitting you is that there are actually a number of places in the examples that need to be changed together to keep hostnames and IPs consistent. E.g. you will need to make sure that PKICatalog.yaml has the correct IPs and hostnames so that certificate generation has the right data.

Please post back here and I'm happy to help troubleshoot. One thing that might be helpful as you make progress is the debug-report.sh script, which will help capture logs for troubleshooting.

pratapaprasanna commented 6 years ago

Sure, I will try and get back to you. Thanks @mark-burnett :)

pratapaprasanna commented 6 years ago

Any idea what changes need to be made to set up UCP on a single machine?

When I run the genesis.sh script, everything goes well, but calico-node goes into CrashLoopBackOff.

root@n0:~# kubectl get pods --all-namespaces
NAMESPACE     NAME                                              READY     STATUS             RESTARTS   AGE
kube-system   auxiliary-etcd-n0                                 2/2       Running            0          50m
kube-system   bootstrap-armada-n0                               4/4       Running            0          49m
kube-system   calico-etcd-anchor-9hw47                          1/1       Running            0          47m
kube-system   calico-etcd-n0                                    1/1       Running            0          47m
kube-system   calico-kube-policy-controllers-86f4747b47-gsg6h   1/1       Running            0          47m
kube-system   calico-node-bqrpq                                 1/2       CrashLoopBackOff   13         47m
kube-system   haproxy-n0                                        1/1       Running            0          51m
kube-system   kubernetes-apiserver-n0                           1/1       Running            0          50m
kube-system   kubernetes-controller-manager-n0                  1/1       Running            0          50m
kube-system   kubernetes-etcd-n0                                1/1       Running            0          50m
kube-system   kubernetes-proxy-wktmm                            1/1       Running            0          48m
kube-system   kubernetes-scheduler-n0                           1/1       Running            0          50m

k8s_calico-node_calico-node-bqrpq_kube-system_4ed83113-37fb-11e8-9960-a45d3618640e_13 is the container which is causing the issue.

root@n0:~# docker logs 6475e14f3719
time="2018-04-04T12:12:52Z" level=info msg="Early log level set to info" 
time="2018-04-04T12:12:52Z" level=info msg="NODENAME environment not specified - check HOSTNAME" 
time="2018-04-04T12:12:52Z" level=info msg="Loading config from environment" 
Skipping datastore connection test
time="2018-04-04T12:12:52Z" level=info msg="Building new node resource" Name=n0 
time="2018-04-04T12:12:52Z" level=info msg="Initialise BGP data" 
WARNING: Unable to auto-detect an IPv4 address using interface regexes [ens3]: no valid host interfaces found
ERROR: Couldn't autodetect a management IPv4 address:
  -  provide an IPv4 address by configuring one in the node resource, or
  -  provide an IPv4 address using the IP environment, or
  -  if auto-detecting, use a different autodetection method.
Terminating
Calico node failed to start 

Logs

2018-04-06 10:23:58.296 8 ERROR armada.cli   File "/usr/local/lib/python3.5/site-packages/armada/handlers/tiller.py", line 370, in install_release
2018-04-06 10:23:58.296 8 ERROR armada.cli     metadata=self.metadata)
2018-04-06 10:23:58.296 8 ERROR armada.cli   File "/usr/local/lib/python3.5/site-packages/grpc/_channel.py", line 487, in __call__
2018-04-06 10:23:58.296 8 ERROR armada.cli     return _end_unary_response_blocking(state, call, False, deadline)
2018-04-06 10:23:58.296 8 ERROR armada.cli   File "/usr/local/lib/python3.5/site-packages/grpc/_channel.py", line 437, in _end_unary_response_blocking
2018-04-06 10:23:58.296 8 ERROR armada.cli     raise _Rendezvous(state, None, None, deadline)
2018-04-06 10:23:58.296 8 ERROR armada.cli grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with (StatusCode.UNKNOWN, release ucp-coredns failed: timed out waiting for the condition)>
2018-04-06 10:23:58.296 8 ERROR armada.cli
2018-04-06 10:23:58.296 8 ERROR armada.cli During handling of the above exception, another exception occurred:
2018-04-06 10:23:58.296 8 ERROR armada.cli
2018-04-06 10:23:58.296 8 ERROR armada.cli Traceback (most recent call last):
2018-04-06 10:23:58.296 8 ERROR armada.cli   File "/usr/local/lib/python3.5/site-packages/armada/cli/__init__.py", line 40, in safe_invoke
2018-04-06 10:23:58.296 8 ERROR armada.cli     self.invoke()
2018-04-06 10:23:58.296 8 ERROR armada.cli   File "/usr/local/lib/python3.5/site-packages/armada/cli/apply.py", line 219, in invoke
2018-04-06 10:23:58.296 8 ERROR armada.cli     resp = armada.sync()
2018-04-06 10:23:58.296 8 ERROR armada.cli   File "/usr/local/lib/python3.5/site-packages/armada/handlers/armada.py", line 380, in sync
2018-04-06 10:23:58.296 8 ERROR armada.cli     timeout=wait_timeout)
2018-04-06 10:23:58.296 8 ERROR armada.cli   File "/usr/local/lib/python3.5/site-packages/armada/handlers/tiller.py", line 373, in install_release
2018-04-06 10:23:58.296 8 ERROR armada.cli     raise ex.ReleaseException(release, status, 'Install')
2018-04-06 10:23:58.296 8 ERROR armada.cli armada.exceptions.tiller_exceptions.ReleaseException: Failed to Install release: ucp-coredns - Tiller Message: b'Release "ucp-coredns" failed: timed out waiting for the condition'
2018-04-06 10:23:58.296 8 ERROR armada.cli

Any idea how to fix this?

mark-burnett commented 6 years ago

Hey @pratapaprasanna, sorry for the slow response!

For this particular issue, you probably need to configure the IP_AUTODETECTION_METHOD to match your host. In the examples, we're using it based on interface name (ens3 for the test VMs).
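
As a rough sketch of how the right interface name could be found (not something taken from the Promenade examples themselves), the snippet below pulls the interface that carries the default route out of `ip route` style output and prints a matching value. The `interface=<regex>` form is one of Calico's documented autodetection methods. The helper name `default_iface` is made up for illustration, and the sample text is the `ip r` output posted earlier in this thread. Where the value has to be set in your manifests is not shown here.

```shell
# Hypothetical helper: print the interface that carries the default route.
# Reads `ip route` style output on stdin.
default_iface() {
  awk '$1 == "default" {
         for (i = 2; i < NF; i++)
           if ($i == "dev") { print $(i + 1); exit }
       }'
}

# Sample taken from the `ip r` output earlier in this thread.
# On a live host you would pipe `ip route` into default_iface instead.
sample='default via 192.168.3.1 dev enp3s0 onlink
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
192.168.3.0/24 dev enp3s0 proto kernel scope link src 192.168.3.119'

iface=$(printf '%s\n' "$sample" | default_iface)

# Value to use in place of the example's interface=ens3.
echo "IP_AUTODETECTION_METHOD=interface=${iface}"
```

For the host in this thread that prints `IP_AUTODETECTION_METHOD=interface=enp3s0`, which differs from the `ens3` regex that calico-node reported it could not match.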

I do want to caution that I don't particularly recommend running this directly on a laptop. It's designed to take control of your machine and turn it into a kubernetes node. I would recommend doing this in a VM.

You may find the basic "resiliency" gate to be a nice place to start, though I recommend trying to run it off this (outstanding) patch set: https://review.gerrithub.io/#/c/406832/

The instructions for trying out that method are here: https://github.com/att-comdev/promenade/blob/master/docs/source/getting-started.rst#running-tests Basically, you just need to run ./tools/setup_gate.sh on your host, log out and back in (to update groups), then run ./tools/gate.sh on the host. That should bring up a cluster with 4 small nodes, suitable for testing, and give you some familiarity with how to build other environments.

pratapaprasanna commented 6 years ago

Great, I will try it and get back to you in case of any issue :) Thanks @mark-burnett

pratapaprasanna commented 6 years ago

Yes, it now works with the examples in the basic directory. I am working on the examples under the complete directory and will mail you in case of any issue. I'm closing the issue for now. Thanks for all the support and help.

Thanks @mark-burnett :)