pms1969 opened 6 years ago
Adding some additional information:
Thinking that it may have been the size of the etcd nodes, and that I'd need a bit more room on the workers, I changed the config so that the etcd nodes were t2.medium and the workers were t2.2xlarge, then reapplied. That didn't seem to work, so I terminated the master node. Now the master is completely fubar'ed.
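For reference, a minimal sketch of that reapply-and-replace sequence, assuming a Terraform-based install (the instance ID below is a placeholder):

# Reapply after editing the instance types in the cluster config
terraform plan
terraform apply

# Replace the wedged master and let its auto scaling group bring up a fresh one
aws ec2 terminate-instances --instance-ids i-0123456789abcdef0   # placeholder ID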
It's stuck running 2 containers over and over:
core@ip-10-102-6-98 ~ $ docker ps -a
CONTAINER ID   IMAGE                                                            COMMAND                  CREATED          STATUS                      PORTS   NAMES
84e0ab2a0e52   quay.io/coreos/awscli:025a357f05242fdad6a81e8a6b520098aa65a600   "/bin/bash -c '\n ..."   6 seconds ago    Exited (0) 5 seconds ago            practical_davinci
f04141809f33   quay.io/coreos/awscli:025a357f05242fdad6a81e8a6b520098aa65a600   "/detect-master.sh"      7 seconds ago    Exited (0) 5 seconds ago            xenodochial_euler
8624c670a694   quay.io/coreos/awscli:025a357f05242fdad6a81e8a6b520098aa65a600   "/bin/bash -c '\n ..."   18 seconds ago   Exited (0) 17 seconds ago           naughty_swartz
ae231ea3c901   quay.io/coreos/awscli:025a357f05242fdad6a81e8a6b520098aa65a600   "/detect-master.sh"      20 seconds ago   Exited (0) 18 seconds ago           vigilant_minsky
e51e885ab635   quay.io/coreos/awscli:025a357f05242fdad6a81e8a6b520098aa65a600   "/bin/bash -c '\n ..."   31 seconds ago   Exited (0) 29 seconds ago           sleepy_brattain
b0322b040e24   quay.io/coreos/awscli:025a357f05242fdad6a81e8a6b520098aa65a600   "/detect-master.sh"      32 seconds ago   Exited (0) 30 seconds ago           eloquent_pasteur
700b08b69b90   quay.io/coreos/awscli:025a357f05242fdad6a81e8a6b520098aa65a600   "/bin/bash -c '\n ..."   43 seconds ago   Exited (0) 42 seconds ago           hungry_franklin
0fb019b4eed1   quay.io/coreos/awscli:025a357f05242fdad6a81e8a6b520098aa65a600   "/detect-master.sh"      44 seconds ago   Exited (0) 42 seconds ago           blissful_lalande
d584d41f958c   quay.io/coreos/awscli:025a357f05242fdad6a81e8a6b520098aa65a600   "/bin/bash -c '\n ..."   55 seconds ago   Exited (0) 54 seconds ago           amazing_engelbart
There's no output from detect-master, but it is returning true (exit code 0).
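A quick way to double-check that (a sketch; the container ID and image tag are taken from the listing above):

# Confirm the detect-master container really exited 0 despite printing nothing
docker inspect -f '{{.State.ExitCode}}' f04141809f33

# Or run the script by hand in the same image and watch it directly
docker run --rm quay.io/coreos/awscli:025a357f05242fdad6a81e8a6b520098aa65a600 /detect-master.sh; echo "exit: $?"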
The log from the other container is:
core@ip-10-102-6-98 ~ $ docker logs e51e885ab635
download: s3://mys3bucket/assets.zip to tmp/mys3bucket+assets.zip
So it's downloading the assets; whatever script is waiting on that just isn't getting the message.
UPDATE:
The last problem was due to the assets.zip file being corrupted. Not sure how that happened.
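In case it bites anyone else, a quick way to rule out a corrupted bundle (a sketch; the bucket path matches the log above):

# Pull the same archive the bootstrap containers fetch and test it locally
aws s3 cp s3://mys3bucket/assets.zip /tmp/assets.zip
unzip -t /tmp/assets.zip   # a healthy archive ends with "No errors detected"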
I've since recreated the cluster, and the same initial problem persists. hyperkube (run under flock) exits with:
F0209 12:09:50.976234 5 hooks.go:133] PostStartHook "ca-registration" failed: unable to initialize client CA configmap: timed out waiting for the condition
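If I understand the hook correctly, "ca-registration" publishes the client CA into a configmap in kube-system, so the timeout suggests the apiserver can't complete requests against itself yet. A hedged probe from the master (the port and configmap name are assumptions on my part):

# Check whether the apiserver answers locally at all
curl -k https://127.0.0.1:443/healthz

# If it does, see whether the configmap the hook writes ever appeared
kubectl -n kube-system get configmap extension-apiserver-authentication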
The other hyperkube keeps erroring repeatedly with:
E0209 12:13:42.552673 1 leaderelection.go:224] error retrieving resource lock kube-system/kube-controller-manager: Get https://mydomain.com:443/api/v1/namespaces/kube-system/endpoints/kube-controller-manager: EOF
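The EOF on https://mydomain.com:443 reads like the connection is being dropped mid-request, either at the load balancer or because the apiserver keeps dying. One way to narrow it down (a sketch; the direct-IP variant assumes the master's address from the prompt above):

# Hit the API endpoint the controller-manager is using
curl -vk https://mydomain.com:443/api

# Compare with a direct request to the master, bypassing any LB
curl -vk https://10.102.6.98:443/api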
Apologies for the running commentary.
Also having this exact issue. Is there any additional information on this?
I'm trying to set up a POC cluster for a bit of experimentation, and chose the vanilla method (see config at the bottom).
At first, I had the same issue as #6. Running plan and apply again sorted it, but now running

kubectl cluster-info

gives me:

Running it with dump just returns:

ssh'ing into the machine, hyperkube seemed to have restarted a few times. When it became stable, the logs were full of:

docker ps -a

gives:

Getting the logs of the 2b204 container:
My config looks like this:
Any idea what might have gone wrong? Any help appreciated.
NB: logs sanitised.