Open rushins opened 7 years ago
@rushins thanks for reporting this. Can you provide us with some more information on your cluster so we can understand how to reproduce the issue? How many nodes are in the cluster? Did the cluster ever work? If so, how long until it breaks? How can we, as minimally as possible, recreate the issue? This point is very important, since without it, it is very difficult to gather any clues as to why your configuration is not working. Finally, can you please post logs for the failing etcd-member service? Thank you!
@squat I'm experiencing the same thing on a fresh install, although I'm actually using the terraform bootkube installer from another repo (https://github.com/coreos/matchbox/tree/master/examples/terraform/bootkube-install).
It seems that in my case the /etc/ssl/etcd config is not created properly; I'm still looking into why this is.
I should mention that I had a working installation before on 1465.7.0, did a fresh PXE boot to 1465.8.0, encountered this exact issue, and then did a fresh install back to 1465.7.0; the problem persists.
-- Logs begin at Mon 2017-10-09 17:39:55 UTC, end at Mon 2017-10-09 17:44:17 UTC. --
Oct 09 17:40:13 node01.mydomain.example systemd[1]: Starting etcd (System Application Container)...
Oct 09 17:40:17 node01.mydomain.example rkt[774]: rm: unable to resolve UUID from file: open /var/lib/coreos/etcd-member-wrapper.uuid: no such file or directory
Oct 09 17:40:17 node01.mydomain.example rkt[774]: rm: failed to remove one or more pods
Oct 09 17:40:17 node01.mydomain.example etcd-wrapper[841]: ++ id -u etcd
Oct 09 17:40:17 node01.mydomain.example etcd-wrapper[841]: + exec /usr/bin/rkt run --uuid-file-save=/var/lib/coreos/etcd-member-wrapper.uuid --trust-keys-from-https --mount volume=coreos-systemd-dir,target=/run/systemd/system --volume coreos-
Oct 09 17:40:27 node01.mydomain.example etcd-wrapper[841]: pubkey: prefix: "quay.io/coreos/etcd"
Oct 09 17:40:27 node01.mydomain.example etcd-wrapper[841]: key: "https://quay.io/aci-signing-key"
Oct 09 17:40:27 node01.mydomain.example etcd-wrapper[841]: gpg key fingerprint is: BFF3 13CD AA56 0B16 A898 7B8F 72AB F5F6 799D 33BC
Oct 09 17:40:27 node01.mydomain.example etcd-wrapper[841]: Quay.io ACI Converter (ACI conversion signing key) <support@quay.io>
Oct 09 17:40:27 node01.mydomain.example etcd-wrapper[841]: Trusting "https://quay.io/aci-signing-key" for prefix "quay.io/coreos/etcd" without fingerprint review.
Oct 09 17:40:27 node01.mydomain.example etcd-wrapper[841]: Added key for prefix "quay.io/coreos/etcd" at "/etc/rkt/trustedkeys/prefix.d/quay.io/coreos/etcd/xxx"
Oct 09 17:40:31 node01.mydomain.example etcd-wrapper[841]: Downloading signature: 0 B/473 B
Oct 09 17:40:31 node01.mydomain.example etcd-wrapper[841]: Downloading signature: 473 B/473 B
Oct 09 17:40:35 node01.mydomain.example etcd-wrapper[841]: Downloading ACI: 0 B/13.3 MB
Oct 09 17:40:35 node01.mydomain.example etcd-wrapper[841]: Downloading ACI: 10.1 KB/13.3 MB
Oct 09 17:40:36 node01.mydomain.example etcd-wrapper[841]: Downloading ACI: 687 KB/13.3 MB
Oct 09 17:40:37 node01.mydomain.example etcd-wrapper[841]: Downloading ACI: 3.56 MB/13.3 MB
Oct 09 17:40:38 node01.mydomain.example etcd-wrapper[841]: Downloading ACI: 12.5 MB/13.3 MB
Oct 09 17:40:38 node01.mydomain.example etcd-wrapper[841]: Downloading ACI: 13.3 MB/13.3 MB
Oct 09 17:40:38 node01.mydomain.example etcd-wrapper[841]: image: signature verified:
Oct 09 17:40:38 node01.mydomain.example etcd-wrapper[841]: Quay.io ACI Converter (ACI conversion signing key) <support@quay.io>
Oct 09 17:40:42 node01.mydomain.example etcd-wrapper[841]: run: stat of host path /etc/ssl/etcd: stat /etc/ssl/etcd: no such file or directory
Oct 09 17:40:42 node01.mydomain.example systemd[1]: etcd-member.service: Main process exited, code=exited, status=254/n/a
Oct 09 17:40:42 node01.mydomain.example systemd[1]: Failed to start etcd (System Application Container).
Oct 09 17:40:42 node01.mydomain.example systemd[1]: etcd-member.service: Unit entered failed state.
Oct 09 17:40:42 node01.mydomain.example systemd[1]: etcd-member.service: Failed with result 'exit-code'.
Oct 09 17:40:52 node01.mydomain.example systemd[1]: etcd-member.service: Service hold-off time over, scheduling restart.
Oct 09 17:40:52 node01.mydomain.example systemd[1]: Stopped etcd (System Application Container).
Oct 09 17:40:52 node01.mydomain.example systemd[1]: Starting etcd (System Application Container)...
Oct 09 17:40:52 node01.mydomain.example rkt[969]: "6298a1ce-e0ad-46bf-b21d-4c1df1252960"
Oct 09 17:40:52 node01.mydomain.example etcd-wrapper[983]: ++ id -u etcd
Oct 09 17:40:52 node01.mydomain.example etcd-wrapper[983]: + exec /usr/bin/rkt run --uuid-file-save=/var/lib/coreos/etcd-member-wrapper.uuid --trust-keys-from-https --mount volume=coreos-systemd-dir,target=/run/systemd/system --volume coreos-
Oct 09 17:40:54 node01.mydomain.example etcd-wrapper[983]: run: stat of host path /etc/ssl/etcd: stat /etc/ssl/etcd: no such file or directory
Oct 09 17:40:54 node01.mydomain.example systemd[1]: etcd-member.service: Main process exited, code=exited, status=254/n/a
Oct 09 17:40:54 node01.mydomain.example systemd[1]: Failed to start etcd (System Application Container).
Oct 09 17:40:54 node01.mydomain.example systemd[1]: etcd-member.service: Unit entered failed state.
Oct 09 17:40:54 node01.mydomain.example systemd[1]: etcd-member.service: Failed with result 'exit-code'.
Oct 09 17:41:04 node01.mydomain.example systemd[1]: etcd-member.service: Service hold-off time over, scheduling restart.
Oct 09 17:41:04 node01.mydomain.example systemd[1]: Stopped etcd (System Application Container).
Oct 09 17:41:04 node01.mydomain.example systemd[1]: Starting etcd (System Application Container)...
Oct 09 17:41:04 node01.mydomain.example rkt[1035]: "3a983cdc-2a27-4ad6-acb7-cb3550291736"
Oct 09 17:41:04 node01.mydomain.example etcd-wrapper[1048]: ++ id -u etcd
Oct 09 17:41:04 node01.mydomain.example etcd-wrapper[1048]: + exec /usr/bin/rkt run --uuid-file-save=/var/lib/coreos/etcd-member-wrapper.uuid --trust-keys-from-https --mount volume=coreos-systemd-dir,target=/run/systemd/system --volume coreos
Oct 09 17:41:06 node01.mydomain.example etcd-wrapper[1048]: run: stat of host path /etc/ssl/etcd: stat /etc/ssl/etcd: no such file or directory
Oct 09 17:41:06 node01.mydomain.example systemd[1]: etcd-member.service: Main process exited, code=exited, status=254/n/a
Oct 09 17:41:06 node01.mydomain.example systemd[1]: Failed to start etcd (System Application Container).
Oct 09 17:41:06 node01.mydomain.example systemd[1]: etcd-member.service: Unit entered failed state.
Oct 09 17:41:06 node01.mydomain.example systemd[1]: etcd-member.service: Failed with result 'exit-code'.
Oct 09 17:41:16 node01.mydomain.example systemd[1]: etcd-member.service: Service hold-off time over, scheduling restart.
Oct 09 17:41:16 node01.mydomain.example systemd[1]: Stopped etcd (System Application Container).
Oct 09 17:41:16 node01.mydomain.example systemd[1]: Starting etcd (System Application Container)...
Oct 09 17:41:16 node01.mydomain.example rkt[1068]: "473d9429-6d1f-425c-ac4f-7389d6a9f860"
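For anyone triaging a similar restart loop, the recurring failure in the log above is the rkt line `run: stat of host path ...: no such file or directory`. Below is a minimal Python sketch that pulls the missing host paths out of saved journal output. The function name `missing_host_paths` and the regular expression are illustrative assumptions based only on the log lines in this thread, not part of rkt or etcd-wrapper.

```python
import re

# Pattern for the rkt error seen in the log above, e.g.
# "run: stat of host path /etc/ssl/etcd: stat /etc/ssl/etcd: no such file or directory"
# (assumption: the message format matches the lines quoted in this thread).
HOST_PATH_ERROR = re.compile(
    r"run: stat of host path (.+?): stat .+?: no such file or directory"
)


def missing_host_paths(log_text):
    """Return the distinct host paths rkt failed to stat, in first-seen order."""
    seen = []
    for match in HOST_PATH_ERROR.finditer(log_text):
        path = match.group(1)
        if path not in seen:
            seen.append(path)
    return seen
```

The log text could come from something like `journalctl -u etcd-member.service --no-pager` saved to a file and read into a string. Run against the log in this comment, the only path it would report is /etc/ssl/etcd.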
FYI
m@evy ~ $ terraform version
Terraform v0.10.7
It's an on-premise bare metal lab, not directly related to your repo, but it might be the same issue :)
I have the same issue with PXE and metal (VMware Fusion). Matchbox works fine, and I have one master and one worker. It seems something goes wrong when copying the etcd TLS assets to /etc/ssl/etcd: that folder does not exist, so something in remote.tt appears not to be working. A similar thing is wrong with /etc/kubernetes/kubeconfig, because the worker logs indicate that file is missing. I do not think it is related to the matchbox/dnsmasq setup or ssh-add -L, but I cannot be sure because I used the Tectonic UI installer, not terraform directly.
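A quick way to confirm this symptom on a node is to check whether the provisioned paths exist at all. This is a minimal Python sketch; the path list contains only the two paths mentioned in this thread, and `find_missing` is a hypothetical helper name, not part of any installer.

```python
import os

# Paths reported missing in this thread (assumption: your installer is
# expected to provision both; adjust the list for your own setup).
EXPECTED_PATHS = ["/etc/ssl/etcd", "/etc/kubernetes/kubeconfig"]


def find_missing(paths):
    """Return the subset of paths that do not exist on this host."""
    return [p for p in paths if not os.path.exists(p)]
```

Calling `find_missing(EXPECTED_PATHS)` on a controller or worker returns an empty list when both paths are present. A non-empty result suggests the asset-copy step did not run or failed, which matches the rkt error in the log.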
I had terraform 0.10.8 installed via brew; after I removed that, things worked! :)
What keywords did you search in tectonic-installer issues before filing this one?
etcd
If you have found any duplicates, you should instead reply there and close this page.
If you have not found any duplicates, delete this section and continue on.
Is this a BUG REPORT or FEATURE REQUEST?
Choose one: BUG REPORT or FEATURE REQUEST
Versions
1.7.3
Terraform version (terraform version):
What happened?
The etcd-member service stops on the controller and the etcd container exits.
What you expected to happen?
The etcd-member service stays running; instead, etcd in the rkt container is exiting.
How to reproduce it (as minimally and precisely as possible)?
I tried the method described at this GitHub link, but no luck: https://github.com/coreos/etcd/blob/master/Documentation/platforms/container-linux-systemd.md#etcd-3x-service
Anything else we need to know?
I reported the same etcd-member.service startup issue on GitHub in "etcd-member.service starting issue v 3.1.0" (#8596), and gyuho asked me to open a new bug.