coreos / tectonic-installer

Install a Kubernetes cluster the CoreOS Tectonic Way: HA, self-hosted, RBAC, etcd Operator, and more
Apache License 2.0

Downloads of assets during bootkube don't restart properly #752

Open xgerman opened 7 years ago

xgerman commented 7 years ago

If the Internet is choppy during a download the system will just hang and not resume properly:

core@controller1 ~ $ sudo systemctl stop bootkube
core@controller1 ~ $ journalctl -u bootkube -f
-- Logs begin at Thu 2017-05-18 14:10:26 UTC. --
May 18 14:10:52 controller1.dev-env.local bash[1413]: Downloading ACI: 68.2 KB/18.1 MB
May 18 14:10:53 controller1.dev-env.local bash[1413]: Downloading ACI: 138 KB/18.1 MB
May 18 14:10:54 controller1.dev-env.local bash[1413]: Downloading ACI: 207 KB/18.1 MB
May 18 14:11:12 controller1.dev-env.local bash[1413]: Downloading ACI: 242 KB/18.1 MB
May 18 14:11:13 controller1.dev-env.local bash[1413]: Downloading ACI: 277 KB/18.1 MB
May 18 14:11:15 controller1.dev-env.local bash[1413]: Downloading ACI: 364 KB/18.1 MB
May 18 14:50:13 controller1.dev-env.local systemd[1]: bootkube.service: Main process exited, code=killed, status=15/TERM
May 18 14:50:13 controller1.dev-env.local systemd[1]: Stopped Bootstrap a Kubernetes cluster.
May 18 14:50:13 controller1.dev-env.local systemd[1]: bootkube.service: Unit entered failed state.
May 18 14:50:13 controller1.dev-env.local systemd[1]: bootkube.service: Failed with result 'signal'.
^C

Expected behavior: the download should not hang, and it should retry or resume after a network interruption.

s-urbaniak commented 7 years ago

This download happens using rkt. As far as I know, resumable downloads are not supported in the docker2aci library. /cc'ing @lucab to verify, and also to brainstorm whether this is something we should tackle in rkt or in the calling systemd service unit.

lucab commented 7 years ago

I'm lacking some details here, so just some quick observations:

alexsomesan commented 7 years ago

Earlier pre-pull sounds easy to achieve.
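
For illustration, a pre-pull step could wrap the fetch in a retry loop with a per-attempt timeout, so that a stalled download turns into a failed attempt instead of an indefinite hang. This is only a rough sketch: the helper name `retry_with_timeout`, the attempt count, the timeout, and the image reference in the usage comment are all assumptions and do not come from the installer. Each retry starts the download from scratch; it does not resume a partial one.

```shell
#!/usr/bin/env bash
# retry_with_timeout ATTEMPTS TIMEOUT_SECS DELAY_SECS CMD [ARGS...]
# Hypothetical helper: runs CMD under coreutils `timeout` and retries
# until it succeeds or ATTEMPTS is exhausted.
# CMD must be an external program, because `timeout` cannot run shell functions.
retry_with_timeout() {
  local attempts="$1" limit="$2" delay="$3"
  shift 3
  local n=1
  while true; do
    # `timeout` terminates the command once it has run for $limit seconds,
    # so a stalled download counts as a failure rather than hanging forever.
    if timeout "$limit" "$@"; then
      return 0
    fi
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts: $*" >&2
      return 1
    fi
    echo "attempt $n failed; retrying in ${delay}s" >&2
    sleep "$delay"
    n=$((n + 1))
  done
}

# Hypothetical usage (the image reference is a placeholder, not from this thread):
#   retry_with_timeout 5 300 10 rkt fetch --insecure-options=image docker://example.com/some/image:tag
```

Running something like this well before bootkube starts would mean the image is already in the local rkt store by the time the service needs it.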

lucab commented 7 years ago

Late follow-up on this: a better behavior here would be to have a bootkube unit that restarts on failure. However, that unit is a oneshot service, which doesn't support restarts. Changing it to a Type=simple service would work, but there are further issues with the bootkube process itself not being restartable.
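
As a rough sketch of the unit-level change described above, a systemd drop-in could override the service type and add a restart policy. The helper name `write_restart_dropin`, the file name `10-restart.conf`, and the `RestartSec` value are assumptions made for illustration. It also assumes the unit has a single `ExecStart=` line, since `Type=simple` does not allow more than one. On its own this does not address the concern that the bootkube process is not restartable.

```shell
#!/usr/bin/env bash
# write_restart_dropin DIR
# Hypothetical helper: writes a drop-in into DIR that switches the unit
# to Type=simple and restarts it when the main process fails.
write_restart_dropin() {
  local dir="$1"
  mkdir -p "$dir"
  cat > "$dir/10-restart.conf" <<'EOF'
[Service]
Type=simple
Restart=on-failure
RestartSec=10
EOF
}

# On a real host this would need root, for example:
#   write_restart_dropin /etc/systemd/system/bootkube.service.d
#   systemctl daemon-reload
```

One trade-off to keep in mind: with `Type=simple`, systemd considers the unit started as soon as the process is launched, so units ordered after it no longer wait for bootstrapping to finish as they would with a oneshot service.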