flatcar / Flatcar

Flatcar project repository for issue tracking, project documentation, etc.
https://www.flatcar.org/
Apache License 2.0

Reduce provisioning time on Equinix Metal (fka Packet) #125

Open johananl opened 4 years ago

johananl commented 4 years ago

Current situation

When deploying a Flatcar machine on Packet, the provisioning process looks roughly as follows:

Impact

Users have to wait a relatively long time (varies depending on hardware, but usually 5-10 minutes) until they can SSH into the machine. This has an especially big impact on things like auto scaling or self-healing of Kubernetes clusters as these processes rely heavily on fast provisioning of compute nodes.

Ideal future situation

Packet supports 60-second provisioning of selected OSes (e.g. CentOS and Ubuntu). I'd love to see Flatcar among them.

Besides making Flatcar nicer to use on Packet, this would give it more visibility: 60-second OSes are clearly marked as "fast" in the Packet console with a visible lightning icon, which could also encourage users to choose them.

Implementation options

IMO we can divide the "slowness" into the following factors:

I am not familiar enough with the details, but I'm getting the impression this process could be simplified. If we could get rid of one reboot and somehow avoid having to download images over the internet during provisioning, things would look much better. IMO it could even make sense to install an outdated Flatcar release to disk, let the machine boot quickly, and then let the Flatcar update process update the machine later.

pothos commented 4 years ago

The first step is actually: Configuring PXE and rebooting. Many of these steps could be skipped if, instead of configuring PXE, there were a minimal installation container that installs Flatcar to a disk (cf. @invidian's Tinkerbell work). I think this is what the 60-second provisioning would require, but without adding the Ignition config explicitly; instead, it would somehow have to be detected from the user data of the instance.

johananl commented 4 years ago

Great. Yeah, I'm not too familiar with the low-level details. I wanted mainly to demonstrate my thoughts and leave the implementation to you folks :-)

pothos commented 3 years ago

@invidian Do you think your work with the Ubuntu or Alpine container image can be upstreamed?

pothos commented 3 years ago

One more note: Installing lbzip2 in the container will also greatly speed up decompression. It is also missing from the PXE installation method. When I last measured, this saved ~30 seconds.
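As a rough illustration of the point above, an install script could prefer lbzip2 whenever it is present and fall back to plain bzip2 otherwise. This is only a sketch: the helper name `pick_bzip2` and the image file name in the usage comment are hypothetical, and this is not necessarily how the existing installation scripts select their decompressor.

```shell
# Hypothetical helper: print the name of a bzip2-compatible decompressor,
# preferring the parallel lbzip2 over plain bzip2 when it is on PATH.
pick_bzip2() {
    if command -v lbzip2 >/dev/null 2>&1; then
        echo lbzip2
    else
        echo bzip2
    fi
}

# Example use (the image file name is a placeholder):
#   "$(pick_bzip2)" -dc flatcar_image.bin.bz2 > /dev/null
```

Wrapping either variant in `time` on the same image is a simple way to check how much is actually saved on a given machine type.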

invidian commented 3 years ago

> @invidian Do you think your work with the Ubuntu or alpine container image can be upstreamed?

Upstreamed where, do you mean?

The only difference between my process and what @johananl described is that, instead of booting the Flatcar initramfs, Tinkerbell boots the OSIE image for me, which then runs this Ubuntu container to install Flatcar. I don't think it's any faster.

pothos commented 3 years ago

Upstream to the Equinix Metal Tinkerbell repository. This will be faster because booting from iPXE and the additional reboot are slow. As far as I know, installing from the container to disk also allows the machine to be pre-provisioned based on popularity count, because the write to disk can happen in advance. That further reduces the installation time in lucky cases (only setting the user data and rebooting is needed).

(Edit: My assumption is that the machine can directly start the installation container without rebooting, based on what I observed in the OOB console.)

pothos commented 3 years ago

With an Ubuntu container, the lbzip2 package is available for fast decompression, but an additional ln -s /usr/bin/lbzip2 /usr/local/bin/bzip2 is needed after installing the lbzip2 package with apt.
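A minimal sketch of that step, written as a small helper so it can be re-run safely. The function name `prefer_lbzip2` and its optional arguments are hypothetical; the default paths are the ones from the command above, and the sketch assumes /usr/local/bin comes before /usr/bin in PATH (the usual default on Ubuntu).

```shell
# Hypothetical helper: make `bzip2` resolve to lbzip2 by symlinking it into a
# directory that precedes /usr/bin in PATH.
#   $1  path to the lbzip2 binary  (default: /usr/bin/lbzip2)
#   $2  directory for the symlink  (default: /usr/local/bin)
prefer_lbzip2() {
    src="${1:-/usr/bin/lbzip2}"
    dest_dir="${2:-/usr/local/bin}"
    if [ ! -x "$src" ]; then
        echo "lbzip2 not found at $src; install it first" >&2
        return 1
    fi
    mkdir -p "$dest_dir"
    # -f replaces an existing link so the helper can be run more than once.
    ln -sf "$src" "$dest_dir/bzip2"
}

# Typical use inside the Ubuntu container, after installing the package:
#   apt-get update && apt-get install -y lbzip2
#   prefer_lbzip2
```

Failing early when lbzip2 is missing avoids leaving a dangling bzip2 symlink that would break decompression entirely.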

jepio commented 2 years ago

The lbzip2 issue will soon be addressed by https://github.com/kinvolk/coreos-overlay/pull/1221.