gitpod-io / gitpod

The developer platform for on-demand cloud development environments to create software faster and more securely.
https://www.gitpod.io
GNU Affero General Public License v3.0

Periodic Caching of Selected Images on the Workspace Nodes #15400

Closed ChevronTango closed 1 year ago

ChevronTango commented 1 year ago

Is your feature request related to a problem? Please describe

Currently, a new node is created with little to no image caching. Each workspace started on the node adds its workspace image to the node's cache, so subsequent workspaces start faster. The first workspaces on a fresh node, however, can be delayed, particularly if they use a large base image.

Describe the behaviour you'd like

As part of either the ws-daemon, or the registry-facade, we should include a service which takes a select few images and forces them to be cached onto the node during startup, and periodically refreshed if missing/garbage-collected or if the image uses the latest tag.
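A minimal sketch of the refresh decision described above, in Python for illustration only. It is not Gitpod's implementation: `image_tag` and `images_to_pull` are hypothetical names, and the actual listing and pulling of images (for example through the containerd client inside ws-daemon) is left out on purpose.

```python
from typing import Iterable, List, Optional


def image_tag(ref: str) -> Optional[str]:
    """Return the tag of an image reference.

    Returns "latest" when no tag is given, and None when the reference
    is pinned by digest (its content cannot change, so it never needs
    a refresh once cached).
    """
    if "@" in ref:
        return None
    # Only look at the last path segment, so a registry port such as
    # "registry.local:5000/..." is not mistaken for a tag.
    last_segment = ref.rsplit("/", 1)[-1]
    if ":" in last_segment:
        return last_segment.rsplit(":", 1)[1]
    return "latest"


def images_to_pull(wanted: Iterable[str], cached: Iterable[str]) -> List[str]:
    """Select the images that should be (re)pulled on this node.

    An image is selected if it is missing from the node's cache (never
    pulled, or garbage-collected) or if it uses the mutable "latest" tag.
    """
    cached_set = set(cached)
    return [
        ref
        for ref in wanted
        if ref not in cached_set or image_tag(ref) == "latest"
    ]
```

The service would run this selection once at node startup and then again on a timer, pulling whatever it returns. Treating an untagged reference as `latest` matches how container runtimes resolve it.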

Describe alternatives you've considered

There are services like kube-fledged that do some of this, but they run on a schedule, rather than at node startup. This means that the first workspaces on a node are likely to still be delayed starting up if the cron job from kube-fledged hasn't yet run for that node. Since we already have DaemonSets with access to the underlying containerd runtime it should be possible to augment them with this additional caching functionality to dramatically improve the startup time for the first workspaces on a node.

Additional context

Furisto commented 1 year ago

Hey @ChevronTango, we include the most commonly used workspace images in the VM image for the node so that they are ready to go and do not need to be downloaded. We also store images that need to be pulled in IPFS so that nodes that do not have an image yet can pull it faster. If you would like to know more about that, you can watch the presentation that our CTO Chris gave at KubeCon.

ChevronTango commented 1 year ago

Thanks @Furisto. The talk was very interesting to watch, and it definitely seems like your team is 4 or 5 steps ahead of where I was thinking. That said, the talk glosses over a few key details that it would be good to know more about. What have you deployed as your IPFS service? It's not included as part of the installer (though the option to specify its address is), so it must be a third-party Helm chart or similar. Would you be able to share more details so that those of us struggling with startup performance can learn from your golden setup?

aledbf commented 1 year ago

@ChevronTango please check the example below. We use a similar setup, customized to our environment: https://github.com/containerd/nerdctl/tree/main/examples/nerdctl-ipfs-registry-kubernetes

Before using IPFS, I suggest following @Furisto's advice on customizing the VM image by seeding the base image there. That has the biggest impact on startup times on empty/new nodes.

ChevronTango commented 1 year ago

Thanks @aledbf! That's a great resource. Is it the ipfs-cluster example specifically? Would you be able to give a quick high-level summary of what the customizations might be, to save us from making any obvious mistakes?

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.