Hey @ChevronTango, we include the most commonly used workspace images in the VM image for the node so that they are ready to go and do not need to be downloaded. We also store images that need to be pulled in IPFS so that nodes that do not have a given image yet can pull it faster. If you would like to know more about that, you can watch the presentation that our CTO Chris gave at KubeCon.
Thanks @Furisto. The talk was very interesting to watch and it definitely seems like your team is 4 or 5 steps ahead of where I was thinking. That said, the talk glosses over a few of the key details that it would be good to know more about. What have you deployed as your IPFS service? It's not included as part of the installer (though the option to specify its address is), so it must be a third-party Helm chart or similar. Would you be able to share more details so that those of us struggling with startup performance can learn from your golden setup?
@ChevronTango please check https://github.com/containerd/nerdctl/tree/main/examples/nerdctl-ipfs-registry-kubernetes We use a similar setup but customized to our environment.
Before using IPFS, I suggest following @Furisto's advice on customizing the VM image and seeding the base image there. That has the most impact on startup times on empty/new nodes.
Thanks @aledbf! That's a great resource. Is it the ipfs-cluster example specifically? Would you be able to give a quick high level summary of what the customizations might be, to save us making any obvious mistakes?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Is your feature request related to a problem? Please describe
Currently a new node is created with little to no caching. Each workspace that is created causes the cache of the node to be updated with the workspace image, and therefore subsequent workspaces are quicker to instantiate, but there can be a delay for those first workspaces to start up on a fresh node, particularly if they are using a large base image.
Describe the behaviour you'd like
As part of either ws-daemon or registry-facade, we should include a service that takes a select few images and forces them to be cached on the node during startup, then periodically re-pulls them if they are missing or have been garbage-collected, or if the image uses the latest tag.
Describe alternatives you've considered
There are services like kube-fledged that do some of this, but they run on a schedule, rather than at node startup. This means that the first workspaces on a node are likely to still be delayed starting up if the cron job from kube-fledged hasn't yet run for that node. Since we already have DaemonSets with access to the underlying containerd runtime it should be possible to augment them with this additional caching functionality to dramatically improve the startup time for the first workspaces on a node.
Additional context