Option to unload the component from memory when scaling to zero

petersalomonsen commented 3 months ago

Is your feature request related to a problem? Please describe.

I have a 28 MB WebAssembly component file, and I notice when starting the Spin app that it occupies 100 MB before any request is made. A smaller Wasm file takes up less memory when idle.

Describe the solution you'd like

Would it be possible to have an option not to preload the Wasm files into memory, and also to unload the Wasm binary from memory after the request is handled?

When deploying one Spin app per customer environment, I would like to avoid each app occupying this much memory when idle. The memory footprint is the most significant factor in how many nodes are needed, so being able to reduce memory consumption to near zero when idle is essential for really scaling to zero.

tschneidereit commented 3 months ago

Can you share more details on how you're measuring the memory usage? AFAICT you shouldn't see that kind of increase in real memory usage: the file should be lazily mapped, with only pages that are actually read from being loaded individually. If that's not the case, it certainly is a bug we should fix.
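To make the lazy-mapping behavior concrete, here is a minimal Python sketch. It is not Spin or Wasmtime code: it assumes a throwaway temporary file standing in for a compiled component, and `make_test_file` and `read_first_byte_of_page` are hypothetical helper names. Mapping the file only reserves address space; the OS faults pages in as they are touched (it may also read ahead a few neighbouring pages).

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE


def make_test_file(num_pages: int) -> str:
    """Create a temporary file of num_pages pages (a stand-in for a large Wasm binary)."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    with os.fdopen(fd, "wb") as f:
        for i in range(num_pages):
            # Fill each page with its own index so reads can be checked.
            f.write(bytes([i % 256]) * PAGE)
    return path


def read_first_byte_of_page(path: str, page_index: int) -> int:
    """Map the file read-only and touch a single page."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; no file data is copied at this point.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # Only the page containing this offset has to be faulted in.
            return m[page_index * PAGE]


path = make_test_file(num_pages=64)
try:
    value = read_first_byte_of_page(path, 10)
    print(value)
finally:
    os.remove(path)
```

On Linux, comparing the process's resident set size before and after touching pages is one way to check whether a mapping is actually lazy.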

lann commented 3 months ago

After some further investigation it appears that Spin's current usage of Wasmtime does not lead to a lazily mapped file, so the memory usage you are seeing is expected.

It would be possible to use Wasmtime in a slightly different way that would lead to the behavior @tschneidereit describes, but it would have performance implications so may require further investigation/consideration.

petersalomonsen commented 3 months ago

Here's the big Wasm file, before any request is made:

In Kubernetes, deployed with SpinKube, I use kubectl top pods and get around 150Mi in memory usage for this pod.

A smaller module has a much lower idle memory usage:

I would like only the HTTP listener to occupy memory, with the Wasm modules loaded into memory only while a request is being processed.

petersalomonsen commented 3 months ago

After some further investigation it appears that Spin's current usage of Wasmtime does not lead to a lazily mapped file, so the memory usage you are seeing is expected.

It would be possible to use Wasmtime in a slightly different way that would lead to the behavior @tschneidereit describes, but it would have performance implications so may require further investigation/consideration.

It would be good to have this as an option. Even though it will mean some extra loading time for the Wasm, the alternative is to scale down the pod, for example using the KEDA http addon with HTTPScaledObject, and then the cold start time is even longer.

So the current solution with the Wasm binary preloaded is good, and probably the one to recommend for smaller Wasm binaries, but for larger binaries it would be good to have the option to not have them preloaded in memory.

lann commented 3 months ago

pod

FYI, I believe https://www.spinkube.dev/ does have the file mapping behavior described above, though I haven't personally verified that behavior.

alexcrichton commented 3 months ago

it would have performance implications so may require further investigation/consideration.

I think it'd be reasonable for Spin to, by default, compile a component, write it to disk, and then remap it from disk. That shouldn't have much of a perf impact on startup since writing/mapping should be pretty speedy.

Having an opt-in boolean for "madvise(DONTNEED) the whole image away after each request" might then be a way to solve this use case without impacting perf by default.
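As a rough illustration of the "madvise the image away" idea, the sketch below uses Python's `mmap.madvise` (Python 3.8+) with `MADV_DONTNEED` on a read-only file mapping. This is an assumption-laden stand-in, not how Spin or Wasmtime would implement it: the temporary file plays the role of the compiled image, and `release_mapping_pages` is a hypothetical helper. For a file-backed read-only mapping the contents stay readable afterwards, because dropped pages are simply faulted back in from the file.

```python
import mmap
import os
import tempfile


def release_mapping_pages(m: mmap.mmap) -> bool:
    """Hint that the mapping's resident pages can be dropped.

    Returns False where madvise or MADV_DONTNEED is unavailable (e.g. non-Linux).
    """
    advice = getattr(mmap, "MADV_DONTNEED", None)
    if advice is None or not hasattr(m, "madvise"):
        return False
    try:
        m.madvise(advice)  # no start/length given: applies to the whole mapping
    except OSError:
        return False
    return True


# Stand-in for a compiled image on disk (hypothetical data).
fd, path = tempfile.mkstemp(suffix=".bin")
with os.fdopen(fd, "wb") as f:
    f.write(b"\x2a" * (16 * mmap.PAGESIZE))

with open(path, "rb") as f:
    image = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

offset = 5 * mmap.PAGESIZE
before = image[offset]                    # "request": touches a page
released = release_mapping_pages(image)   # after the request: drop resident pages
after = image[offset]                     # next request faults the page back in

image.close()
os.remove(path)
```

The trade-off is the one raised in the thread: the first access after releasing pays page-fault cost again, which is why an opt-in flag rather than a default seems plausible.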

lann commented 3 months ago

I suppose there could also be an option to use MLOCK_ONFAULT, for the ~opposite effect.

mikkelhegn commented 3 months ago

@petersalomonsen - The Fermyon Platform for Kubernetes (a paid product) has been designed to optimize for hosting multiple apps behind a single host, and for applying the behavior you're looking for. Feel free to reach out here if you'd be interested in a follow-up conversation about that.

petersalomonsen commented 3 months ago

@mikkelhegn What is the difference between the Fermyon Platform for Kubernetes and SpinKube?

My current use case is that I'm writing a book about WebAssembly, where one of the chapters is about WebAssembly on Kubernetes. One of the points I'm trying to make is the possibility of scaling to zero. While it is possible to scale to zero pods using KEDA with the http-addon, which is also quite fast when it comes to startup time, it is beneficial to keep the HTTP listener running to get an instantly responsive user experience. I see that the runtime provided with SpinKube does this already, and creates a Wasm instance on the fly, but each SpinApp will still have a memory footprint related to the size of the Wasm module. Is it so that Fermyon Platform for Kubernetes does not keep the Wasm module in memory? And this is a feature that is intentionally not in the open source version?

mikkelhegn commented 3 months ago

I'm writing a book about WebAssembly

That sounds awesome!

What is the difference between the Fermyon Platform for Kubernetes and SpinKube?

There are more differences between the two, and the blog I added above is a good source for some of that information, but for this particular case we don't run the individual Spin application (one or more components) in a pod. We have to "break out" of the K8s model in order to achieve this "scaling" behavior. But it gives us the opportunity to host 5,000 apps on a single node, because the overhead of a deployed app is basically disk space, and nothing else.

There is a single host process per node, which is responsible for running the listener. You can think of this as taking the spin.toml of 5,000 apps, munching them all together into one big spin.toml file (adding host header info for mapping trigger routes), and then running a single spin up - this is NOT how it actually works, but I hope you get the idea. It's the same model we use in Fermyon Cloud, and we can do this safely because of the isolation model Wasm gives us. The really cool thing about this is that all your apps are both scaled up to be able to use the full node and scaled back to do nothing, meaning that you wouldn't need to apply scaling rules to the individual app (like in the SpinKube model). You simply end up putting the components on disk on all the nodes you want to have them on, and they are always ready to serve events.

mikkelhegn commented 3 months ago

Sorry I skipped the last questions in there

Is it so that Fermyon Platform for Kubernetes does not keep the Wasm module in memory?

There are optimizations in the executor we're using in the platform, which can be configured to keep a certain number of modules in memory. There are also optimizations which ensure the modules are compiled AOT as they get deployed.
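The idea of keeping only a bounded number of modules resident can be sketched with a small least-recently-used cache. This is purely illustrative and makes no claim about how the Fermyon executor works: `BoundedModuleCache`, its `load` callback, and the LRU eviction policy are all assumptions chosen for the example.

```python
from collections import OrderedDict
from typing import Callable, List


class BoundedModuleCache:
    """Keep at most `capacity` loaded modules; evict the least recently used."""

    def __init__(self, capacity: int, load: Callable[[str], bytes]) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        self._load = load  # hypothetical loader, e.g. reads a precompiled module from disk
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, app: str) -> bytes:
        if app in self._entries:
            # Cache hit: mark as most recently used.
            self._entries.move_to_end(app)
            return self._entries[app]
        # Cache miss: load, then evict the oldest entry if over capacity.
        module = self._load(app)
        self._entries[app] = module
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)
        return module

    def resident(self) -> List[str]:
        """Names of the modules currently held in memory, oldest first."""
        return list(self._entries)


loads: List[str] = []


def fake_load(app: str) -> bytes:
    # Stand-in for reading a module from disk; records each load.
    loads.append(app)
    return app.encode()


cache = BoundedModuleCache(capacity=2, load=fake_load)
for name in ["a", "b", "a", "c", "b"]:
    cache.get(name)
```

With a capacity of 2, the request sequence above reloads `b` after it has been evicted, which is the latency cost traded for a fixed memory ceiling.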

And this is a feature that is intentionally not in the open source version?

Yes, we believe the value of a high-density platform like this is a licenseable product.

Full disclosure - I'm the Head of Product at Fermyon

petersalomonsen commented 3 months ago

Thanks @mikkelhegn. It all makes sense now. I understand why this is a licensable product. I will write, then, that WebAssembly makes it possible to achieve this kind of density, but that solutions which fully leverage this potential are licensed, or you can achieve something similar through bundling all environments in one package (like you suggest with the spin.toml of 5000 apps).

Thanks for sharing this, and also for the awesome product that Spin is. I believe for sure that this is a key ingredient in the next generation of cloud native technology.

mikkelhegn commented 3 months ago

achieve something similar through bundling all environments in one package (like you suggest with the spin.toml of 5000 apps)

Just want to make sure it's clear that this was only a way to describe what is happening conceptually. I wouldn't recommend that anyone try to do that. I also think there is an upper limit on the number of components you can add to a Spin application (64 maybe?).

fermyon / feedback

Option to unload the component from memory when scaling to zero #50