firecracker-microvm / firecracker

Secure and fast microVMs for serverless computing.
http://firecracker-microvm.io
Apache License 2.0

[Snaps] Reduce resume time from snapshot for microVMs preventively warming up the cache #2944

Closed xmarcalx closed 10 months ago

xmarcalx commented 2 years ago

Feature Request

This feature will allow us to reduce the resume time of a microVM when it is initialized from a snapshot. This issue will collect all the activities required to implement, develop, test and document the feature.

Describe the desired solution

With this feature we would like to implement a mechanism that warms up the cache of a microVM when it is resumed from a snapshot, in order to avoid the initial page-fault storm that happens as soon as the workload starts to execute. Such a mechanism will probably be delegated to an external process, which would implement the strategy used to warm up the cache.
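
For illustration only, here is a very small sketch of what such an external warm-up process could do, assuming the `libc` crate and a placeholder path to the snapshot's guest memory file: map the file read-only and touch one byte per page, so the data already sits in the host page cache when the resumed microVM starts faulting it in. This is not Firecracker code, just one possible strategy.

```rust
// Hypothetical warm-up helper (not part of Firecracker): fault every page of
// the snapshot's guest memory file into the host page cache before resume.
use std::fs::File;
use std::os::unix::io::AsRawFd;

fn prefault_memory_file(path: &str) -> std::io::Result<()> {
    let file = File::open(path)?;
    let len = file.metadata()?.len() as usize;
    if len == 0 {
        return Ok(());
    }
    let page_size = unsafe { libc::sysconf(libc::_SC_PAGESIZE) } as usize;

    unsafe {
        // Read-only shared mapping: faulting it in populates the page cache,
        // which Firecracker's own mapping of the same file will then hit.
        let addr = libc::mmap(
            std::ptr::null_mut(),
            len,
            libc::PROT_READ,
            libc::MAP_SHARED,
            file.as_raw_fd(),
            0,
        );
        if addr == libc::MAP_FAILED {
            return Err(std::io::Error::last_os_error());
        }

        // Touch one byte per page; volatile so the loop is not optimized away.
        let base = addr as *const u8;
        let mut offset = 0usize;
        while offset < len {
            std::ptr::read_volatile(base.add(offset));
            offset += page_size;
        }

        libc::munmap(addr, len);
    }
    Ok(())
}
```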

Describe possible alternatives

TBD

Additional context

We still need to investigate possible solutions, new ideas as usual are always welcome.


sandreim commented 2 years ago

Hi @xmarcalx,

IIRC there are two ways to do this from a separate (or not) process:

1. explicitly fault the relevant guest memory pages in ahead of time, e.g. through a userfaultfd (UFFD) handler that copies pages into guest memory before the workload touches them;
2. prefetch the snapshot memory file into the host page cache, so that the page faults which do occur are served from memory rather than from disk.

Both of them burn CPU/memory in bulk ahead of time. With all other things being equal (especially the work performed by the guest in a given timeframe), the CPU cost can be lower than the aggregate cost of all the page faults, compared to running the workload from a fresh boot.

To keep both the CPU and memory costs low, you would need to know which pages to fault in, so that a reasonable percentage of hot pages is present and the workload's perceived initial latency stays within SLA. This can be solved by implementing a tool that collects metadata about memory access patterns and bundles it with the snapshots. This is documented and implemented in https://github.com/ease-lab/vhive, but DAMON (https://www.phoronix.com/scan.php?page=news_item&px=DAMON-Reclaim-v3) is also worth a look, as it might only need some tweaking to cover this use case.
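
As a hypothetical illustration of the "bundle metadata with the snapshot" idea, the access-pattern data could be as simple as a sidecar file of hot ranges next to the snapshot, which a warm-up process or UFFD handler reads back later. The format, file layout and names here are made up for the sketch; how the ranges are actually collected (tracing, DAMON, etc.) is out of scope.

```rust
// Hypothetical sidecar format: one "offset length" pair per line, describing
// hot byte ranges of the guest memory file.
use std::fs;
use std::io::{self, Write};

#[derive(Debug, Clone, Copy)]
struct HotRange {
    offset: u64, // byte offset into the guest memory file
    len: u64,    // length of the hot range in bytes
}

fn write_hot_ranges(path: &str, ranges: &[HotRange]) -> io::Result<()> {
    let mut out = fs::File::create(path)?;
    for r in ranges {
        writeln!(out, "{} {}", r.offset, r.len)?;
    }
    Ok(())
}

fn read_hot_ranges(path: &str) -> io::Result<Vec<HotRange>> {
    let mut ranges = Vec::new();
    for line in fs::read_to_string(path)?.lines() {
        let mut fields = line.split_whitespace();
        if let (Some(off), Some(len)) = (fields.next(), fields.next()) {
            let offset = off.parse().map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
            let len = len.parse().map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
            ranges.push(HotRange { offset, len });
        }
    }
    Ok(ranges)
}
```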

Now, if you ask me, I would bet on the first option, as it provides accuracy and more control. The second is faster to implement, since it needs no changes in Firecracker, but I am concerned by the lack of control over how the kernel evicts pages from the page cache, especially under memory pressure.
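
As a rough sketch of the second option (warming the host page cache with no Firecracker changes), one could simply advise the kernel to read the snapshot memory file ahead of time, assuming the `libc` crate and a placeholder path. Note that `POSIX_FADV_WILLNEED` is only a hint, which is exactly the lack-of-control concern above: the kernel may drop those pages again under memory pressure.

```rust
// Hypothetical prefetch helper: ask the kernel to pull the snapshot memory
// file into the page cache before the microVM is resumed.
use std::fs::File;
use std::os::unix::io::AsRawFd;

fn prefetch_into_page_cache(path: &str) -> std::io::Result<()> {
    let file = File::open(path)?;
    let len = file.metadata()?.len() as libc::off_t;

    // Advise the kernel that the whole file will be needed soon so it starts
    // readahead; posix_fadvise returns a positive errno value on failure.
    let ret = unsafe {
        libc::posix_fadvise(file.as_raw_fd(), 0, len, libc::POSIX_FADV_WILLNEED)
    };
    if ret != 0 {
        return Err(std::io::Error::from_raw_os_error(ret));
    }
    Ok(())
}
```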

xmarcalx commented 2 years ago

Hi @sandreim,

Thanks a lot for the message and the useful inputs :)

Yes, UFFD is definitely useful in this area, which is why we are progressing on its implementation; we are already working on it in #2938. We will probably make use of UFFD, and this issue will be a follow-up of that one, but as you mentioned we need to do more investigation to understand the best way to move forward.

xmarcalx commented 10 months ago

Closing this issue, as support for a UFFD handler covers the use case. Whenever there is a page fault, the UFFD handler is called, and since it is a custom implementation, a Firecracker user can decide which and how many pages to fault in, fitting their workload and performance requirements.
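
For context, the "which and how many pages" policy of such a custom handler could look roughly like the sketch below. It deliberately shows only the decision logic; the actual userfaultfd registration and UFFDIO_COPY plumbing (as in the example handlers in the Firecracker repository) is omitted, and the chunk size, names and hot-range format are assumptions carried over from the earlier sketches.

```rust
// Sketch of a fault-serving policy for a custom UFFD handler. Assumes the
// fault offset is below mem_size and that hot ranges are page-aligned.
const PAGE_SIZE: u64 = 4096;
const DEFAULT_CHUNK: u64 = 16 * PAGE_SIZE; // fault in 64 KiB around cold accesses

#[derive(Debug, Clone, Copy)]
struct HotRange {
    offset: u64, // page-aligned offset into guest memory where the hot range begins
    len: u64,    // length of the hot range in bytes
}

/// Decide which region of guest memory to copy in for a fault at `fault_offset`.
/// Returns a page-aligned (offset, length) pair clamped to the memory size.
fn region_to_fault_in(fault_offset: u64, mem_size: u64, hot: &[HotRange]) -> (u64, u64) {
    let page = fault_offset & !(PAGE_SIZE - 1);

    // If the fault lands in a known hot range, bring in the whole range so the
    // workload does not immediately fault again on its neighbours.
    for r in hot {
        if page >= r.offset && page < r.offset + r.len {
            return (r.offset, r.len.min(mem_size - r.offset));
        }
    }

    // Otherwise serve a fixed-size chunk starting at the faulting page.
    (page, DEFAULT_CHUNK.min(mem_size - page))
}
```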