firecracker-microvm / firecracker

Secure and fast microVMs for serverless computing.
http://firecracker-microvm.io
Apache License 2.0

[Feature Request] Hot-plug vCPUs #2609

Open jeromegn opened 3 years ago

jeromegn commented 3 years ago

Feature Request

We'd like to be able to add (and remove) vCPUs on running Firecracker microVMs. This appears to be possible with KVM: the examples I've seen define the maximum number of vCPUs a VM may use and the number it actually uses at boot. Additional vCPUs can then be added at runtime via virsh.

https://www.unixarena.com/2015/12/linux-kvm-how-to-add-remove-vcpu-to-guest-on-fly.html/

This would allow us to add a "burst" feature for when CPU usage spikes.

Describe the desired solution

An API to modify a running microvm's vCPUs count.
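A resize API along these lines would likely need the same two numbers as the KVM/virsh examples: a maximum fixed when the microVM is configured, and a live count that can move within it. The sketch below is purely illustrative. `VcpuConfig`, `resize`, and the returned add/remove lists are hypothetical names, not part of Firecracker's API, and the sketch models only the bookkeeping, not the KVM calls or the guest notification.

```python
from dataclasses import dataclass


@dataclass
class VcpuConfig:
    """Hypothetical vCPU settings: a ceiling fixed at boot plus a live count."""
    max_vcpus: int      # upper bound chosen when the microVM is configured
    online_vcpus: int   # vCPUs currently presented to the guest

    def resize(self, requested: int) -> tuple[list[int], list[int]]:
        """Validate a resize request; return (vCPU ids to add, vCPU ids to remove)."""
        if not 1 <= requested <= self.max_vcpus:
            # Reject before touching state, so a bad request changes nothing.
            raise ValueError(
                f"requested {requested} vCPUs, allowed range is 1..{self.max_vcpus}"
            )
        current = self.online_vcpus
        to_add = list(range(current, requested))     # empty when shrinking
        to_remove = list(range(requested, current))  # empty when growing
        self.online_vcpus = requested
        return to_add, to_remove
```

For example, `VcpuConfig(max_vcpus=4, online_vcpus=1).resize(3)` returns `([1, 2], [])`. Keeping the maximum fixed mirrors the KVM examples, where the ceiling is set before boot and only the live count changes.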

This should notify the guest VM of the change:

Dec 16 12:48:28 UA-KVM1 kernel: CPU1 has been hot-added
Dec 16 12:48:28 UA-KVM1 kernel: SMP alternatives: switching to SMP code
Dec 16 12:48:57 UA-KVM1 kernel: smpboot: Booting Node 0 Processor 1 APIC 0x1
Dec 16 12:48:57 UA-KVM1 kernel: kvm-clock: cpu 1, msr 0:3ff87041, secondary cpu clock
Dec 16 12:48:57 UA-KVM1 kernel: TSC synchronization [CPU#0 -> CPU#1]:
Dec 16 12:48:57 UA-KVM1 kernel: Measured 906183720569 cycles TSC warp between CPUs, turning off TSC clock.
Dec 16 12:48:57 UA-KVM1 kernel: tsc: Marking TSC unstable due to check_tsc_sync_source failed
Dec 16 12:48:57 UA-KVM1 kernel: KVM setup async PF for cpu 1
Dec 16 12:48:57 UA-KVM1 kernel: kvm-stealtime: cpu 1, msr 3fd0d240
Dec 16 12:48:57 UA-KVM1 kernel: microcode: CPU1 sig=0x206c1, pf=0x1, revision=0x1
Dec 16 12:48:57 UA-KVM1 kernel: Will online and init hotplugged CPU: 1
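Inside the guest, an agent or test could confirm a hot-add like the one logged above by reading `/sys/devices/system/cpu/online`, which lists online CPUs as comma-separated ids and ranges (for example `0-1`). This is a minimal sketch, assuming a Linux guest with sysfs mounted at `/sys`. The helper names are hypothetical, and the parser handles only the plain id/range form the kernel writes to this file.

```python
def parse_cpulist(text: str) -> set[int]:
    """Parse a kernel cpulist string such as "0-3,5" into a set of CPU ids."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty list yields no CPUs
        if "-" in part:
            start, end = part.split("-")
            cpus.update(range(int(start), int(end) + 1))  # ranges are inclusive
        else:
            cpus.add(int(part))
    return cpus


def online_cpus(path: str = "/sys/devices/system/cpu/online") -> set[int]:
    """Read the set of CPUs the guest kernel currently has online."""
    with open(path) as f:
        return parse_cpulist(f.read())
```

For example, `parse_cpulist("0-3,5")` returns `{0, 1, 2, 3, 5}`.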

Describe possible alternatives

We could give every Firecracker microVM access to all cores and rely solely on cgroups to limit actual scheduling time. This isn't ideal, though, as it could cause a lot of CPU steal time. We prefer to give full cores when possible.
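For the cgroup alternative, cgroup v2 expresses a CPU-time cap in a `cpu.max` file as `"<quota> <period>"` in microseconds, so a quota of twice the period allows two CPUs' worth of runtime per period. This is a minimal sketch, assuming cgroup v2 with the cpu controller enabled. It also assumes the microVM's vCPU threads already live in `cgroup_dir`; how they get there (for example via the jailer) is outside this sketch. The function names are hypothetical.

```python
import os


def cpu_max_value(cpus: float, period_us: int = 100_000) -> str:
    """Format a cgroup v2 cpu.max value granting `cpus` CPUs' worth of time per period."""
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    quota_us = round(cpus * period_us)  # runtime allowed per period, in microseconds
    return f"{quota_us} {period_us}"


def apply_cpu_limit(cgroup_dir: str, cpus: float) -> None:
    """Write the limit into <cgroup_dir>/cpu.max (needs the cpu controller enabled)."""
    with open(os.path.join(cgroup_dir, "cpu.max"), "w") as f:
        f.write(cpu_max_value(cpus))
```

This caps total CPU time rather than the number of cores. The guest still sees every vCPU, so vCPUs that get throttled can show up as steal time, which is the concern raised above.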


KarthikNedunchezhiyan commented 3 years ago

@jeromegn is vertical scaling the only solution to the problem you are trying to fix? Wouldn't horizontal scaling, such as running multiple instances, help? Just curious to understand the use case.

raduiliescu commented 3 years ago

Hi @jeromegn!

You might also want to look at the CPU online/offline kernel feature: https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-devices-system-cpu. The downside is that you will always need to start the microVM with the maximum number of CPUs, and you will need an agent inside the microVM to write the values to sysfs (/sys/devices/system/cpu/cpuN/online). However, keeping CPUs offline has less overhead than the cgroup approach.
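The in-guest agent described here could be as small as the following sketch. It assumes a Linux guest with sysfs at `/sys/devices/system/cpu` and root privileges. The function name and its policy (the lowest-numbered CPUs stay online) are hypothetical. CPUs without an `online` file, typically `cpu0`, cannot be toggled from userspace and are skipped. Because the agent runs inside the guest, a root user in the guest can revert anything it does.

```python
import os
import re


def set_online_cpus(target: int, sysfs_cpu: str = "/sys/devices/system/cpu") -> list[int]:
    """Keep CPUs 0..target-1 online and take the rest offline; return the CPUs changed."""
    # Only entries named cpu<N> are CPUs; skip cpufreq, cpuidle, etc.
    ids = sorted(
        int(m.group(1))
        for name in os.listdir(sysfs_cpu)
        if (m := re.fullmatch(r"cpu(\d+)", name))
    )
    changed = []
    for cpu in ids:
        online_file = os.path.join(sysfs_cpu, f"cpu{cpu}", "online")
        if not os.path.exists(online_file):
            continue  # not hot-pluggable from userspace (often cpu0)
        want = "1" if cpu < target else "0"
        with open(online_file) as f:
            current = f.read().strip()
        if current != want:
            with open(online_file, "w") as f:
                f.write(want)
            changed.append(cpu)
    return changed
```

Pointing `sysfs_cpu` at a scratch directory makes the logic testable outside a VM. On a real guest, the kernel performs the actual online/offline transition when the file is written.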

jeromegn commented 3 years ago

@KarthikNedunchezhiyan we need to support a large variety of workloads. Running more VMs isn't always the solution, though we're already doing that too.

@raduiliescu thanks! That could work, but our users have root access to the VM and could bring any number of CPUs online.

AlexandruCihodaru commented 3 years ago

We need to think about this for a bit; we will get back to you.

sudanl0 commented 8 months ago

Marking this as parked for now, but we will track it as part of our roadmap and keep it in mind as that work progresses.