firecracker-microvm / firecracker

Secure and fast microVMs for serverless computing.
http://firecracker-microvm.io
Apache License 2.0

Increase NUMA selection options #1617

Closed christian7007 closed 4 years ago

christian7007 commented 4 years ago

Jailer requires a NUMA node for deploying a microVM. This NUMA node is used to pin the new microVM to the cores (all of them) and memory of that node by writing the cpuset.cpus and cpuset.mems cgroup files.
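To make the current behavior concrete, here is a minimal sketch of the values that end up in the two cgroup files for a given node. It assumes the standard Linux sysfs layout (`/sys/devices/system/node/node<N>/cpulist`); the helper name `cpuset_values_for_node` is hypothetical and is not taken from jailer's source.

```python
from pathlib import Path


def cpuset_values_for_node(node, node_root="/sys/devices/system/node"):
    """Return the cpuset values that pin a process to one whole NUMA node.

    Hypothetical helper: it only computes the values, it does not touch
    any cgroup. `node_root` is a parameter so the sketch can be tried
    against a fake directory tree.
    """
    # The kernel exposes the CPUs of node N as a list such as "0-3".
    cpulist = (Path(node_root) / f"node{node}" / "cpulist").read_text().strip()
    return {
        "cpuset.cpus": cpulist,    # every CPU of the node
        "cpuset.mems": str(node),  # memory from the same node
    }
```

With this scheme the CPU set and the memory node are always derived from the same node id, which is the coupling this issue asks to relax.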

It would be useful to be able not only to isolate the process on a NUMA node but also to isolate it on a specific CPU (or CPU set). This is needed to mitigate recent micro-architectural CPU vulnerabilities (Spectre, Meltdown and friends). The idea is not to share threads of the same core across microVMs.
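As an illustration of the "do not share threads of the same core" idea, the sketch below expands a single CPU id into the full set of hardware threads of its core, so a microVM could be given the whole core. It assumes the standard sysfs file `topology/thread_siblings_list`; the function names are hypothetical.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Parse a kernel CPU list such as "0-3,8" into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-")
            cpus.update(range(int(start), int(end) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def whole_core_cpuset(cpu, cpu_root="/sys/devices/system/cpu"):
    """Return a cpuset.cpus string covering every thread of `cpu`'s core.

    Hypothetical helper: reads the sibling threads of the given CPU so
    that no other microVM needs to be scheduled on the same physical core.
    """
    path = Path(cpu_root) / f"cpu{cpu}" / "topology" / "thread_siblings_list"
    siblings = parse_cpu_list(path.read_text())
    return ",".join(str(c) for c in siblings)
```

The caller would still need to keep track of which cores are already assigned; this sketch only answers which CPUs belong together.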

As mentioned before, jailer not only pins the process to the node's CPUs, it also pins the memory to the same NUMA node. It would also be useful to select the memory node independently. Note that some architectures allow NUMA nodes without associated memory (https://github.com/OpenNebula/one/issues/3256#issuecomment-491702104), and jailer will not run on such machines.
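A rough way to spot the memoryless nodes mentioned above is to compare the kernel's list of online nodes with its list of nodes that have memory. This sketch assumes the sysfs files `online` and `has_memory` under `/sys/devices/system/node`; the helper names are hypothetical.

```python
from pathlib import Path


def parse_id_list(text):
    """Parse a kernel id list such as "0-1,4" into a set of ints."""
    ids = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-")
            ids.update(range(int(start), int(end) + 1))
        else:
            ids.add(int(part))
    return ids


def memoryless_nodes(node_root="/sys/devices/system/node"):
    """Return the sorted ids of online NUMA nodes that have no memory."""
    root = Path(node_root)
    online = parse_id_list((root / "online").read_text())
    with_memory = parse_id_list((root / "has_memory").read_text())
    return sorted(online - with_memory)
```

For any node returned here, using the node id for both cpuset.cpus and cpuset.mems cannot work, so the memory node would have to be chosen separately.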

If it's interesting for the project I can prepare an RFC PR for this.

KarthikNedunchezhiyan commented 4 years ago

@christian7007 It is possible to override the cgroup (which was created by jailer) before booting firecracker, right?

christian7007 commented 4 years ago

In our use case, we start the microVMs with the --config-file option, so as soon as we start the jailer, it starts the firecracker process and the VM boots automatically.

We've done some tests, and if we overwrite the contents of cpuset.cpus, the process seems to be migrated to the new CPU(s). This does not work for memory (cpuset.mems).

Also, we don't like the idea of starting the VM on one CPU and later switching it to another one. The most common reason for isolating a process on a single core is usually to increase security, as explained in the description, and this kind of breaks it.

We think this can be useful to more people, and it would be nice to have these options available in the jailer.

sandreim commented 4 years ago

Hi @christian7007 ,

The production setup recommendation docs provide a couple of ways to mitigate micro-architectural CPU vulnerabilities at the host level. These should provide the protection you are looking for. Are there any other reasons you want to pin the VMs to specific CPUs on top of that?

Also, we need some time to dive deep on CPU pinning and its implications. We will get back to you next week.

christian7007 commented 4 years ago

Hello @sandreim,

Let me give you some info about our use case. I'm working on integrating Firecracker with OpenNebula. CPU pinning has been a really useful feature for our users, and we would like to have it fully integrated with Firecracker.

Apart from security reasons, we think that adding this as an optional configuration will add more flexibility for managing the available resources in different scenarios. For example, it will make it possible to reduce the noisy neighbor effect. Although this could, to some extent, go against the resource sharing approach adopted by Firecracker, it can be useful in some specific cases. It will also make it possible to select the memory node on NUMA architectures that use nodes without memory.

Thanks for the interest, I'll be waiting to know your thoughts about it.

dianpopa commented 4 years ago

Hi @christian7007 .

We are currently trying to reach a decision about this issue with the team. There are two problems that we need to figure out: users of the --config-file feature are not given the chance to tweak any cgroup-associated files, and we still need to decide whether we want to offer this as a feature or treat it as a bug.

Your main use case is to be able to assign CPUs without enforcing the memory allocation, which selecting a NUMA node implicitly does (correct me if I am wrong). I understand you tried to solve this by overriding cpuset.cpus and cpuset.mems and it did not work. Do you know why it did not work?

Looking forward to your answer.

christian7007 commented 4 years ago

Hello @dianpopa,

Yes, that's one of our use cases. Our idea is to be able to select different NUMA nodes for CPU and memory, but we'd also like to be able to pin a microVM to a specific core (not to the entire NUMA node) if needed. The goal is to add more flexibility to jailer to accommodate these use cases while preserving its current NUMA behavior.

I did try to override the cgroups, and it works for CPU. It also works for memory if cpuset.memory_migrate is enabled. However, memory_migrate causes unnecessary thrashing of memory pages, with the associated noise for the other microVMs. Moreover, we could not test this directly on a system with a NUMA node without memory, but on such machines the allocation would most probably fail.
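For reference, the override described here can be sketched roughly as follows. It assumes a cgroup v1 cpuset hierarchy and a cgroup directory already created by jailer; the `cgroup_dir` argument and the function name are hypothetical, and on a real system the writes require sufficient privileges.

```python
from pathlib import Path


def repin_cgroup(cgroup_dir, cpus, mems, migrate_memory=False):
    """Overwrite the cpuset of an existing cgroup (cgroup v1 layout assumed).

    cgroup_dir: hypothetical path to the microVM's cpuset cgroup.
    cpus, mems: strings in kernel list format, e.g. "2,6" and "1".
    """
    cg = Path(cgroup_dir)
    if migrate_memory:
        # With this flag set, pages already allocated are moved when
        # cpuset.mems changes. This is the page migration cost noted above.
        (cg / "cpuset.memory_migrate").write_text("1")
    (cg / "cpuset.mems").write_text(mems)
    (cg / "cpuset.cpus").write_text(cpus)
```

Because this runs after the firecracker process has started, the microVM still spends some time on its original CPUs and memory node, which is why an option on the jailer side would be preferable.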

acatangiu commented 4 years ago

@christian7007 we're hosting a Community Office Hours meeting on Thursday; more details are on our Slack channel.

It would be great if you could join so we can discuss this issue live and find a suitable solution. If you can't attend the office hours meeting, let us know and maybe we can set up another dedicated call.

christian7007 commented 4 years ago

Hello @acatangiu, sure I'll be there.

Thanks for the interest!