shawtao commented 3 years ago

Feature Request

As far as I know, apart from the PIT timer used while the guest kernel boots, a Firecracker guest uses kvm-clock as its clocksource and the LAPIC timer as its clockevent device once the kernel is up, and no longer uses the PIT. (I may be missing scenarios where the PIT timer must be used.) However, Firecracker still exposes ports 0x40-0x43 to the guest, which lets it create a PIT timer and may introduce potential security issues.

The potential security issue

Description

The KVM module creates a kvm-pit kernel thread to inject PIT timer interrupts. When a root user in the guest creates a periodic PIT timer by writing to ports 0x40-0x43, the kvm-pit thread starts injecting interrupts periodically. If the period set by the user is very short, both the kvm-pit thread and the Firecracker process itself generate a noticeable CPU load.
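For reference, a guest typically programs a periodic PIT timer with three port writes: a command byte to port 0x43, then the low and high bytes of the reload value to port 0x40. The sketch below only computes those values and performs no port I/O. The helper names are hypothetical, and the constants are the standard i8254 ones rather than anything taken from this report.

```c
#include <stdint.h>

#define PIT_INPUT_HZ        1193182u /* standard i8254 input clock, in Hz */
#define PIT_PORT_CH0        0x40     /* channel 0 data port */
#define PIT_PORT_CMD        0x43     /* mode/command port */
#define PIT_CMD_CH0_RATEGEN 0x34     /* channel 0, lobyte/hibyte, mode 2, binary */

/* One port write: which port, and which byte goes to it. */
struct pit_write {
    uint16_t port;
    uint8_t  value;
};

/* Hypothetical helper: reload value for a requested period, rounded to the
 * nearest PIT tick and clamped to 2..65536 (mode 2 does not accept a count
 * of 1). 65536 is returned as 0, which is how the 16-bit counter encodes it. */
uint16_t pit_reload_for_period_us(uint32_t period_us)
{
    uint64_t ticks = ((uint64_t)period_us * PIT_INPUT_HZ + 500000u) / 1000000u;

    if (ticks < 2)
        ticks = 2;
    if (ticks > 65536)
        ticks = 65536;
    return (uint16_t)(ticks & 0xFFFF);
}

/* Hypothetical helper: fill out[] with the three writes that would start a
 * periodic (mode 2) timer on channel 0 with the given reload value. */
void pit_periodic_writes(uint16_t reload, struct pit_write out[3])
{
    out[0] = (struct pit_write){ PIT_PORT_CMD, PIT_CMD_CH0_RATEGEN };
    out[1] = (struct pit_write){ PIT_PORT_CH0, (uint8_t)(reload & 0xFF) };
    out[2] = (struct pit_write){ PIT_PORT_CH0, (uint8_t)(reload >> 8) };
}
```

A 200 microsecond period, for example, corresponds to a reload value of 239 ticks.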

Impact

Although the KVM module has a min_period variable that limits the PIT timer period, I created a periodic PIT timer with the period set to min_period, and this caused the kvm-pit kernel thread to continuously use up to 6% CPU and the firecracker process up to 80% CPU, even though the guest was otherwise idle.

Therefore, malicious root users in the Guest may use the PIT timer to generate some out-of-band workload and affect the performance of the host system.

Environment

Firecracker version.

$firecracker --version 
Firecracker v0.26-wip-103-gea062d09

Host and kernel version
```
$uname -r
5.14.0-amd64-desktop
```
Rootfs used: Not relevant
Architecture: x86_64

Additional

Although this issue has a very limited impact, Firecracker no longer needs PIT timers after the kernel starts, so why not just disable the creation of PIT timers?

Describe the desired solution

There is a simple way to forbid the creation of PIT timers, in the create_pit_timer function of the KVM module:

325 struct kvm_kpit_state *ps = &pit->pit_state;
        ...
329 if (!ioapic_in_kernel(kvm) ||
330     ps->flags & KVM_PIT_FLAGS_HPET_LEGACY)
331     return;

Per lines 329-331, create_pit_timer returns early, so no PIT timer is created, when the KVM_PIT_FLAGS_HPET_LEGACY flag is set (or when the IOAPIC is not in the kernel). QEMU uses this flag to prevent the creation of PIT timers once it enables HPET emulation.

Although Firecracker does not provide HPET emulation, would it be possible to set this flag via ioctl at some point after the guest kernel has booted, thus disabling the creation of PIT timers?
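A minimal sketch of what setting the flag might look like from a VMM, assuming the KVM_GET_PIT2 and KVM_SET_PIT2 VM ioctls from the KVM API. The helper name is hypothetical, and vm_fd is assumed to be the file descriptor returned by KVM_CREATE_VM with the in-kernel PIT already created. This is not Firecracker code, and I have not verified its effect on a running guest.

```c
#include <linux/kvm.h>
#include <sys/ioctl.h>

/* Hypothetical helper: read the current in-kernel PIT state, set
 * KVM_PIT_FLAGS_HPET_LEGACY, and write the state back.
 * Assumes an x86 host whose headers define struct kvm_pit_state2.
 * Returns 0 on success, -1 if either ioctl fails (errno is left set). */
int pit_set_hpet_legacy(int vm_fd)
{
    struct kvm_pit_state2 state;

    /* Fetch the existing state so channel settings are preserved. */
    if (ioctl(vm_fd, KVM_GET_PIT2, &state) < 0)
        return -1;

    state.flags |= KVM_PIT_FLAGS_HPET_LEGACY;

    /* Write the state back with the flag set. */
    if (ioctl(vm_fd, KVM_SET_PIT2, &state) < 0)
        return -1;

    return 0;
}
```

When exactly "after the guest kernel is booted" is, and whether any guest still relies on the PIT at that point, is the open question here.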

Describe possible alternatives

Additional context

I don't know if I'm missing some scenarios where the PIT timer has to be used. If the PIT timer has to be used, is the workload it causes within acceptable limits in Firecracker?

Checks

[√ ] Have you searched the Firecracker Issues database for similar requests?
[√ ] Have you read all the existing relevant Firecracker documentation?
[√ ] Have you read and understood Firecracker's core tenets?

AlexandruCihodaru commented 3 years ago

Thank you for bringing your security concerns to our attention! We will investigate these immediately and follow up with you within 5 business days to provide a status.

raduweiss commented 3 years ago

Quick update: we're still working through this, will get back on this thread in a few days.

alindima commented 3 years ago

Thank you for reporting this issue. Please note that Firecracker customers should not report potential security issues via GitHub. Instead, please follow our security disclosure policy [3] to submit such reports confidentially. With this in mind, we’ve confirmed that this behavior does not represent a security issue within Firecracker. Additionally, AWS Lambda, AWS Fargate and Firecracker on Arm64 are not affected by this issue. More information is below.

On x86 CPUs, Firecracker microVMs use KVM PIT emulated devices which create a kernel thread, kvm-pit, used for injecting timer interrupts. The kvm-pit kernel thread work results in host CPU usage which is by default not constrained by the Jailer/Firecracker cgroup. That kernel thread is created when needed by kvm, and by default is part of the root cgroup. Its CPU overhead is limited by default in kvm to 5000 events per second [1]. In our measurements on EC2 .metal hosts, we found no overhead under normal usage, and found a maximum overhead of up to approximately 3% of one CPU core per microVM at the 5000 event per second limit.
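The 5000 events per second figure can be related to the kvm module parameter shown in [2]. The sketch below assumes the default min_timer_period_us of 200 microseconds and that requests for a shorter period are raised to that minimum. The function names are hypothetical.

```c
#include <stdint.h>

/* Upper bound on periodic timer events per second implied by
 * min_timer_period_us. Returns 0 when the parameter is 0, which this
 * sketch treats as "no bound configured". */
uint64_t max_timer_events_per_sec(uint32_t min_timer_period_us)
{
    if (min_timer_period_us == 0)
        return 0;
    return 1000000ull / min_timer_period_us;
}

/* Period that would actually be used for a periodic timer: a request
 * shorter than the minimum is raised to the minimum. */
uint64_t effective_period_ns(uint64_t requested_ns, uint32_t min_timer_period_us)
{
    uint64_t floor_ns = (uint64_t)min_timer_period_us * 1000ull;

    return requested_ns < floor_ns ? floor_ns : requested_ns;
}
```

With the default of 200, the bound is 1000000 / 200 = 5000 events per second. Raising the parameter to 1000, for instance, would lower the bound to 1000 events per second.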

We are aware of 5 options to constrain the CPU overhead that can be consumed by kvm-pit kernel threads on x86 CPUs:

a. Use an external agent to move the kvm-pit/ kernel thread into the microVM’s cgroup (e.g., the cgroup created by the Jailer). This cannot be done by Firecracker since the thread is created by the Linux kernel after guest start, at which point Firecracker is de-privileged.

b. Configure the kvm limit to a lower value [2]. This is a system-wide configuration available to users without Firecracker or Jailer changes. However, the same limit applies to APIC timer events, and users will need to test the impact on workloads in order to apply this mitigation.

c. Implement PIT emulation in Firecracker.

d. Apply a rate limit to the PIT interrupt frequency within Firecracker.

e. Disable the PIT emulation altogether, some time after the guest workload starts.

We recommend options [a] or [b] for users that want to avoid this potential overhead. We don’t think option [e] (recommended in the “desired solution” section above) is appropriate: since Firecracker cannot introspect guest workloads, we cannot guarantee that disabling the PIT would have no side effects on them.

To ensure wider awareness of these options, we will shortly add this topic and recommendation to our documentation. Please let us know if you have any other questions or concerns.

[1] https://www.kernel.org/doc/Documentation/virtual/kvm/api.txt

[2] To modify the kvm limit for interrupts that can be injected in a second:

sudo modprobe -r (kvm_intel|kvm_amd) kvm
sudo modprobe kvm min_timer_period_us={new_value}
sudo modprobe (kvm_intel|kvm_amd)

To make this change persistent across boots, append the option to /etc/modprobe.d/kvm.conf:

echo "options kvm min_timer_period_us={new_value}" | sudo tee -a /etc/modprobe.d/kvm.conf
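To confirm that the new limit took effect after reloading the module, the current value can be read back. The sketch below assumes the parameter is exposed at /sys/module/kvm/parameters/min_timer_period_us, the usual sysfs location for a readable module parameter. The helper name is hypothetical.

```c
#include <stdio.h>

/* Assumed sysfs location of the kvm module parameter. */
#define KVM_MIN_TIMER_PERIOD_PATH \
    "/sys/module/kvm/parameters/min_timer_period_us"

/* Hypothetical helper: read an unsigned integer from a sysfs-style file.
 * Returns 0 on success, -1 if the file cannot be opened or parsed. */
int read_uint_param(const char *path, unsigned int *out)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (!f)
        return -1;
    ok = (fscanf(f, "%u", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}
```

Calling read_uint_param(KVM_MIN_TIMER_PERIOD_PATH, &value) on the host should then yield the value passed to modprobe.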

[3] https://github.com/firecracker-microvm/firecracker/blob/main/SECURITY.md


[Hardening]Better to forbid the creation of PIT timers after the Guest kernel is booted to avoid some potential security problems #2777