bottlerocket-os / bottlerocket

An operating system designed for hosting containers
https://bottlerocket.dev

Add Kata Containers to images #4070

Open fheinecke opened 5 months ago

fheinecke commented 5 months ago

Hi folks,

I'd like to open an issue for adding Kata Containers to Bottlerocket OS. The Kata Containers project adds a new runtime to support running containers inside lightweight VMs, such as Firecracker VM or Cloud Hypervisor. The two projects are security-oriented and I think that they complement each other quite well. Kata Containers provides a level of isolation between "containers" that is not normally achievable with the typical namespace/cgroup approach that most CRIs take, and Bottlerocket OS's secure-by-default approach helps limit the impact of any vulnerabilities in the container runtime.

I have a working proof-of-concept that installs Kata Containers after the node has started. However, it requires spawning a pod with the super_t context and carries some other security risks. I've talked with the Kata team, and we feel that the most secure solution would be to include the binaries in the OS image. If it cannot be added here, out-of-the-box compatibility could instead be added to the upstream Kata project, at the cost of being a less secure solution.

There are several ways that Kata could be added to Bottlerocket:

  1. Bottlerocket could include it in all images. This would make it really easy for users to get started with containers as VMs. All users would need to do is specify the desired runtime when starting a container, via docker run --runtime containerd.kata.v2 or a k8s runtime class.

    The downside is that many users might not want to use Kata Containers. Including these packages would add about a gigabyte of disk usage and some additional processes that run all the time. Part of this could potentially be mitigated by adding a toggle in the settings to enable or disable the runtime. The image would include everything needed to get started (binaries, config, SELinux policies), but containerd would not be configured to start these processes until explicitly enabled.

  2. Bottlerocket could create a new variant (or variants) with this package. This would be more to maintain, but would exclude the package from the "normal" variants so that it's not included in every image. I believe that this would add 15 more images if a Kata variant were added for all current variants that support k8s.
  3. Kata could add support for Bottlerocket in their install tooling. This would require little to no change on Bottlerocket's end. The downside is that this is objectively less secure than including it in a Bottlerocket image. Here's specifically where some of the security issues lie:
    • Installation requires super_t access.
    • Kata binaries are included under /opt, which means that they could be overwritten with a malicious version.
    • The super_t context actually needs more permissions than it already has so that it can relabel the Kata runtime binaries as runtime_exec_t. Due to a denyalways statement, this requires that SELinux be temporarily disabled, globally, at runtime for processes with the super_t context.
    • If installing on k8s via a daemonset (as is the standard process for Kata on k8s), there will be a long running pod with these permissions and several host mounts.
  4. The company I work for might be willing to maintain a variant as described in (2) for as long as we use both Bottlerocket and Kata. The downside is that if we stop using either of these projects at any point, we would probably also stop supporting this variant. Additionally, anybody who wanted to use these images would need to trust us as much as they trust the Bottlerocket project.
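
For readers unfamiliar with the "k8s runtime class" mentioned in option 1, here is a minimal sketch of what selecting Kata from a pod could look like. It builds the manifests as plain Python dicts and prints them as JSON, which kubectl accepts. The RuntimeClass name and handler ("kata"), the pod name, and the image are assumptions for illustration only; the handler has to match whatever runtime name containerd is actually configured with on the node.

```python
import json


def runtime_class(name: str, handler: str) -> dict:
    """Build a Kubernetes RuntimeClass manifest as a plain dict."""
    return {
        "apiVersion": "node.k8s.io/v1",
        "kind": "RuntimeClass",
        "metadata": {"name": name},
        # Assumption: "handler" must match a runtime name configured in
        # containerd on the node; the value used below is illustrative.
        "handler": handler,
    }


def pod_with_runtime(pod_name: str, image: str, runtime_class_name: str) -> dict:
    """Build a Pod manifest that asks for a specific RuntimeClass."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": pod_name},
        "spec": {
            # This field is what routes the pod to the Kata runtime.
            "runtimeClassName": runtime_class_name,
            "containers": [{"name": pod_name, "image": image}],
        },
    }


def manifest_list(items: list) -> dict:
    """Wrap several manifests in a single v1 List object."""
    return {"apiVersion": "v1", "kind": "List", "items": items}


if __name__ == "__main__":
    # Hypothetical names: "kata" and "kata-demo" are placeholders.
    manifests = manifest_list([
        runtime_class("kata", "kata"),
        pod_with_runtime("kata-demo", "busybox", "kata"),
    ])
    print(json.dumps(manifests, indent=2))
```

Assuming the script is saved as manifests.py (a hypothetical filename), the output could be piped to the cluster with: python manifests.py | kubectl apply -f -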

Would the Bottlerocket project be willing to accept a PR for (1) or (2)? I'm currently willing to put in most of the work here, but I'd like to know beforehand if there is some version of this that the project would accept.

What I'd like: Kata Containers deployed with Bottlerocket OS

Any alternatives you've considered: See discussion above

yeazelm commented 5 months ago

Thank you @fheinecke for cutting such a detailed issue. We have discussed Kata Containers in https://github.com/bottlerocket-os/bottlerocket/issues/812 as well. You provided a lot of data and it's taking a bit to work through it, so I wanted to let you know I've seen this and I'm working on a response. I'll come back here with more details as I have them.

yeazelm commented 4 months ago

I wanted to provide a bit of an update from the discussions that have happened offline.

> Would the Bottlerocket project be willing to accept a PR for (1) or (2)? I'm currently willing to put in most of the work here, but I'd like to know beforehand if there is some version of this that the project would accept.

I can rule out option 1. That is quite a bit of additional software in existing variants that would have only limited use in much of EC2, since you need to be using bare metal instances for the virtualization to work. This is a core reason why we have variants: to allow users to choose between these types of use cases while keeping their images minimal. Kata Containers is enough of a departure from the other existing variants that it would warrant its own variant, just like how NVIDIA use cases were enough of a departure to warrant their own set of variants.

> Kata could add support for Bottlerocket in their install tooling.

As you called out in option 3, there are a lot of downsides and I think we agree that it is less than ideal.

We have been investing in tooling to make building and maintaining your own variant significantly easier, so I'd like to focus on Option 4 instead of Option 2. We recently broke out the package definitions into the bottlerocket-core-kit with a primary goal of enabling much better support for this option or options like it. This path isn't without its own work to figure out how the tooling enables you to build and maintain your own variants with these types of changes, but the Bottlerocket team is actually pretty excited about the possibility of working together to make a version of this option viable for everyone involved. I think there is a lot of merit in figuring out what might work with Option 4.

I'll work on collecting more thoughts about the technical steps needed, but as the first pass, we need to build in some ability to configure the SELinux contexts appropriately for Kata containers. A good starting point would be to create a fork of this repo as it exists now and start trying to prototype out this enablement by adding your own variant definitions and adding packages for Kata containers in the fork. This would enable reviews to happen on this code and guidance around any challenges you run into.