Refactor "Pod Sandbox" to use Virtualization

aurae-runtime / aurae

Distributed systems runtime daemon written in Rust.

https://aurae.io

Apache License 2.0


Refactor "Pod Sandbox" to use Virtualization #436

Open krisnova opened 1 year ago

krisnova commented 1 year ago

We need to form an opinion on which virtualization library to use, as mentioned in #433.

Options that I am aware of:

QEMU
Firecracker (KVM)
KVM directly (See https://github.com/rust-vmm)

After we establish a way of running a virtualized workload we need to replace the current pod sandbox implementation detail with two things:

A switching mechanism similar to our init crate that allows us to detect if virtualization is possible at runtime.
Implementation detail for running pods as VMs with a spawned auraed.
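
A rough sketch of what the runtime switch could look like, assuming that "virtualization is possible" is approximated by being able to open the KVM device node read/write. `SandboxMode` and `detect_sandbox_mode` are hypothetical names for illustration, not existing aurae code, and a real check would likely also probe the KVM API version.

```rust
use std::fs::OpenOptions;
use std::path::Path;

/// How a pod sandbox would be run on this host (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SandboxMode {
    /// The KVM device could be opened, so pods can run as VMs.
    Virtualized,
    /// No usable KVM device; fall back to the current sandbox implementation.
    Container,
}

/// Picks a mode by trying to open the KVM device node read/write.
/// `kvm_path` is a parameter so the check can be exercised in tests;
/// on Linux the conventional path is /dev/kvm.
pub fn detect_sandbox_mode(kvm_path: &Path) -> SandboxMode {
    match OpenOptions::new().read(true).write(true).open(kvm_path) {
        Ok(_) => SandboxMode::Virtualized,
        // Missing node, missing permissions, or no KVM support all land here.
        Err(_) => SandboxMode::Container,
    }
}
```

The daemon would call `detect_sandbox_mode(Path::new("/dev/kvm"))` once at startup and branch on the result, much like the init crate decides its mode at runtime.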

krisnova commented 1 year ago

I think this example should give us what we need to run a simple Linux kernel and schedule auraed as /bin/init.

https://github.com/rust-vmm/linux-loader

krisnova commented 1 year ago

So here is where I think we start.

Check out the try_from function here

It looks like we can pass Boot Arguments and Init Arguments to the linux loader crate which gives us the ability to define our init process similar to any bootloader.

We can hook in here and generate the string to boot a nested auraed as a guest for a pod.
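
To make the idea concrete, here is a minimal sketch of generating that string with plain string handling. It deliberately does not call the linux-loader crate; `guest_cmdline` is a hypothetical helper, and the boot arguments and the init flag in the usage note are placeholders. It relies on two kernel conventions: `init=` names the init binary, and everything after a bare `--` is passed to init as arguments.

```rust
/// Builds a kernel command line: boot arguments first, then `init=<path>`,
/// then any init arguments after a `--` separator.
pub fn guest_cmdline(boot_args: &[&str], init_path: &str, init_args: &[&str]) -> String {
    let mut parts: Vec<String> = boot_args.iter().map(|s| s.to_string()).collect();
    parts.push(format!("init={}", init_path));
    if !init_args.is_empty() {
        // The kernel hands everything after "--" to the init process.
        parts.push("--".to_string());
        parts.extend(init_args.iter().map(|s| s.to_string()));
    }
    parts.join(" ")
}
```

For example, `guest_cmdline(&["console=ttyS0"], "/bin/init", &["--example-flag"])` yields `console=ttyS0 init=/bin/init -- --example-flag`, where `--example-flag` stands in for whatever arguments the nested auraed would actually need.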

JeroenSoeters commented 1 year ago

I was going to take a shot at this. Wondering, though, if it makes sense to just implement the VmsService and then build the PodSandbox stuff on top. This keeps the scope somewhat contained and we need it anyways. Happy to create a new issue for that work, and link that issue here. Thoughts?

JeroenSoeters commented 1 year ago

Issue for VmsService which we can then leverage for the "Pod Sandbox": https://github.com/aurae-runtime/aurae/issues/439

MalteJ commented 1 year ago

Can we maybe create a good abstraction so we can replace the virtualization implementation later on? I have great sympathy for Firecracker as this is used in production by AWS. When I look at the current state of the aurae project, I think we should try to not get distracted by implementing/extending a hypervisor.

krisnova commented 1 year ago

I think staying out of the hypervisor details is a good move for right now -- I do think it should remain compiled into the auraed binary -- but ideally we should be able to consider other hypervisor implementations at compile time.
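
One possible shape for such an abstraction, sketched under assumptions: a trait that the pod code depends on, with the backend chosen through a type alias so the choice is resolved at compile time and stays inside the auraed binary. Every name here (`Hypervisor`, `VmSpec`, `FakeHypervisor`, `PodSandbox`, `DefaultHypervisor`) is hypothetical, and the fake backend only tracks IDs instead of starting anything.

```rust
/// Minimal description of a guest; the fields are illustrative only.
#[derive(Debug, Clone)]
pub struct VmSpec {
    pub kernel_path: String,
    pub cmdline: String,
    pub vcpus: u8,
    pub memory_mib: u32,
}

/// Hypothetical interface that hides the concrete hypervisor.
pub trait Hypervisor {
    fn name(&self) -> &'static str;
    fn start_vm(&mut self, spec: &VmSpec) -> Result<u64, String>;
    fn stop_vm(&mut self, id: u64) -> Result<(), String>;
}

/// Stand-in backend that only records which VM IDs are "running".
#[derive(Default)]
pub struct FakeHypervisor {
    next_id: u64,
    running: Vec<u64>,
}

impl Hypervisor for FakeHypervisor {
    fn name(&self) -> &'static str {
        "fake"
    }

    fn start_vm(&mut self, spec: &VmSpec) -> Result<u64, String> {
        if spec.vcpus == 0 {
            return Err("a VM needs at least one vCPU".to_string());
        }
        let id = self.next_id;
        self.next_id += 1;
        self.running.push(id);
        Ok(id)
    }

    fn stop_vm(&mut self, id: u64) -> Result<(), String> {
        match self.running.iter().position(|&r| r == id) {
            Some(idx) => {
                self.running.remove(idx);
                Ok(())
            }
            None => Err(format!("no running VM with id {}", id)),
        }
    }
}

/// Pod code is generic over the backend, so dispatch is static.
pub struct PodSandbox<H: Hypervisor> {
    hypervisor: H,
}

impl<H: Hypervisor> PodSandbox<H> {
    pub fn new(hypervisor: H) -> Self {
        Self { hypervisor }
    }

    pub fn backend(&self) -> &'static str {
        self.hypervisor.name()
    }

    pub fn run_pod(&mut self, spec: &VmSpec) -> Result<u64, String> {
        self.hypervisor.start_vm(spec)
    }

    pub fn stop_pod(&mut self, id: u64) -> Result<(), String> {
        self.hypervisor.stop_vm(id)
    }
}

/// The single place where the backend is picked; a Cargo feature could
/// switch this alias between implementations.
pub type DefaultHypervisor = FakeHypervisor;
```

Callers would build a `PodSandbox::new(DefaultHypervisor::default())` and never name the concrete backend, so swapping implementations later only touches the alias and the new `impl Hypervisor` block.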

JeroenSoeters commented 1 year ago

The more I look at the FC code, the more I do not want to implement our own hypervisor :) I will create an RFC once I have better organized my thoughts around this topic. I'm currently exploring Dragonball, which might or might not suit our needs better. https://github.com/kata-containers/kata-containers/tree/main/src/dragonball

"Can we maybe create a good abstraction so we can replace the virtualization implementation later on?"

This is what kata containers does as well, they abstract the hypervisor and make it pluggable.

MalteJ commented 5 months ago

@JeroenSoeters what do you think about using cloud-hypervisor for this? I think we should create a nice interface and then write an implementation that leverages cloud-hypervisor underneath. This way we could replace cloud-hypervisor with something else later on. Also, I'd like to have support for classical VMs, which would be a problem with Firecracker, as it supports only a very limited set of (virtual) hardware.

JeroenSoeters commented 5 months ago

Last time I looked at this, cloud-hypervisor seemed like the best choice, yeah, because of what you mention as well as vhost-net support. I had started some of that work around an interface; I believe the next step was creating TUN/TAP devices from our networking code.

dmah42 commented 2 weeks ago

looks like we've started landing on cloud-hypervisor (which is good).

once that's in place we should circle back to the Pod service per the original issue.