cgwalters opened 4 years ago
Bigger picture, probably what we want is something like:
Though alternatively we could try to depend on https://kubevirt.io/ instead of the nested libvirt thingy...
Thinking about that more, a huge benefit of kubevirt would be that we'd have IaaS-like semantics around things like networking but could still use a kube-native flow for managing the VMs. A downside versus libvirt is that a ton of desktop Linux users already have libvirt readily available, while very few have kubevirt set up locally.
For now, my thoughts are to deploy an FCOS VM in a pod exposed as a Kube service in the pipeline; we'd provision libvirt there and have an SSH key secret, and the pipeline would talk to it over qemu+ssh://. To deal with the inevitable leaks of resources, we'd ensure these VMs have a lifetime of at most a day or so.
This would involve nested virt where we own both layers at least.
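Roughly, the pipeline side could look something like this (untested sketch using the Go libvirt bindings; the service hostname and user are made up, and the SSH key from the secret is assumed to already be loaded into an agent):

```go
// Sketch: a pipeline step talking to the privileged libvirtd running inside
// the per-run FCOS VM, tunneled over SSH via a qemu+ssh:// URI.
package main

import (
	"fmt"
	"log"

	libvirt "libvirt.org/go/libvirt"
)

func main() {
	// "fcos-worker.ci.svc" is a hypothetical Kube service name for the nested VM.
	conn, err := libvirt.NewConnect("qemu+ssh://core@fcos-worker.ci.svc/system")
	if err != nil {
		log.Fatalf("connecting to remote libvirt: %v", err)
	}
	defer conn.Close()

	// List whatever domains currently exist, mostly as a connectivity smoke test.
	doms, err := conn.ListAllDomains(0)
	if err != nil {
		log.Fatalf("listing domains: %v", err)
	}
	for _, dom := range doms {
		name, _ := dom.GetName()
		fmt.Println("found domain:", name)
		dom.Free()
	}
}
```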
One path we can investigate is using libvirt - specifically real privileged libvirt. That way we're using libvirt networking including dnsmasq etc. which is heavily tested in all sorts of scenarios (including IPv6). I think to add this to our pipeline we'd end up in a nested virt setup, running a pod which runs a FCOS (or other) VM which runs libvirt, and our tests talk to it over qemu+ssh:// or so.
To elaborate a bit on this...the thing is libvirt is really about "pets" by default. Who hasn't had to clean up an old unused stopped VM they were using from 3 months ago on their desktop?
And trying to share a libvirt instance across different CI test runs carries a strong risk of conflicts around allocating networks, etc. You really end up needing something like what the OpenShift installer is doing with Terraform to tag resources and help you deallocate them.
Probably the simplest is to spin up a separate libvirt-enabled VM, isolated to each pipeline run, for CI/CD; this would be somewhat annoying for local development, so we could have a path that shortcuts that, but then we'd need to ensure the test framework generates "tagged" VM names etc. and not just hardcoded ones.
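Something like this untested sketch of the "tagged names" idea: every domain a run creates gets a run-scoped prefix, and a sweeper later destroys anything with that prefix. The "kola-<runid>-" scheme and sweep policy here are hypothetical, not what kola does today.

```go
package main

import (
	"fmt"
	"log"
	"strings"

	libvirt "libvirt.org/go/libvirt"
)

// runTag returns the per-pipeline-run prefix, e.g. "kola-1234-".
func runTag(runID string) string {
	return fmt.Sprintf("kola-%s-", runID)
}

// sweepRun force-destroys any leftover domains carrying this run's tag;
// transient domains then disappear entirely.
func sweepRun(conn *libvirt.Connect, runID string) error {
	doms, err := conn.ListAllDomains(0)
	if err != nil {
		return err
	}
	for _, dom := range doms {
		name, err := dom.GetName()
		if err == nil && strings.HasPrefix(name, runTag(runID)) {
			if err := dom.Destroy(); err != nil {
				log.Printf("failed to destroy %s: %v", name, err)
			}
		}
		dom.Free()
	}
	return nil
}

func main() {
	conn, err := libvirt.NewConnect("qemu:///system")
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	if err := sweepRun(conn, "1234"); err != nil {
		log.Fatal(err)
	}
}
```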
FWIW, Libvirt isn't intended to be only for "pets". As an alternative to "persistent" guests, which are used by traditional virt apps like GNOME Boxes/virt-manager and have a config saved in /etc/libvirt or $HOME/.libvirt, it also supports a notion of "transient" guests, where no configuration file for the guest is saved at all. A transient VM only exists for as long as it is running and disappears when shut off. You can also have it forcibly shut off when the client which created it quits. The only thing that would be left behind for a transient guest is the log file under /var/log/libvirt/qemu. If that's a problem, we could likely provide a way to have the log file purged on shutoff too.
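For example (rough, untested sketch with the Go libvirt bindings; the trivial domain XML is illustrative only, with no disks or real FCOS image), a client can create a transient guest that is also torn down automatically when the client disconnects:

```go
package main

import (
	"log"
	"time"

	libvirt "libvirt.org/go/libvirt"
)

const domainXML = `
<domain type='kvm'>
  <name>transient-demo</name>
  <memory unit='MiB'>512</memory>
  <os><type arch='x86_64'>hvm</type></os>
</domain>`

func main() {
	conn, err := libvirt.NewConnect("qemu:///system")
	if err != nil {
		log.Fatal(err)
	}
	// Closing the connection also reaps the autodestroy guest.
	defer conn.Close()

	// DOMAIN_START_AUTODESTROY: the guest is forcibly shut off when the
	// creating client disconnects, so no config persists anywhere; only the
	// log under /var/log/libvirt/qemu is left behind today.
	dom, err := conn.DomainCreateXML(domainXML, libvirt.DOMAIN_START_AUTODESTROY)
	if err != nil {
		log.Fatal(err)
	}
	defer dom.Free()

	// ... run tests against the guest here ...
	time.Sleep(10 * time.Second)
}
```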
FWIW, Libvirt isn't intended to be only for "pets".
Yes, I qualified this with "by default".
The only thing that would be left behind for a transient guest is the log file under /var/log/libvirt/qemu. If that's a problem, we could likely provide a way to have the log file purged on shutoff too.
Definitely for these types of test scenarios we would want absolutely everything cleaned up. But per above I think by far the simplest would be to regularly spin up and tear down a nested VM for this to avoid all state leakage.
We had some discussions about this today. There was rough agreement on keeping with the trend of using virt for privileged operations; cosa already requires /dev/kvm and already has code to stand up supermin VMs for privileged operations. So we could: (1) add back a qemu platform (or alternatively add a new libvirt platform) which assumes privs, then (2) have pipelines run e.g. cosa supermin kola -p qemu .... Local devs of course could just run kola directly.
Re. qemu vs libvirt, there was concern that libvirt was higher-level than we may want. Additionally, local devs who do have privs may not want kola fiddling with their libvirt config.
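To make the "platform" split concrete (purely hypothetical sketch, not kola's actual platform API), the boundary could be as small as an interface like this:

```go
package main

import "fmt"

// Platform abstracts how test machines get provisioned.
type Platform interface {
	Name() string
	// Provision boots a machine and returns an address the harness can SSH to.
	Provision(ignitionConfig string) (addr string, err error)
	Destroy() error
}

// qemuPlatform would drive qemu directly and assume access to /dev/kvm.
type qemuPlatform struct{}

func (q *qemuPlatform) Name() string { return "qemu" }

func (q *qemuPlatform) Provision(ign string) (string, error) {
	// ... exec qemu with -enable-kvm, pass the Ignition config via fw_cfg ...
	return "127.0.0.1:2222", nil
}

func (q *qemuPlatform) Destroy() error { return nil }

func main() {
	var p Platform = &qemuPlatform{}
	fmt.Println("selected platform:", p.Name())
}
```

A libvirt platform would be another implementation of the same interface, which is where the "higher-level than we may want" concern comes in.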
This is related to https://github.com/coreos/fedora-coreos-config/pull/259
Basically we need to beef up our network testing - testing in the initrd specifically.
One major downside of the recent push to use unprivileged qemu for testing is that networking is...hacky. It uses slirp and I'll just summarize things with:
Specifically a bug I was seeing but didn't chase down is that the slirp stack seemed to not be responding to DHCPv6 requests.
One path we can investigate is using libvirt - specifically real privileged libvirt. That way we're using libvirt networking including dnsmasq etc. which is heavily tested in all sorts of scenarios (including IPv6). I think to add this to our pipeline we'd end up in a nested virt setup, running a pod which runs a FCOS (or other) VM which runs libvirt, and our tests talk to it over qemu+ssh:// or so.
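As a rough illustration of why that's appealing (untested sketch; the network name and addresses are made up), a run could stand up a transient libvirt network whose dnsmasq instance also serves DHCPv6, which is exactly the part slirp seemed to be flaking on:

```go
package main

import (
	"log"

	libvirt "libvirt.org/go/libvirt"
)

const networkXML = `
<network>
  <name>kola-testnet</name>
  <forward mode='nat'/>
  <ip family='ipv4' address='192.168.150.1' netmask='255.255.255.0'>
    <dhcp><range start='192.168.150.10' end='192.168.150.254'/></dhcp>
  </ip>
  <ip family='ipv6' address='fd00:1234::1' prefix='64'>
    <dhcp><range start='fd00:1234::10' end='fd00:1234::ff'/></dhcp>
  </ip>
</network>`

func main() {
	conn, err := libvirt.NewConnect("qemu:///system")
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// NetworkCreateXML creates a *transient* network: it vanishes once it is
	// destroyed, so a CI run leaves no persistent network definition behind.
	net, err := conn.NetworkCreateXML(networkXML)
	if err != nil {
		log.Fatal(err)
	}
	defer net.Free()

	// ... attach test guests to "kola-testnet" and run the initrd networking tests ...

	if err := net.Destroy(); err != nil {
		log.Fatalf("tearing down network: %v", err)
	}
}
```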