VM tests should not depend on any specific VM technology

aragnon commented 9 years ago

I was trying to run some of the tests, which automatically start some virtual machines with a virtual network, but they don't work because the VirtualBox kernel module is already loaded.

The way this should work is that the system should detect which virtualization technologies are supported on the host system (which might be NixOS) and then it should use whatever candidate works best.

I am thinking of Xen, VirtualBox, and KVM.
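
The host detection proposed here could be sketched roughly as follows. This is only an illustration and not part of the NixOS test driver: `detect_backends` and the preference order are hypothetical, and the probed paths (`/dev/kvm`, `/proc/xen/capabilities`, `/dev/vboxdrv`) only suggest that the corresponding kernel support is present, not that the backend is actually usable for tests.

```python
from pathlib import Path

# Device or proc nodes that typically exist when the corresponding
# kernel support is loaded. The order expresses a hypothetical preference.
PROBES = {
    "kvm": "dev/kvm",
    "xen": "proc/xen/capabilities",
    "virtualbox": "dev/vboxdrv",
}


def detect_backends(root: Path = Path("/")) -> list[str]:
    """Return the backends whose probe path exists under `root`."""
    return [name for name, rel in PROBES.items() if (root / rel).exists()]


if __name__ == "__main__":
    print(detect_backends() or ["none detected"])
```

Taking `root` as a parameter keeps the probe testable against a fake directory tree instead of the real host.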

lucabrunox commented 9 years ago

This happened to me as well. Btw, I would even be fine with manually specifying which virtualization to use from the command line.

offlinehacker commented 9 years ago

+1 also add lxc/docker as test backend

domenkozar commented 9 years ago

What would be the benefit? This would add quite some maintenance overhead to the tests (which are already sometimes hard to debug/fix).

domenkozar commented 9 years ago

In software, features mean code complexity, and we can't just implement every feature. We have to be selective and think about not only the value but also (and mostly) QA (maintenance, tests, documentation).

lucabrunox commented 9 years ago

@iElectric evidently it has never happened to you that, after opening 4 vagrant instances with vbox, you had to shut down all of them to run the nixpkgs tests. It's annoying.

domenkozar commented 9 years ago

@lethalman I did. The issue can be circumvented with documentation; it's more of a tedious limitation than a blocker for using qemu. Maybe it can even be addressed in another way than implementing another backend for tests.

cc @edolstra

lucabrunox commented 9 years ago

Another possibility is to use libvirt, which would enable all of the virtualization backends for free.

domenkozar commented 9 years ago

If someone wants to tackle this, then start here: http://wiki.libvirt.org/page/QEMUSwitchToLibvirt

We have quite a lot of infrastructure based on the NixOS test driver, so this will involve quite some work.

lucabrunox commented 9 years ago

I plan to work on porting to libvirt in the future, if @edolstra agrees.

aragnon commented 9 years ago

@goodwillcoding I think you lack a sense for timing your comment. Additionally, you have no clue whether or not you should interfere in a discussion, considering that you have contributed nothing to this discussion.

edolstra commented 9 years ago

@aragnon Insulting other developers (e.g. calling them trolls or "stupid") is not acceptable on this project. Please refrain from doing so in the future.

Regarding the issue: obviously it would be nice to have other VM backends, but as @iElectric says, it would also be a rather substantial amount of work. Also, we already have enough non-determinism issues in VM tests without having to worry about different VM backends being selected on different build machines.

thoughtpolice commented 9 years ago

One thing is that QEMU has proven hard to debug several times in the past when we've done upgrades of it, and people (like me) like those upgrades, so that sucks a bit. But I'm not sure another backend would fare any better here - it seems pretty reasonable to assume that e.g. Dom0 upgrades could just as easily break tests, and VirtualBox always seems to break something somewhere. In general these tests are always going to be brittle, to some degree; adding more complexity is madness.

We should at least document that you'll need the QEMU/KVM drivers and that other modules are incompatible.

And on that note, I'll expand on the madness:

> You have the people who have issues, the people who prioritize those issues, and the people who solve issues. You are not in the position to discuss the "benefit" at this point anymore, because there are already three people who do see value in it. The only option you have at this point is to give this issue a lower priority (for example by not working on it). Any other action is just delaying the process. Please, now explain to me why I had to explain this to you. Assuming you can write software, you are not stupid, right? (This naturally only leaves as a possibility that you were trolling me, although I cannot really figure out why you would want to do that.)

That's not software development. That's called 'insanity'. Here is an actual lesson in software development: For every component X in your system, if you add an alternative component Y, you must also add a third component Z at minimum: Z allows you to switch between X and Y. Now count the moving parts: you now have three moving parts instead of one. But you forgot the other three new moving parts: not just X, Y, and Z, but also the interactions between them: X to Z, Y to Z, and X to Y. This, in essence, increases the failure rate by a factor of six versus the original setup.

You may say "Y and Z are not that big", but that's beside the point: ignore the constant factors, and play this scenario out with three choices instead of two. And then four instead of three. You should pretty quickly see a pattern - one that has unmanageable amounts of complexity.
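
To make the counting argument above concrete, here is a small sketch. It assumes the model used in this comment: the alternative backends plus one switching component, with every pair of components counted as a potential interaction. The function name is made up for illustration, and the totals count moving parts, not measured failure rates.

```python
from math import comb


def moving_parts(backends: int) -> int:
    """Components plus pairwise interactions for `backends` alternatives."""
    if backends < 1:
        raise ValueError("need at least one backend")
    # A single backend needs no switch; two or more need one extra component.
    components = backends if backends == 1 else backends + 1
    interactions = comb(components, 2)  # every unordered pair of components
    return components + interactions


for n in range(1, 5):
    print(n, moving_parts(n))
```

Under this model one backend gives 1 moving part, two give 6, three give 10, and four give 15.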

So, in short, you're basically asking us to - instead of shipping a set of working tests - ship an unmanageable set of tests to users, and rather than perhaps having some glitches, instead let a user decide how it all explodes in flames so they can come complain to us like you have. The only difference being the software is now several times more complex, and thus even more difficult to fix. Considering the developer effort we have currently, this is not only impractical, it is pretty much laughable.

The cure is worse than the disease.

lucabrunox commented 9 years ago

@edolstra what about my proposal of using libvirt?

rbvermaa commented 9 years ago

I have edited some of the comments and removed some that were non-productive. Please stick to discussing the original issue from here on out.

The issue seems valid, so we can keep it open, as @lethalman has stated he might spend some time working on this.

Personally I agree with the comments that think this might introduce too much complexity or extra non-determinism in the tests. However, perhaps someone who implements this will prove us wrong.

copumpkin commented 9 years ago

:+1: :+1: :+1:

My approach wouldn't be to auto-detect based on host, since as others have mentioned it could introduce nondeterminism. However, even the ability to explicitly state that I'd like to run the suite using one particular backend would be a big plus, or even doing so on a per-test basis.

Ultimately, I'd probably just want to run many of my tests in NixOS containers, since I'm using a lot of AWS and I can't run full-fledged virtualization software efficiently inside an EC2 instance.
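
Explicit selection with a per-test override, as described above, might look like this minimal sketch. Everything named here is hypothetical: the `NIXOS_TEST_BACKEND` variable, the `select_backend` helper, and the backend names are not existing nixpkgs interfaces.

```python
import os
from typing import Mapping, Optional

KNOWN_BACKENDS = ("qemu", "container")  # hypothetical backend names
DEFAULT_BACKEND = "qemu"


def select_backend(per_test: Optional[str] = None,
                   env: Mapping[str, str] = os.environ) -> str:
    """Pick a backend: per-test setting first, then the environment, then the default.

    Nothing is auto-detected, so the same inputs always give the same backend.
    """
    choice = per_test or env.get("NIXOS_TEST_BACKEND") or DEFAULT_BACKEND
    if choice not in KNOWN_BACKENDS:
        raise ValueError(f"unknown test backend: {choice!r}")
    return choice
```

Because the host is never probed, two build machines given the same settings would run the same backend, which sidesteps the nondeterminism concern raised earlier in the thread.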

wkennington commented 9 years ago

A lot of these like the installer tests inherently depend on being inside a VM or some kind of mutable container and would be specific to the tech. Networking tests rely on being able to add kernel devices for testing network configuration. Otherwise, this could probably work for most other test cases.

copumpkin commented 9 years ago

Yeah, I was just thinking a related (though independent) feature would be to have isContainer = true configs throw errors if you try to do silly things in them like change kernels or bootloaders.

copumpkin commented 9 years ago

Now that https://github.com/NixOS/hydra/issues/201 is starting to work, having this is even more appealing!

domenkozar commented 8 years ago

If it's useful to anyone else, here's the patch to be able to ssh into NixOS tests and debug the state:

diff --git a/nixos/lib/test-driver/test-driver.pl b/nixos/lib/test-driver/test-driver.pl
index 8ad0d67..838fbdd 100644
--- a/nixos/lib/test-driver/test-driver.pl
+++ b/nixos/lib/test-driver/test-driver.pl
@@ -34,7 +34,7 @@ foreach my $vlan (split / /, $ENV{VLANS} || "") {
     if ($pid == 0) {
         dup2(fileno($pty->slave), 0);
         dup2(fileno($stdoutW), 1);
-        exec "vde_switch -s $socket" or _exit(1);
+        exec "vde_switch -tap tap0 -s $socket" or _exit(1);
     }
     close $stdoutW;
     print $pty "version\n";

domenkozar commented 6 years ago

I've added vm test debugging machinery in https://github.com/NixOS/nixpkgs/pull/47418

davidak commented 5 years ago

@edolstra Would libvirt be an option? Any other way to fix this issue?

Or should we just close this issue?

stale[bot] commented 4 years ago

Thank you for your contributions. This has been automatically marked as stale because it has had no activity for 180 days. If this is still important to you, we ask that you leave a comment below. Your comment can be as simple as "still important to me". This lets people see that at least one person still cares about this. Someone will have to do this at most twice a year if there is no other activity. Here are suggestions that might help resolve this more quickly:

1. Search for maintainers and people that previously touched the related code and @ mention them in a comment.
2. Ask on the NixOS Discourse.
3. Ask on the #nixos channel on irc.freenode.net.

domenkozar commented 3 years ago

This is probably the major missing bit for NixOS tests to become extremely useful.

The current requirement that qemu has hardware virtualization available makes the tests really hard to use in CI.

If we had some kind of container backend, that would mean better integration with CIs.
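
For context, the hardware-virtualization requirement largely comes down to whether `/dev/kvm` is usable on the machine running the tests. Below is a sketch of a check that falls back to QEMU's software emulation (TCG) when it is not. The helper name is made up, and the snippet is not taken from the test driver; `-accel kvm` and `-accel tcg` are regular QEMU options.

```python
import os


def qemu_accel_args(kvm_device: str = "/dev/kvm") -> list[str]:
    """Return QEMU accelerator flags: KVM if the device is usable, else TCG."""
    if os.access(kvm_device, os.R_OK | os.W_OK):
        return ["-accel", "kvm"]
    # Pure software emulation: works without hardware support, but slowly.
    return ["-accel", "tcg"]
```

A fallback like this would let tests start on CI machines without KVM, at the cost of much longer run times, which is part of why a container backend looks attractive.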

JojOatXGME commented 3 years ago

I also had some problems with the required KVM support (either because I wanted to run the tests in a VM without nested virtualization, or because I wanted to run the tests on CI). After some research, I am wondering whether anyone has experience with UML (User-mode Linux). To me (with my basic knowledge of this topic), it looks like UML could allow us to run the tests efficiently without using hardware virtualization in the first place.

roberth commented 2 years ago

Successful developments in this area have been

HVF support through qemu (similar to KVM but a macOS host), progress: https://github.com/NixOS/nixpkgs/issues/108984; no significant changes to test framework
https://github.com/NixOS/nixpkgs/pull/126713
Test glue now uses modules; will help keep glue mess in check https://github.com/NixOS/nixpkgs/pull/191540

TBD

https://github.com/NixOS/nix/pull/3600 (essential for container backend)
139788 (stalled; was not coordinated in advance, while 126713 was in progress)
https://github.com/NixOS/nixpkgs/pull/193336
