cunicu / gont

A Go testing framework for distributed applications
http://gont.cunicu.li/
Apache License 2.0

Add support for external interfaces #160

Open Infinoid opened 4 months ago

Infinoid commented 4 months ago

Hi,

I've got a qemu VM with a couple of network interfaces (tap devices). I want to write some unit tests using Gont, to make sure the VM is doing the network-y stuff it needs to do.

So, I want to ask: what's the right way to do this? Is it likely to be easy or hard?

I can think of a couple ways to approach it:

  1. spawn the vm in-place
    • create a gont Host
    • create a couple of bridge devices in gont
    • plug those bridges into the rest of the test harness
    • run qemu in that host using pkg.Exec
    • ask qemu to attach its network devices to those bridges
    • wait for it to boot and become ready, then run the tests
  2. attach to an existing vm
    • start qemu on the host with a couple of tap devices
    • create a gont Host
    • move the tap devices into the right namespace, using ip link set netns or somesuch
    • tell gont that these are Interfaces, somehow
    • plug those Interfaces into the rest of the test harness
    • run the tests

Do you think either of these ideas would work? I figured I'd ask, since it's pretty involved and I don't see any examples that go very deep in this direction.

Thanks!

Infinoid commented 4 months ago

I tried taking the "spawn the vm in-place" approach, and got to the point where I can have another Host ping the VM and get a response. So I think this will work.

This is a simplified version (I only show one of the two subnets, and omit error checking), but it goes like this:

    network, _ := g.NewNetwork("testnet")
    defer network.Close()
    subnet, _ := network.AddSwitch("subnet")
    pinger, _ := network.AddHost("pinger", g.NewInterface("eth0", subnet, opt.AddressIP("2::2/64")))
    vm, _ := network.AddHost("vm",
        g.NewInterface("lanlink", subnet),
    )
    vm.Run("ip", "link", "add", "lansw", "type", "bridge")
    vm.Run("ip", "link", "set", "up", "dev", "lansw")
    vm.Run("ip", "link", "set", "master", "lansw", "dev", "lanlink")

    // launch a VM, its ip address is 2::1/64
    qemu := vm.Command("qemu-system-aarch64",
        "-netdev", "bridge,br=lansw,id=lan", "-device", "virtio-net-pci,netdev=lan",
        // snip many many other qemu params
    )
    qemu.Start()

    // replace this with a more reliable way to detect that the VM has booted
    time.Sleep(time.Second * 30) 

    cmd := pinger.Command("ping6", "-c", "10", "2::1")
    out, _ := cmd.CombinedOutput()
    fmt.Printf("ping6 -c 10 2::1 said:\n----\n%s\n----\n", out)

So the network topology is: host (pinger) → interface (eth0) → switch (subnet) → interface (lanlink) → bridge (lansw) → tap → qemu

That all works, though I'm having to do the bridge setup stuff by hand. Is this the right way to do it, or is there a better approach?

I see that the "lanlink" device in the "vm" Host has a link-local address, though I don't plan to use it at all. Do I need to do anything special to turn off accept-ra and so forth for that interface? I want the VM to be doing all of the network interaction here, with no interference from the host.

Another question: if I'm running a background process in a Host (qemu in this case) and run into an error or something, will gont kill that process as it tears everything else down? Or should I kill it myself? Will that process prevent the netns'es, bind-mounts, etc from being cleaned up?

Should I be calling network.Close() or network.Teardown() to clean up? These methods have no docs.

stv0g commented 4 months ago

Hey @Infinoid,

thanks for trying this out. Using QEmu has not been one of my use cases yet, so it's not directly supported or tested. But I would like to add support for it, as I believe it can be quite a valuable feature :)

That all works, though I'm having to do the bridge setup stuff by hand. Is this the right way to do it, or is there a better approach?

Your initial approach of creating a dedicated Gont Host and executing QEmu in its namespace definitely works. However, I think we can simplify the setup by getting rid of the extra namespace and bridge device.

I would like to add a new type of interface in Gont which reflects an ExternalInterface: basically any type of interface which exists in the default network namespace. Gont would adopt this interface and move it into the namespace of the Gont switch to which it is connected. When the network is torn down, Gont would release that interface back to the default namespace.

The nice advantage here is that such an External Interface could be used with a lot of other types of interfaces as well:

I will try to implement this feature in the next few hours. Some feedback and/or testing would be highly welcome :)

All the different use cases could be nicely abstracted away by dedicated types in Gont:

Another question: if I'm running a background process in a Host (qemu in this case) and run into an error or something, will gont kill that process as it tears everything else down? Or should I kill it myself? Will that process prevent the netns'es, bind-mounts, etc from being cleaned up?

Gont currently does not terminate any subprocesses started via the .Run(), .Start(), .RunGo() or .StartGo() APIs. So any networking resources may remain allocated until those processes have been terminated.

But I believe it would be a nice addition to terminate all subprocesses of a host when the host is torn down. Likewise, when we tear down a network, we would also stop all processes of the hosts which are part of this network.

Implementation-wise this is a bit more tricky. We could use process groups (as in shell job control). But those require a dedicated process to be the process group leader, and we don't have that per se. The alternative could be a cgroup to which we assign all processes of a host. And as cgroups are hierarchical, we could also have a cgroup for the network itself. That way we could also kill all processes of the whole network.

Let's keep track of this idea in #163.
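To illustrate the process-group variant mentioned above, here is a minimal Linux-only sketch using the Go standard library. It is not Gont code: `startInGroup` and `killGroup` are hypothetical helpers. Each started command becomes the leader of its own group, so this covers one command and its descendants. Grouping all commands of a host would still need a shared leader or a cgroup, as discussed.

```go
package main

import (
	"fmt"
	"os/exec"
	"syscall"
)

// startInGroup starts the command in a new process group. With Setpgid set
// and Pgid left at zero, the group ID equals the child's PID.
func startInGroup(name string, args ...string) (*exec.Cmd, error) {
	cmd := exec.Command(name, args...)
	cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
	if err := cmd.Start(); err != nil {
		return nil, err
	}
	return cmd, nil
}

// killGroup sends SIGKILL to every process in the group led by cmd.
// A negative PID addresses the whole process group.
func killGroup(cmd *exec.Cmd) error {
	return syscall.Kill(-cmd.Process.Pid, syscall.SIGKILL)
}

func main() {
	// The shell and any children it forks share one process group.
	cmd, err := startInGroup("sh", "-c", "sleep 60 & sleep 60")
	if err != nil {
		fmt.Println("start failed:", err)
		return
	}
	if err := killGroup(cmd); err != nil {
		fmt.Println("kill failed:", err)
		return
	}
	// Wait reaps the shell and reports how it terminated.
	fmt.Println("child exited:", cmd.Wait())
}
```

Children that move themselves into a different process group or session escape this kind of cleanup, which is one reason a cgroup per host may be the more robust option.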

Infinoid commented 4 months ago

I would like to add a new type of interface in Gont which reflects an ExternalInterface: basically any type of interface which exists in the default network namespace. Gont would adopt this interface and move it into the namespace of the Gont switch to which it is connected. When the network is torn down, Gont would release that interface back to the default namespace.

Okay. So if I understand correctly, this would be a way to support the "attach to an existing vm" approach, where I have a VM already running, Gont would pull in qemu's tap devices, wrap a test network around it, run its tests, then return qemu back to the host. That sounds nice. The advantage of this approach is that gont doesn't need to control the qemu process, and doesn't need to wait for it to boot up, shut down gracefully, or any of that stuff. Sounds very useful.

a QEmu VM could have multiple interfaces each of which connected to a different switch in the Gont network. That way we could even test QEmu-based router/NATs/firewalls.

Yes! This is exactly what I am trying to do. I want to ensure updates to my router configuration (in the form of an ansible playbook) behave nicely in a VM, connected to a fake LAN and a fake internet, before I deploy it on the real router hardware.

I think this ability will be very powerful.

Infinoid commented 4 months ago

I am looking forward to the ExternalInterface feature you describe. I think it will be perfect for what I need. For now, I've mocked up the interface capture like this:

    err = system("ip", "link", "set", "beeptestlan", "netns", beep.Namespace.Name)
    require.Nil(t, err)
    defer func() {
        _, err = beep.Run("ip", "link", "set", "beeptestlan", "netns", "1")
        if err != nil {
            log.Printf("cleanup: moving beeptestlan back to main netns failed: %v", err)
        }

        _ = system("ip", "link", "set", "down", "dev", "beeptestlan")
    }()

    _, err = beep.Run("ip", "link", "set", "up", "dev", "beeptestlan")
    require.Nil(t, err)

and it works pretty well.
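The `system` helper used in the snippet above is not shown in the thread. One plausible definition, offered only as an assumption about what it does, runs a command in the caller's (default) namespace and folds the command output into the returned error:

```go
package main

import (
	"fmt"
	"os/exec"
)

// system is a hypothetical stand-in for the helper used above. It runs the
// command in the current process's namespaces and returns nil on success.
// On failure, the combined stdout/stderr is attached to the error.
func system(name string, args ...string) error {
	out, err := exec.Command(name, args...).CombinedOutput()
	if err != nil {
		return fmt.Errorf("%s %v: %w: %s", name, args, err, out)
	}
	return nil
}

func main() {
	fmt.Println("true:", system("true"))
	fmt.Println("false:", system("false"))
}
```

Attaching the output to the error keeps messages from `ip` visible in the test log when moving the interface fails.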

Combined with the bridge configuration discussed above, this provides a good environment for writing/debugging unit tests with a live VM. I can have the VM running in one window, running a packet sniffer or tailing a log, whatever is needed. Then I run the gont test program in another window: it quickly attaches to the VM's tap devices, runs the tests, and then releases the tap devices. It takes less than 2 seconds, which is great. (Though I do have to wait for NO-CARRIER flags to go away, sometimes.)

Unlike my previous attempt (where the test runs qemu directly), I can repeat it as many times as needed until I get it right, and I don't need to wait for the VM to reboot each time. I like it.

I think the ExternalInterface feature will make this much cleaner.
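The occasional wait for NO-CARRIER mentioned above could be automated by checking the flag list that `ip -o link show dev <name>` prints. The sketch below only covers the parsing step. `linkFlags` and `hasCarrier` are hypothetical helpers, and the sample lines are illustrative rather than captured output. In a test, the line would come from running `ip` inside the host's namespace.

```go
package main

import (
	"fmt"
	"strings"
)

// linkFlags extracts the comma-separated flags between '<' and '>' from one
// line of `ip -o link show` output. It returns nil if no flag list is found.
func linkFlags(line string) []string {
	start := strings.Index(line, "<")
	end := strings.Index(line, ">")
	if start < 0 || end < start {
		return nil
	}
	return strings.Split(line[start+1:end], ",")
}

// hasCarrier reports whether the link is administratively UP and does not
// carry the NO-CARRIER flag.
func hasCarrier(line string) bool {
	up := false
	for _, f := range linkFlags(line) {
		switch f {
		case "NO-CARRIER":
			return false
		case "UP":
			up = true
		}
	}
	return up
}

func main() {
	// Illustrative sample lines in the `ip -o link show` format.
	down := "7: beeptestlan: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 state DOWN"
	up := "7: beeptestlan: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP"
	fmt.Println(hasCarrier(down), hasCarrier(up))
}
```

A test could call such a check in a short retry loop before sending traffic, rather than sleeping for a fixed time.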