AshleyYakeley / NixVirt

LibVirt domain management for Nix
MIT License
192 stars 21 forks source link

Add the option "forceRedefine" #40

Closed pharra closed 5 months ago

pharra commented 5 months ago

issue: libvirt may touch the XML configuration when it think the configuration is wrong/conflict, so it's different between the xml configuration that nixvirt generated and the configuration that defined in libvirt. It will redefined every domain even the configuration is unchanged, it caused the running machine reboot when running "nixos-rebuild switch".

solution: check the settings with old setting to determine if anything changed

AshleyYakeley commented 5 months ago

How would it work in this scenario?

  1. user runs NixOS / HM switch
  2. user modifies domain with VMM or Gnome Boxes
  3. user runs NixOS / HM switch

NixVirt should revert the domain modifications in this case, but with this PR it wouldn't.

pharra commented 5 months ago

there is a scenario: when the VM is running, libvirt will add a new section to the domain XML:

  <seclabel type='dynamic' model='dac' relabel='yes'>
    <label>+0:+0</label>
    <imagelabel>+0:+0</imagelabel>
  </seclabel>

it means, the machine will be restarted when user run "NixOS switch" every time even if user doesn't edit the nixvirt configuration because nixvirt think the xml is different (please correct me if I'm wrong).

This means that users might have unintentionally executed nixos-rebuild switch, which could result in data loss if the current VM has critical tasks. I consider this risk unacceptable. In contrast, I believe that users who manage the domain using both NixVirt and VMM are aware of their intentions. We could add an option, such as ‘ForceRefresh,’ in the NixVirt module to allow users to decide whether they want to revert modifications every time. For example, manually running "systemctl restart nixvirt-force-fresh" would force a refresh of previous changes.

AshleyYakeley commented 5 months ago

I believe that users who manage the domain using both NixVirt and VMM are aware of their intentions.

Well, probably, but I think it's a reasonable expectation that NixVirt would always set up their domains correctly.

By contrast I think it's OK that NixOS activation might cause domains to restart? You're reconfiguring your system, after all, that's a major change. It may cause system services to restart and it may cause VMs to restart.

AshleyYakeley commented 5 months ago

Possible solution: delete the seclabel node in DomainConnection._relevantDefETree. Would that work?

pharra commented 5 months ago

Well, probably, but I think it's a reasonable expectation that NixVirt would always set up their domains correctly.

You are right.

here is the diff between running state and stop state of one vm:

1c1
< <domain type='kvm' id='9'>
---
> <domain type='kvm'>
7,9d6
<   <resource>
<     <partition>/machine</partition>
<   </resource>
37,38c34
<       <source pool='VMPool' volume='NixOS.qcow2' index='2'/>
<       <backingStore/>
---
>       <source pool='VMPool' volume='NixOS.qcow2'/>
40d35
<       <alias name='virtio-disk0'/>
44c39
<       <driver name='qemu'/>
---
>       <driver name='qemu' type='raw'/>
47d41
<       <alias name='sata0-0-2'/>
51d44
<       <alias name='usb'/>
55d47
<       <alias name='ide'/>
58,60c50
<     <controller type='pci' index='0' model='pcie-root'>
<       <alias name='pcie.0'/>
<     </controller>
---
>     <controller type='pci' index='0' model='pcie-root'/>
64d53
<       <alias name='pci.1'/>
70d58
<       <alias name='pci.2'/>
76d63
<       <alias name='pci.3'/>
82d68
<       <alias name='pci.4'/>
88d73
<       <alias name='pci.5'/>
94d78
<       <alias name='pci.6'/>
100d83
<       <target dev='vnet15'/>
102d84
<       <alias name='net0'/>
106d87
<       <alias name='input0'/>
110d90
<       <alias name='input1'/>
113,118c93,94
<     <input type='mouse' bus='ps2'>
<       <alias name='input2'/>
<     </input>
<     <input type='keyboard' bus='ps2'>
<       <alias name='input3'/>
<     </input>
---
>     <input type='mouse' bus='ps2'/>
>     <input type='keyboard' bus='ps2'/>
121d96
<       <alias name='tpm0'/>
124d98
<       <alias name='sound0'/>
130d103
<       <alias name='video0'/>
132,134c105
<     <watchdog model='itco' action='reset'>
<       <alias name='watchdog0'/>
<     </watchdog>
---
>     <watchdog model='itco' action='reset'/>
136d106
<       <alias name='balloon0'/>
141d110
<       <alias name='rng0'/>
145,148d113
<   <seclabel type='dynamic' model='dac' relabel='yes'>
<     <label>+0:+0</label>
<     <imagelabel>+0:+0</imagelabel>
<   </seclabel>

I'm not sure if it is hard to ignore all the different nodes.

a possible solution:We can do what vmm does: modify the configuration of a running machine only after it is restarted or shutdown.

adding a libvirt hook script

    machine=$1
    command=$2
     if [ "$command" == "stopped" ]; then
      # check domain if need to update or already
      virt-helper --connect "qemu:///system" --domain $machine
      fi
    fi

how about your thoughts?

AshleyYakeley commented 5 months ago

So this is how NixVirt decides whether to restart a running domain, during activation:

  1. Fetch the XML of the domain
  2. Set the specified XML of the domain
  3. Fetch the XML of the domain (with VIR_DOMAIN_XML_INACTIVE flag, indicating "the configuration that will be used on the next boot of a persistent domain")
  4. For both old and new XML, parse them into etrees, and then "clean" them (using _cleanDefETree)
  5. Compare cleaned etrees to see if they are different.

I think if possible, I would want to modify _cleanDefETree or _relevantDefETree to remove the irrelevant nodes in the etree, to correctly determine if the domain definition has really changed.

Could you switch on virtualisation.libvirt.verbose and post the NixVirt output when running nixos-rebuild switch? The output will include the difference of etrees that caused NixVirt to determine that the domain definition had changed.

pharra commented 5 months ago

I need to correct my previous mistakes. From the Diff LOG(previously, i compare the diff of XML using virsh dumpxml [domain]), the XML Elements automatically added by libvirt in running Machine will not be misjudged. The reason for diff is seems that I did not configure the MAC address for multiple network cards, so the MAC address of network card was changed every time. attach the log here:

6月 02 18:00:25 homelab nixvirt-start[341102]: domain ee43005c-2e7b-4af2-bfae-8c52eeb22678: changed:
6月 02 18:00:25 homelab nixvirt-start[341102]: [move, /domain/devices/interface[3]/mac[1], /domain/devices/interface[2], 0]
6月 02 18:00:25 homelab nixvirt-start[341102]: [update-attribute, /domain/devices/interface[2]/mac[1], address, "52:54:00:24:37:b7"]
6月 02 18:00:25 homelab nixvirt-start[341102]: [move, /domain/devices/interface[2]/mac[2], /domain/devices/interface[3], 0]
6月 02 18:00:25 homelab nixvirt-start[341102]: [update-attribute, /domain/devices/interface[3]/mac[1], address, "52:54:00:e0:22:d7"]
6月 02 18:00:25 homelab nixvirt-start[341102]: domain ee43005c-2e7b-4af2-bfae-8c52eeb22678: deactivate

interface configuration:

            interface = [
              {
                type = "bridge";
                model = {type = "virtio";};
                source = {bridge = "br0";};
              }
              {
                type = "bridge";
                model = {type = "virtio";};
                source = {bridge = "ib";};
              }
              {
                type = "bridge";
                model = {type = "virtio";};
                source = {bridge = "eth";};
              }
            ];

but I found that redefine a running vm will not reboot it, the config changed only after the machine shutdown next time, so just add a option "forceRedefine" to control it, default is true, and will restart a running vm, otherwise it will not. a quick question here: do we need to add the condition "forceRedefine" in the method "SetActive"?