
move-instance difficult to use and ultimately fails #1696

Open anarcat opened 1 year ago

anarcat commented 1 year ago

I'm trying to migrate between two Ganeti clusters. I have found with great anticipation the move-instance command, but I'm having a hard time making it work.

At first, it would just crash with a backtrace in Debian bullseye:

TypeError: '>' not supported between instances of 'NoneType' and 'int'

That's due to this code:

https://github.com/ganeti/ganeti/blob/114e59fcc9d4a7c82618569f5d6b7389a0f80123/tools/move-instance#L941

If I pass --opportunistic-tries=1 it tells me:

move-instance: error: Opportunistic instance creation can only be used with an iallocator

So, basically, right now, you must use:

move-instance --opportunistic-tries=1 --iallocator=hail
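
A full invocation then ends up looking something like this (cluster and instance names are placeholders, and the credential options are the --src-*/--dest-* ones from the move-instance man page; adjust to your setup):

move-instance --iallocator=hail --opportunistic-tries=1 \
  --src-username=move --src-password-file=/root/rapi-src.secret \
  --dest-username=move --dest-password-file=/root/rapi-dest.secret \
  source-cluster.example.org destination-cluster.example.org test-01.example.org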

According to @apoikos (on IRC), the TypeError is a python2-to-3 leftover...

The next problem I had with move-instance was a ganeti.rapi.client.Error: Password not specified, but that was me failing at setting up the RAPI users. I also got ganeti.rapi.client.GanetiApiError: 401 Unauthorized: No permission -- see authorization schemes on the destination cluster. Maybe the docs could be improved to lead the operator the right way ("check your RAPI users again"). Having a way to test the users out of band (say with curl) would also be useful here.
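
For what it's worth, an out-of-band check of those credentials can be something as simple as this (hostname and user are placeholders, 5080 is the default RAPI port, -k because of the self-signed certificate; note that a default ganeti-rapi only requires authentication for write operations, so a read like this only really exercises the credentials if --require-authentication is set):

curl -k --user move:SECRET https://destination-cluster.example.org:5080/2/info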

Then I had another error which was pretty opaque:

ganeti.errors.OpPrereqError: ("Invalid handshake: Hash didn't match, clusters don't share the same domain secret", 'wrong_input')

So that might seem obvious but I did copy the secret over and ran:

gnt-cluster renew-crypto --cluster-domain-secret=cluster-domain-secret

So it seems the bug there is that the --cluster-domain-secret= argument actually fails to replace the secret on the cluster. I had to manually copy the cluster-domain-secret file in /var/lib/ganeti and restart the server for that to work.
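
Concretely, the workaround that did work was along these lines on the destination cluster's master node (hostname is a placeholder, and restarting everything may be heavier-handed than strictly necessary):

scp root@source-master.example.org:/var/lib/ganeti/cluster-domain-secret /var/lib/ganeti/cluster-domain-secret
systemctl restart ganeti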

But what completely blocked me is this:

ganeti.errors.OpPrereqError: ('If network is given, no mode or link is allowed to be passed', 'wrong_input')

It looks like the source node is encoding NIC information in the backup and the target node is somewhat unhappy with it. I'm not sure how to debug this: I'm lost in the stack between the client and server method definitions and I don't actually understand what's going on so much.

Did anyone get that thing to work at all? What am I doing wrong?

Should I open separate issues for those things?

anarcat commented 1 year ago

oh, and for what it's worth, I have a manual procedure for moving VMs around with export/import documented here:

https://gitlab.torproject.org/tpo/tpa/team/-/wikis/howto/ganeti/#migrating-a-vm-between-clusters

it's basically, on the source node:

gnt-backup export -n chi-node-01.torproject.org test-01.torproject.org

and on the target node:

rsync -ASHaxX --info=progress2 root@chi-node-01.torproject.org:/var/lib/ganeti/export/test-01.torproject.org/ /var/lib/ganeti/export/test-01.torproject.org/
gnt-backup import -n dal-node-01:dal-node-02 --src-node=dal-node-01 --src-dir=/var/lib/ganeti/export/test-01.torproject.org --no-ip-check --no-name-check --net 0:ip=pool,network=gnt-dal-01 -t drbd --no-wait-for-sync test-01.torproject.net

... so that works, but it's still a manual process: multiple steps, each a little error-prone, with long-running processes punctuated by manual "copy-paste" things, which is not ideal for large clusters. (And yes, I could also write my own automation to speed this up, but then i'd be rewriting move-instance, wouldn't i? :)

rbott commented 1 year ago

Hi @anarcat,

we have used the tool a lot in all of our Ganeti 2.16 -> 3.0 upgrades (basically we set up new small clusters with fresh hardware and Ganeti 3.0, moved some instances, re-purposed/re-installed older nodes where possible and added them to the new cluster(s), moved some more instances etc.). We also stumbled upon some bugs and/or missing features which are fixed in master here, here and here. You can use the script standalone, directly from master, without the rest of the tree, to carry out the migrations.

We ended up with the following pre-setup:

We then used an Ansible playbook to carry out the actual instance migrations, but that was mainly due to easier integration into other internal workflows which are not relevant here. It did the following:

Our instance scenario:

I think I remember that while looking at the code of move-instance we found out that it actually does make some assumptions on the instance configuration which might lead to the NIC-configuration-related-error you stated above.

The above is far from perfect and still yields some issues that should actually be fixed upstream. But it worked well for us with several hundred instances so far. Hope that helps a bit :-)

rbott commented 1 year ago

We have also used the same approach to move instances between Ganeti 3.0 clusters. However, due to an issue with more recent socat versions, this needs a manual change of the export/import code on the source node :-(

More information can be found in this issue

rbott commented 1 year ago

Oh, and to add something more useful to this issue, I would definitely suggest to a) extend the documentation with more guidance/example commands/pitfalls and b) of course fix the open/known issues, e.g. the setting of the shared secret, which is clearly a bug.

I might find some time in the next few days to extend the documentation.

anarcat commented 1 year ago

wow, that's all extremely useful! that --keep-instance flag is invaluable, I didn't even realize the move-instance script was trashing instances on the source cluster, ouch! I guess it makes sense because of the "move" semantic, but still, dang...

> the export/import scripts from the debootstrap OS provider are broken/not usable in our scenario ("partition style" full disk images), so we replace them on-the-fly with the ones from gnt-noop which simply use dd to do a bitwise transfer of all disks

amazing, I converged on the exact same thing, probably because of the exact same bug, see https://github.com/ganeti/instance-debootstrap/issues/18
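
for context, an export/import hook pair in that style boils down to roughly the following sketch, using the EXPORT_DEVICE/IMPORT_DEVICE variables from the ganeti-os-interface(7) environment (the actual gnt-noop scripts may differ):

#!/bin/sh
# export hook: dump the raw source disk to stdout
set -e
exec dd if="$EXPORT_DEVICE" bs=1M

#!/bin/sh
# import hook: write the stream from stdin back onto the destination disk
set -e
exec dd of="$IMPORT_DEVICE" bs=1M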

> because of the change in the instance network configuration (see above) we alter the network parameters (e.g. link=br604 turns into link=gnt-bridge,vlan=604) and also statically set the MAC address the interface had on the source cluster (otherwise a new one will be assigned during instance creation on the destination cluster) - this can be done by adding something like --net 0:link=gnt-bridge,vlan=604,mac=aa:bb:cc:dd:ee:ff to the command-line of move-instance

so basically I need to actually allocate a MAC address for each VM I move? ouch?

I was hoping i could just batch-move instances here to quickly evacuate a cluster; individually mapping MAC addresses doesn't sound like a fun time...

> I think I remember that while looking at the code of move-instance we found out that it actually does make some assumptions on the instance configuration which might lead to the NIC-configuration-related-error you stated above.

okay, that definitely sounds familiar. what's strange is that the problem occurs whether I pass a --net argument or not. it seems like there's a builtin default somewhere that conflicts with another default... without a --net option, i end up with the following nic configuration in the remote create job:

        nics: 
          - ip: 38.229.82.23
            link: br0
            mac: 06:66:38:c4:0c:23
            mode: bridged
            network: 097c2565-dab9-4a29-9519-b987718ed812
            vlan: 

what's interesting there is that the ip is actually the one from the source cluster. in a sense, it's obviously incorrect as it does, indeed, have both an IP and a network field, as described, but it's not supplied by the operator. i wonder wth is going on here...

anarcat commented 1 year ago

@rbott

> I think I remember that while looking at the code of move-instance we found out that it actually does make some assumptions on the instance configuration which might lead to the NIC-configuration-related-error you stated above.

i'd really love to hear where you found that code, because what I found was pretty generic, copying data around. i've made #1698 which seems to work as a stopgap measure here.

i do wonder if the right place to do this might not better be somewhere in here:

https://github.com/ganeti/ganeti/blob/114e59fcc9d4a7c82618569f5d6b7389a0f80123/tools/move-instance#L590-L604

i just can't figure out what to do with this stuff... it seems like it makes sense to inherit it, but we're actually creating garbage here, because that's where we create the dict which has both network and mode, for example...

at least failing here would fail early and facilitate debugging? not sure what the best way forward is here either.

anarcat commented 1 year ago

so i have two more PRs here, #1698 and #1697, which fix the problems i've encountered so far. i'm at this error now:

2023-03-13 20:56:10,146: Move1 INFO [Mon Mar 13 20:56:10 2023]  - WARNING: export 'export-disk1-2023-03-13_20_55_58-_1zmyfcu' on chi-node-08.torproject.org failed: Exited with status 1
2023-03-13 20:56:10,146: Move1 INFO [Mon Mar 13 20:56:10 2023] Disk 1 failed to send data: Exited with status 1 (recent output: dd: 0 bytes copied, 0.998604 s, 0.0 kB/s\ndd: 0 bytes copied, 6.00403 s, 0.0 kB/s\nsocat: E SSL_connect(): Connection refused)

i think this could be related to:

> make sure source Ganeti nodes are able to establish TCP connections on ports > 1024 to destination Ganeti nodes (inter-cluster-migration will always use the primary network of the cluster, not a secondary/alternate network which may be configured for e.g. DRBD stuff)

i have punched holes in the primary nodes, but not all nodes, so this might be what's crashing this...
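
for the record, "punching holes" here amounts to something like this on every destination node, repeated for each source node's primary address (addresses are placeholders, adapt to whatever firewall tooling you use):

iptables -A INPUT -p tcp -s 192.0.2.10 --dport 1024:65535 -j ACCEPT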

and then I guess i'll catch up with your #1681... how did you actually work around that one?

rbott commented 1 year ago

> because of the change in the instance network configuration (see above) we alter the network parameters (e.g. link=br604 turns into link=gnt-bridge,vlan=604) and also statically set the MAC address the interface had on the source cluster (otherwise a new one will be assigned during instance creation on the destination cluster) - this can be done by adding something like --net 0:link=gnt-bridge,vlan=604,mac=aa:bb:cc:dd:ee:ff to the command-line of move-instance

> so basically I need to actually allocate a MAC address for each VM I move? ouch?

Well yes and no. If you provide a --net parameter and leave out mac, it will default to the value of generate, which will cause the destination Ganeti cluster to roll the dice and generate a new MAC address. If that does not cause any problems for you, you can completely ignore this. But if it does cause problems (DHCP reservations, older systems with autogenerated udev rules for ethX names etc.) you might want to retain the original MAC address. In our case we simply ask RAPI on the source cluster for the current MAC address(es) of the instance and pass it to the --net parameter of the move-instance command. In case of our Ansible playbook it is a simple extra task. But YMMV, it might not even be required to retain the MAC address(es) :-)
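
Outside of Ansible, the same information can be looked up on the source cluster with something along these lines (instance name is a placeholder; nic.macs is one of the standard output fields of gnt-instance list), or via the corresponding RAPI instance resource:

gnt-instance list --no-headers -o name,nic.macs test-01.example.org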

> I think I remember that while looking at the code of move-instance we found out that it actually does make some assumptions on the instance configuration which might lead to the NIC-configuration-related-error you stated above.

> i'd really love to hear where you found that code, because what I found was pretty generic, copying data around. i've made #1698 which seems to work as a stopgap measure here.

I probably should have looked at the code again before posting assumptions, sorry for that :-) But I think you have found the right spot, and #1698 (along with @apoikos' annotation/review) should do the trick and solve that issue.

> and then I guess i'll catch up with your #1681... how did you actually work around that one?

Well, we took the short (and ugly) route and "hot-patched" this file on the sending node(s): https://github.com/ganeti/ganeti/blob/114e59fcc9d4a7c82618569f5d6b7389a0f80123/lib/impexpd/__init__.py#L91 ...to state verify=0. As we mainly used move-instance to migrate from older clusters to 3.0 clusters, we rarely ran into this problem (mostly cases where we actually moved an instance to the wrong destination cluster and had to move it between two 3.0 clusters afterwards). But nevertheless it is actually broken right now for everyone using 3.0, and it needs a proper solution.

anarcat commented 1 year ago

On 2023-03-14 04:36:44, Rudolph Bott wrote:

> because of the change in the instance network configuration (see above) we alter the network parameters (e.g. link=br604 turns into link=gnt-bridge,vlan=604) and also statically set the MAC address the interface had on the source cluster (otherwise a new one will be assigned during instance creation on the destination cluster) - this can be done by adding something like --net 0:link=gnt-bridge,vlan=604,mac=aa:bb:cc:dd:ee:ff to the command-line of move-instance

> so basically I need to actually allocate a MAC address for each VM I move? ouch?

> Well yes and no. If you provide a --net parameter and leave out mac, it will default to the value of generate, which will cause the destination Ganeti cluster to roll the dice and generate a new MAC address. If that does not cause any problems for you, you can completely ignore this. But if it does cause problems (DHCP reservations, older systems with autogenerated udev rules for ethX names etc.) you might want to retain the original MAC address. In our case we simply ask RAPI on the source cluster for the current MAC address(es) of the instance and pass it to the --net parameter of the move-instance command. In case of our Ansible playbook it is a simple extra task. But YMMV, it might not even be required to retain the MAC address(es) :-)

What I meant is that to override the error, I need to pass a mac= setting somehow. But yeah, I think it's okay if our MACs get renumbered. We do have a per-cluster MAC prefix anyway, so it would be odd for those VMs to be different.

> [...]

> and then I guess i'll catch up with your #1681... how did you actually work around that one?

> Well, we took the short (and ugly) route and "hot-patched" this file on the sending node(s): https://github.com/ganeti/ganeti/blob/114e59fcc9d4a7c82618569f5d6b7389a0f80123/lib/impexpd/__init__.py#L91 ...to state verify=0. As we mainly used move-instance to migrate from older clusters to 3.0 clusters, we rarely ran into this problem (mostly cases where we actually moved an instance to the wrong destination cluster and had to move it between two 3.0 clusters afterwards). But nevertheless it is actually broken right now for everyone using 3.0, and it needs a proper solution.

While we're talking about monkeypatching stuff here, I wonder if there's a cleaner way to bypass this than just disabling verification. In our case, this is flying over an untrusted network, so I actually really don't want to disable verification. I think. Maybe there's a way to hardcode the CA or something?

anarcat commented 1 year ago

okay, so i think this ticket can remain for documentation: i filed #1697 for the python 3 stuff, #1698 for the NIC stuff (which could also be improved), and #1699 for the commonname stuff.

what would remain here is documenting the heck out of all this.