LINBIT / linstor-proxmox

Integration plugin bridging LINSTOR to Proxmox VE

Proxmox live migrate from Linstor diskful node to diskless node #34

Closed: sixeatseven closed this issue 4 years ago

sixeatseven commented 4 years ago

Hello. What we have: three Proxmox diskful nodes in a LINSTOR cluster and one Proxmox diskless node [screenshots of the Proxmox and LINSTOR views attached].

On the three diskful nodes everything works as expected: I can create, migrate, and delete VMs and VM disks. On the diskless node, drbdstorage is flagged as unknown and inactive [screenshot attached], but I can still create a VM disk from that node and migrate a stopped VM from it to a diskful node and back. During live migration from a diskful node to the diskless node I get this error:

Oct 22 10:02:12 NODE-G8-T1 qm[62789]: start VM 101: UPID:NODE-G8-T1:0000F545:00DB8684:5F912E74:qmstart:101:root@pam:
Oct 22 10:02:13 NODE-G8-T1 Satellite[1081]: 10:02:13.349 [MainWorkerPool-22] INFO LINSTOR/Satellite - SYSTEM - Resource 'vm-101-disk-1' created for node 'NODE-G8'.
Oct 22 10:02:13 NODE-G8-T1 Satellite[1081]: 10:02:13.349 [MainWorkerPool-22] INFO LINSTOR/Satellite - SYSTEM - Resource 'vm-101-disk-1' created for node 'NODE-G8-T1'.
Oct 22 10:02:13 NODE-G8-T1 Satellite[1081]: 10:02:13.349 [MainWorkerPool-22] INFO LINSTOR/Satellite - SYSTEM - Resource 'vm-101-disk-1' created for node 'NODE-G9'.
Oct 22 10:02:13 NODE-G8-T1 Satellite[1081]: 10:02:13.350 [MainWorkerPool-22] INFO LINSTOR/Satellite - SYSTEM - Resource 'vm-101-disk-1' created for node 'NODE-G9-16Ports'.
Oct 22 10:02:13 NODE-G8-T1 kernel: drbd vm-101-disk-1: Starting worker thread (from drbdsetup [62804])
Oct 22 10:02:13 NODE-G8-T1 kernel: drbd vm-101-disk-1 NODE-G8: Starting sender thread (from drbdsetup [62808])
Oct 22 10:02:13 NODE-G8-T1 kernel: drbd vm-101-disk-1 NODE-G9: Starting sender thread (from drbdsetup [62811])
Oct 22 10:02:13 NODE-G8-T1 kernel: drbd vm-101-disk-1 NODE-G9-16Ports: Starting sender thread (from drbdsetup [62814])
Oct 22 10:02:13 NODE-G8-T1 kernel: drbd vm-101-disk-1 NODE-G8: conn( StandAlone -> Unconnected )
Oct 22 10:02:13 NODE-G8-T1 kernel: drbd vm-101-disk-1 NODE-G8: Starting receiver thread (from drbd_w_vm-101-d [62805])
Oct 22 10:02:13 NODE-G8-T1 kernel: drbd vm-101-disk-1 NODE-G8: conn( Unconnected -> Connecting )
Oct 22 10:02:13 NODE-G8-T1 kernel: drbd vm-101-disk-1 NODE-G9: conn( StandAlone -> Unconnected )
Oct 22 10:02:13 NODE-G8-T1 kernel: drbd vm-101-disk-1 NODE-G9: Starting receiver thread (from drbd_w_vm-101-d [62805])
Oct 22 10:02:13 NODE-G8-T1 kernel: drbd vm-101-disk-1 NODE-G9: conn( Unconnected -> Connecting )
Oct 22 10:02:13 NODE-G8-T1 kernel: drbd vm-101-disk-1 NODE-G9-16Ports: conn( StandAlone -> Unconnected )
Oct 22 10:02:13 NODE-G8-T1 kernel: drbd vm-101-disk-1 NODE-G9-16Ports: Starting receiver thread (from drbd_w_vm-101-d [62805])
Oct 22 10:02:13 NODE-G8-T1 kernel: drbd vm-101-disk-1 NODE-G9-16Ports: conn( Unconnected -> Connecting )
Oct 22 10:02:14 NODE-G8-T1 pmxcfs[1276]: [status] notice: received log
Oct 22 10:02:58 NODE-G8-T1 qm[62789]: API Return-Code: 500.
Message: Could not create diskless resource vm-101-disk-1 on NODE-G8-T1, because: [
  {"ret_code":-9223372036836687872,"message":"Resource will be automatically flagged as drbd diskless","cause":"Used storage pool 'DfltDisklessStorPool' is diskless, but resource was not flagged drbd diskless","obj_refs":{"RscDfn":"vm-101-disk-1"}},
  {"ret_code":20185089,"message":"Successfully set property key(s): StorPoolName","obj_refs":{"RscDfn":"vm-101-disk-1"}},
  {"ret_code":20185089,"message":"New resource 'vm-101-disk-1' on node 'NODE-G8-T1' registered.","details":"Resource 'vm-101-disk-1' on node 'NODE-G8-T1' UUID is: e580db45-3cc1-4a57-87a8-1ef5724cfb66","obj_refs":{"UUID":"e580db45-3cc1-4a57-87a8-1ef5724cfb66","RscDfn":"vm-101-disk-1"}},
  {"ret_code":19660801,"message":"Volume with number '0' on resource 'vm-101-disk-1' on node 'NODE-G8-T1' successfully registered","details":"Volume UUID is: dfb39641-9a9e-4955-9349-e779ffdfc0b6","obj_refs":{"RscDfn":"vm-101-disk-1","VlmNr":"0","Node":"NODE-G8-T1"}},
  {"ret_code":20185091,"message":"Created resource 'vm-101-disk-1' on 'NODE-G8-T1'","obj_refs":{"RscDfn":"vm-101-disk-1"}},
  {"ret_code":20185091,"message":"Added peer(s) 'NODE-G8-T1' to resource 'vm-101-disk-1' on 'NODE-G9-16Ports'","obj_refs":{"RscDfn":"vm-101-disk-1"}},
  {"ret_code":20185091,"message":"Added peer(s) 'NODE-G8-T1' to resource 'vm-101-disk-1' on 'NODE-G9'","obj_refs":{"RscDfn":"vm-101-disk-1"}},
  {"ret_code":20185091,"message":"Added peer(s) 'NODE-G8-T1' to resource 'vm-101-disk-1' on 'NODE-G8'","obj_refs":{"RscDfn":"vm-101-disk-1"}},
  {"ret_code":-4611686018407202816,"message":"Resource did not became ready on node 'NODE-G8-T1' within reasonable time, check Satellite for errors.","obj_refs":{"RscDfn":"vm-101-disk-1","Node":"NODE-G8-T1"}}
] at /usr/share/perl5/PVE/Storage/Custom/LINSTORPlugin.pm line 385.
PVE::Storage::Custom::LINSTORPlugin::activate_volume("PVE::Storage::Custom::LINSTORPlugin", "drbdstorage", HASH(0x5613f2cd9710), "vm-101-disk-1", undef, HASH(0x5613f2d01e20)) called at /usr/share/perl5/PVE/Storage.pm line 1052
PVE::Storage::activate_volumes(HASH(0x5613f2cd8b28), ARRAY(0x5613f2cefd00)) called at /usr/share/perl5/PVE/QemuServer.pm line 5036
PVE::QemuServer::vm_start_nolock(HASH(0x5613f2cd8b28), 101, HASH(0x5613f2c86f38), HASH(0x5613f2c78898), HASH(0x5613f2c65328)) called at /usr/share/perl5/PVE/QemuServer.pm line 4869
PVE::QemuServer::__ANON__() called at /usr/share/perl5/PVE/AbstractConfig.pm line 299
PVE::AbstractConfig::__ANON__() called at /usr/share/perl5/PVE/Tools.pm line 215
eval {...} called at /usr/share/perl5/PVE/Tools.pm line 215
PVE::Tools::lock_file_full("/var/lock/qemu-server/lock-101.conf", 10, 0, CODE(0x5613f2c65748)) called at /usr/share/perl5/PVE/AbstractConfig.pm line 302
PVE::AbstractConfig::__ANON__("PVE::QemuConfig", 101, 10, 0, CODE(0x5613f2c78940)) called at /usr/share/perl5/PVE/AbstractConfig.pm line 322
PVE::AbstractConfig::lock_config_full("PVE::QemuConfig", 101, 10, CODE(0x5613f2c78940)) called at /usr/share/perl5/PVE/AbstractConfig.pm line 330
PVE::AbstractConfig::lock_config("PVE::QemuConfig", 101, CODE(0x5613f2c78940)) called at /usr/share/perl5/PVE/QemuServer.pm line 4870
PVE::QemuServer::vm_start(HASH(0x5613f2cd8b28), 101, HASH(0x5613f2c78898), HASH(0x5613f2c65328)) called at /usr/share/perl5/PVE/API2/Qemu.pm line 2225
PVE::API2::Qemu::__ANON__("UPID:NODE-G8-T1:0000F545:00DB8684:5F912E74:qmstart:101:root\@pam:") called at /usr/share/perl5/PVE/RESTEnvironment.pm line 610
eval {...} called at /usr/share/perl5/PVE/RESTEnvironment.pm line 601
PVE::RESTEnvironment::fork_worker(PVE::RPCEnvironment=HASH(0x5613f2c86a70), "qmstart", 101, "root\@pam", CODE(0x5613f2ce9e58)) called at /usr/share/perl5/PVE/API2/Qemu.pm line 2229
PVE::API2::Qemu::__ANON__(HASH(0x5613f2cc1c70)) called at /usr/share/perl5/PVE/RESTHandler.pm line 453
PVE::RESTHandler::handle("PVE::API2::Qemu", HASH(0x5613f28c2cf0), HASH(0x5613f2cc1c70)) called at /usr/share/perl5/PVE/RESTHandler.pm line 865
eval {...} called at /usr/share/perl5/PVE/RESTHandler.pm line 848
PVE::RESTHandler::cli_handler("PVE::API2::Qemu", "qm start", "vm_start", ARRAY(0x5613ed797a50), ARRAY(0x5613f2ceb690), HASH(0x5613f2ceb6d8), CODE(0x5613f2cc0e00), undef) called at /usr/share/perl5/PVE/CLIHandler.pm line 591
PVE::CLIHandler::__ANON__(ARRAY(0x5613ed797c78), undef, CODE(0x5613f2cc0e00)) called at /usr/share/perl5/PVE/CLIHandler.pm line 668
PVE::CLIHandler::run_cli_handler("PVE::CLI::qm") called at /usr/sbin/qm line 8
Oct 22 10:02:58 NODE-G8-T1 qm[62788]: <root@pam> end task UPID:NODE-G8-T1:0000F545:00DB8684:5F912E74:qmstart:101:root@pam: API Return-Code: 500.
Oct 22 10:02:59 NODE-G8-T1 sshd[62782]: Received disconnect from 10.10.211.11 port 38208:11: disconnected by user

So the question is: is this kind of mixed setup, with live migration from a diskful node to a diskless node and back, simply not supported, or am I missing something in the config? Thank you!
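For reference, the state the plugin is waiting on can also be inspected by hand on the diskless node; a minimal sketch, assuming the resource name vm-101-disk-1 from the log above (these commands are not from the original report):

# list where the resource exists and whether it is diskless on NODE-G8-T1
linstor resource list
# check the DRBD connection state directly on the diskless node; during the
# failed start it stays in "Connecting" instead of reaching "Connected"
drbdadm status vm-101-disk-1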

rck commented 4 years ago

/etc/pve/storage.cfg would also be interesting. I already have an idea, but as I'm currently on vacation, this has to wait till around mid next week.

sixeatseven commented 4 years ago

Sure:

root@NODE-G9:~# cat /etc/pve/storage.cfg

dir: local
        path /var/lib/vz
        content backup,iso,vztmpl

lvmthin: local-lvm
        thinpool data
        vgname pve
        content images,rootdir

nfs: NFSVMSTORAGE
        export /export/NFSVMSTORAGE
        path /mnt/pve/NFSVMSTORAGE
        server 10.10.10.37
        content snippets,vztmpl,rootdir,iso,images,backup
        maxfiles 2

drbd: drbdstorage
        content images,rootdir
        controller 10.10.211.20
        resourcegroup linstore_storage
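As a side note, the resource group and storage pools referenced by the drbd storage entry can be checked from the controller; a minimal sketch, not taken from the original thread:

# confirm the resource group used by the plugin exists
linstor resource-group list
# show the storage pools per node (the diskless node should only carry DfltDisklessStorPool)
linstor storage-pool list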
rck commented 4 years ago

Looks like there is some noise that is actually fine, but then we only see:

Oct 22 10:02:13 NODE-G8-T1 kernel: drbd vm-101-disk-1 NODE-G9-16Ports: conn( Unconnected -> Connecting )

without an actual "Connected", and then:

"Resource did not became ready on node 'NODE-G8-T1' within reasonable time, check Satellite for errors."

So:
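For reference, the Satellite errors that message points at can usually be inspected with the LINSTOR client; a minimal sketch (not part of the original comment; the report id is a placeholder taken from the list output):

# list the error reports collected from the controller and satellites
linstor error-reports list
# show one report in full
linstor error-reports show <REPORT-ID>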

sixeatseven commented 4 years ago

Your tips gave me the idea that we had not added a separate network for DRBD traffic on the diskless node:

linstor node interface create <NODE-NAME> data <ip>
linstor storage-pool set-property <NODE-NAME> drbdpool PrefNic data
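For reference, the result can be verified afterwards; a minimal sketch, with <NODE-NAME> a placeholder as above and assuming list-properties is available in the installed linstor client:

# confirm the 'data' interface was registered for the node
linstor node interface list <NODE-NAME>
# confirm the PrefNic property is set on the storage pool
linstor storage-pool list-properties <NODE-NAME> drbdpool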

Once the configuration matched on the diskless and diskful nodes, everything worked as expected. Proxmox on the diskless node still shows drbdstorage as unknown and inactive, but live migration and all other features work as expected. Thank you!