LINBIT / linstor-proxmox

Integration pluging bridging LINSTOR to Proxmox VE
31 stars 7 forks source link

'storage-thin-pool'-locked command timed out - aborting #52

Closed cr1ll3 closed 2 years ago

cr1ll3 commented 2 years ago

Sometimes, when attempting to clone an image I get the following error message;

TASK ERROR: clone failed: API Return-Code: 500. Message: Could not autoplace resource vm-102-disk-1, because: 'storage-thin-pool'-locked command timed out - aborting   at /usr/share/perl5/PVE/Storage/Custom/LINSTORPlugin.pm line 327.   PVE::Storage::Custom::LINSTORPlugin::alloc_image("PVE::Storage::Custom::LINSTORPlugin", "thin-pool", HASH(0x55db01f38c40), 102, "raw", undef, 104857604) called at /usr/share/perl5/PVE/Storage.pm line 979     eval {...} called at /usr/share/perl5/PVE/Storage.pm line 979   PVE::Storage::__ANON__() called at /usr/share/perl5/PVE/Cluster.pm line 617     eval {...} called at /usr/share/perl5/PVE/Cluster.pm line 583   PVE::Cluster::__ANON__("storage-thin-pool", undef, CODE(0x55db01917700)) called at /usr/share/perl5/PVE/Cluster.pm line 662     PVE::Cluster::cfs_lock_storage("thin-pool", undef, CODE(0x55db01917700)) called at /usr/share/perl5/PVE/Storage/Plugin.pm line 546  PVE::Storage::Plugin::cluster_lock_storage("PVE::Storage::Custom::LINSTORPlugin", "thin-pool", 1, undef, CODE(0x55db01917700)) called at /usr/share/perl5/PVE/Storage.pm line 984   PVE::Storage::vdisk_alloc(HASH(0x55db01f66810), "thin-pool", 102, "raw", undef, 104857604) called at /usr/share/perl5/PVE/LXC.pm line 1987  eval {...} called at /usr/share/perl5/PVE/LXC.pm line 1973  PVE::LXC::alloc_disk(HASH(0x55db01f66810), 102, "thin-pool", 104857604, 100000, 100000) called at /usr/share/perl5/PVE/LXC.pm line 2573     eval {...} called at /usr/share/perl5/PVE/LXC.pm line 2569  PVE::LXC::copy_volume(HASH(0x55db01f582c8), 102, "thin-pool", HASH(0x55db01f66810), HASH(0x55db01f76b30), "abc", undef) called at /usr/share/perl5/PVE/API2/LXC.pm line 1616    eval {...} called at /usr/share/perl5/PVE/API2/LXC.pm line 1597     PVE::API2::LXC::__ANON__("UPID:pve2-3:001D5F13:00E21B92:62BEF973:vzclone:100:root\@pam:") called at /usr/share/perl5/PVE/RESTEnvironment.pm line 614    eval {...} called at /usr/share/perl5/PVE/RESTEnvironment.pm line 605   PVE::RESTEnvironment::fork_worker(PVE::RPCEnvironment=HASH(0x55db01ecb298), "vzclone", 100, "root\@pam", CODE(0x55dafb664ad0)) called at /usr/share/perl5/PVE/API2/LXC.pm line 1687     PVE::API2::LXC::__ANON__(HASH(0x55db01f66978)) called at /usr/share/perl5/PVE/RESTHandler.pm line 451   PVE::RESTHandler::handle("PVE::API2::LXC", HASH(0x55db003d53c0), HASH(0x55db01f66978)) called at /usr/share/perl5/PVE/HTTPServer.pm line 180    eval {...} called at /usr/share/perl5/PVE/HTTPServer.pm line 141    PVE::HTTPServer::rest_handler(PVE::HTTPServer=HASH(0x55db01ecb2c8), "::ffff:10.55.241.5", "POST", "/nodes/pve2-3/lxc/100/clone", HASH(0x55db01f63ba0), HASH(0x55db01f66ba0), "extjs") called at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 888     eval {...} called at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 862    PVE::APIServer::AnyEvent::handle_api2_request(PVE::HTTPServer=HASH(0x55db01ecb2c8), HASH(0x55db01f285a0), HASH(0x55db01f63ba0), "POST", "/api2/extjs/nodes/pve2-3/lxc/100/clone") called at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1112    eval {...} called at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1104   PVE::APIServer::AnyEvent::handle_request(PVE::HTTPServer=HASH(0x55db01ecb2c8), HASH(0x55db01f285a0), HASH(0x55db01f63ba0), "POST", "/api2/extjs/nodes/pve2-3/lxc/100/clone") called at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1514     PVE::APIServer::AnyEvent::__ANON__(AnyEvent::Handle=HASH(0x55db01f169d0), "target=pve2-3&snapname=abc&newid=102&full=1") called at /usr/lib/x86_64-linux-gnu/perl5/5.32/AnyEvent/Handle.pm line 1505    AnyEvent::Handle::__ANON__(AnyEvent::Handle=HASH(0x55db01f169d0)) called at /usr/lib/x86_64-linux-gnu/perl5/5.32/AnyEvent/Handle.pm line 1315   AnyEvent::Handle::_drain_rbuf(AnyEvent::Handle=HASH(0x55db01f169d0)) called at /usr/lib/x86_64-linux-gnu/perl5/5.32/AnyEvent/Handle.pm line 2015    AnyEvent::Handle::__ANON__() called at /usr/lib/x86_64-linux-gnu/perl5/5.32/AnyEvent/Loop.pm line 248   AnyEvent::Loop::one_event() called at /usr/lib/x86_64-linux-gnu/perl5/5.32/AnyEvent/Impl/Perl.pm line 46    AnyEvent::CondVar::Base::_wait(AnyEvent::CondVar=HASH(0x55db01642f70)) called at /usr/lib/x86_64-linux-gnu/perl5/5.32/AnyEvent.pm line 2034     AnyEvent::CondVar::Base::recv(AnyEvent::CondVar=HASH(0x55db01642f70)) called at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1828    PVE::APIServer::AnyEvent::run(PVE::HTTPServer=HASH(0x55db01ecb2c8)) called at /usr/share/perl5/PVE/Service/pvedaemon.pm line 52     PVE::Service::pvedaemon::run(PVE::Service::pvedaemon=HASH(0x55db01ec3f18)) called at /usr/share/perl5/PVE/Daemon.pm line 171    eval {...} called at /usr/share/perl5/PVE/Daemon.pm line 171    PVE::Daemon::__ANON__(PVE::Service::pvedaemon=HASH(0x55db01ec3f18)) called at /usr/share/perl5/PVE/Daemon.pm line 390   eval {...} called at /usr/share/perl5/PVE/Daemon.pm line 379    PVE::Daemon::__ANON__(PVE::Service::pvedaemon=HASH(0x55db01ec3f18), undef) called at /usr/share/perl5/PVE/Daemon.pm line 551    eval {...} called at /usr/share/perl5/PVE/Daemon.pm line 549    PVE::Daemon::start(PVE::Service::pvedaemon=HASH(0x55db01ec3f18), undef) called at /usr/share/perl5/PVE/Daemon.pm line 659   PVE::Daemon::__ANON__(HASH(0x55dafabdd978)) called at /usr/share/perl5/PVE/RESTHandler.pm line 451  PVE::RESTHandler::handle("PVE::Service::pvedaemon", HASH(0x55db01ec4260), HASH(0x55dafabdd978), 1) called at /usr/share/perl5/PVE/RESTHandler.pm line 866   eval {...} called at /usr/share/perl5/PVE/RESTHandler.pm line 849   PVE::RESTHandler::cli_handler("PVE::Service::pvedaemon", "pvedaemon start", "start", ARRAY(0x55dafac0a680), ARRAY(0x55db01ec4908), undef, undef, undef) called at /usr/share/perl5/PVE/CLIHandler.pm line 591   PVE::CLIHandler::__ANON__(ARRAY(0x55dafabe6120), CODE(0x55dafaff61a0), undef) called at /usr/share/perl5/PVE/CLIHandler.pm line 668     PVE::CLIHandler::run_cli_handler("PVE::Service::pvedaemon", "prepare", CODE(0x55dafaff61a0)) called at /usr/bin/pvedaemon line 27

I'm not sure if this is necessarily related to Linstor, but I haven't seen this error with any other storage with proxmox before.

This also leads to volumes being left behind, to cause name collisions and whatnot on the next clone-attempt.

TASK ERROR: clone failed: API Return-Code: 500. Message: Could not create resource definition snap_vm-100-disk-1_abc, because: [{"ret_code":-4611686018406940171,"message":"The creation of a new resource definition failed due to a name collision","cause":"A resource definition with the name 'snap_vm-100-disk-1_abc' already exists\nThe ResourceDefinition already exists","details":"Resource definition: snap_vm-100-disk-1_abc","obj_refs":{"RscDfn":"snap_vm-100-disk-1_abc"}}]  at /usr/share/perl5/PVE/Storage/Custom/LINSTORPlugin.pm line 426.  PVE::Storage::Custom::LINSTORPlugin::activate_volume("PVE::Storage::Custom::LINSTORPlugin", "thin-pool", HASH(0x55db01f8d020), "vm-100-disk-1", "abc", HASH(0x55db01ee3e48)) called at /usr/share/perl5/PVE/Storage.pm line 1204    PVE::Storage::activate_volumes(HASH(0x55db01f66a90), ARRAY(0x55db01f66970), "abc") called at /usr/share/perl5/PVE/API2/LXC.pm line 1603     eval {...} called at /usr/share/perl5/PVE/API2/LXC.pm line 1597     PVE::API2::LXC::__ANON__("UPID:pve2-3:001D76C1:00E246DB:62BEF9E2:vzclone:100:root\@pam:") called at /usr/share/perl5/PVE/RESTEnvironment.pm line 614    eval {...} called at /usr/share/perl5/PVE/RESTEnvironment.pm line 605   PVE::RESTEnvironment::fork_worker(PVE::RPCEnvironment=HASH(0x55db01ecb250), "vzclone", 100, "root\@pam", CODE(0x55db01f76548)) called at /usr/share/perl5/PVE/API2/LXC.pm line 1687     PVE::API2::LXC::__ANON__(HASH(0x55db01f386a0)) called at /usr/share/perl5/PVE/RESTHandler.pm line 451   PVE::RESTHandler::handle("PVE::API2::LXC", HASH(0x55db003d53c0), HASH(0x55db01f386a0)) called at /usr/share/perl5/PVE/HTTPServer.pm line 180    eval {...} called at /usr/share/perl5/PVE/HTTPServer.pm line 141    PVE::HTTPServer::rest_handler(PVE::HTTPServer=HASH(0x55db01ecb2c8), "::ffff:10.55.241.5", "POST", "/nodes/pve2-3/lxc/100/clone", HASH(0x55db016685e0), HASH(0x55db01f637f0), "extjs") called at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 888     eval {...} called at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 862    PVE::APIServer::AnyEvent::handle_api2_request(PVE::HTTPServer=HASH(0x55db01ecb2c8), HASH(0x55db01f4d638), HASH(0x55db016685e0), "POST", "/api2/extjs/nodes/pve2-3/lxc/100/clone") called at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1112    eval {...} called at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1104   PVE::APIServer::AnyEvent::handle_request(PVE::HTTPServer=HASH(0x55db01ecb2c8), HASH(0x55db01f4d638), HASH(0x55db016685e0), "POST", "/api2/extjs/nodes/pve2-3/lxc/100/clone") called at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1514     PVE::APIServer::AnyEvent::__ANON__(AnyEvent::Handle=HASH(0x55db01f8d5c0), "newid=102&snapname=abc&full=1&target=pve2-3") called at /usr/lib/x86_64-linux-gnu/perl5/5.32/AnyEvent/Handle.pm line 1505    AnyEvent::Handle::__ANON__(AnyEvent::Handle=HASH(0x55db01f8d5c0)) called at /usr/lib/x86_64-linux-gnu/perl5/5.32/AnyEvent/Handle.pm line 1315   AnyEvent::Handle::_drain_rbuf(AnyEvent::Handle=HASH(0x55db01f8d5c0)) called at /usr/lib/x86_64-linux-gnu/perl5/5.32/AnyEvent/Handle.pm line 2015    AnyEvent::Handle::__ANON__() called at /usr/lib/x86_64-linux-gnu/perl5/5.32/AnyEvent/Loop.pm line 248   AnyEvent::Loop::one_event() called at /usr/lib/x86_64-linux-gnu/perl5/5.32/AnyEvent/Impl/Perl.pm line 46    AnyEvent::CondVar::Base::_wait(AnyEvent::CondVar=HASH(0x55db01642f70)) called at /usr/lib/x86_64-linux-gnu/perl5/5.32/AnyEvent.pm line 2034     AnyEvent::CondVar::Base::recv(AnyEvent::CondVar=HASH(0x55db01642f70)) called at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1828    PVE::APIServer::AnyEvent::run(PVE::HTTPServer=HASH(0x55db01ecb2c8)) called at /usr/share/perl5/PVE/Service/pvedaemon.pm line 52     PVE::Service::pvedaemon::run(PVE::Service::pvedaemon=HASH(0x55db01ec3f18)) called at /usr/share/perl5/PVE/Daemon.pm line 171    eval {...} called at /usr/share/perl5/PVE/Daemon.pm line 171    PVE::Daemon::__ANON__(PVE::Service::pvedaemon=HASH(0x55db01ec3f18)) called at /usr/share/perl5/PVE/Daemon.pm line 390   eval {...} called at /usr/share/perl5/PVE/Daemon.pm line 379    PVE::Daemon::__ANON__(PVE::Service::pvedaemon=HASH(0x55db01ec3f18), undef) called at /usr/share/perl5/PVE/Daemon.pm line 551    eval {...} called at /usr/share/perl5/PVE/Daemon.pm line 549    PVE::Daemon::start(PVE::Service::pvedaemon=HASH(0x55db01ec3f18), undef) called at /usr/share/perl5/PVE/Daemon.pm line 659   PVE::Daemon::__ANON__(HASH(0x55dafabdd978)) called at /usr/share/perl5/PVE/RESTHandler.pm line 451  PVE::RESTHandler::handle("PVE::Service::pvedaemon", HASH(0x55db01ec4260), HASH(0x55dafabdd978), 1) called at /usr/share/perl5/PVE/RESTHandler.pm line 866   eval {...} called at /usr/share/perl5/PVE/RESTHandler.pm line 849   PVE::RESTHandler::cli_handler("PVE::Service::pvedaemon", "pvedaemon start", "start", ARRAY(0x55dafac0a680), ARRAY(0x55db01ec4908), undef, undef, undef) called at /usr/share/perl5/PVE/CLIHandler.pm line 591   PVE::CLIHandler::__ANON__(ARRAY(0x55dafabe6120), CODE(0x55dafaff61a0), undef) called at /usr/share/perl5/PVE/CLIHandler.pm line 668     PVE::CLIHandler::run_cli_handler("PVE::Service::pvedaemon", "prepare", CODE(0x55dafaff61a0)) called at /usr/bin/pvedaemon line 27
rck commented 2 years ago

hm, that code is executed in the plugin in alloc_image(), where LINSTOR actually creates the DRBD resource. Looking back in the trace PVE takes a lock and gives the command a fixed amount of 60 seconds to return. There is not that much that happens in that time: creating the DRBD backing devices, writing the *.res files, starting initial sync.

So I would say you have a pretty slow LINSTOR cluster and/or cluster in general and/or storage pools and/or $anything that makes that operation that slow. Usually these 60 seconds are ways more than LINSTOR needs to create the DRBD resource. Otherwise we would have a lot more complaints. Unfortunately I think this is a fixed timeout (Cluster.pm:cfs_lock()) not under our control.