LINBIT / linstor-proxmox

Integration plugin bridging LINSTOR to Proxmox VE

Timeouts after adding more storage pools #50

Closed: theoratkin closed this issue 2 years ago

theoratkin commented 2 years ago

We have a 3-node Linstor cluster. Currently it has a storage pool pool_fast (it's on SSDs).

We want to add 2 new storage pools pool_raid and pool_big, which are RAID0 HDDs with larger capacity.

All pools are of type lvmthin.

For some reason, after creating those 2 extra storage pools, all volume creation with Linstor slows down, regardless of which storage pool the volume is being created on; even creating on pool_fast becomes slower. Creating a 10G volume directly with the Linstor CLI goes from approximately 5 seconds to 15, which is fine by itself; 10 extra seconds is not such a big deal.

But doing the same through Proxmox slows down significantly, from ~20 seconds to 50-60 seconds. Even worse, most of the time it errors out with the following text:

root@ap1.office:~# time pvesm alloc fast 182 vm-182-disk-1 10G

NOTICE
  Trying to create diskful resource (vm-182-disk-1) on (ap1).
API Return-Code: 500. Message: Could not set allow-two-primaries on resource definition vm-182-disk-1, because:
'storage-fast'-locked command timed out - aborting

 at /usr/share/perl5/PVE/Storage/Custom/LINSTORPlugin.pm line 364.
    PVE::Storage::Custom::LINSTORPlugin::alloc_image("PVE::Storage::Custom::LINSTORPlugin", "fast", HASH(0x55eaad9a3ed0), 182, "raw", "vm-182-disk-1", 10485760) called at /usr/share/perl5/PVE/Storage.pm line 978
    eval {...} called at /usr/share/perl5/PVE/Storage.pm line 978
    PVE::Storage::__ANON__() called at /usr/share/perl5/PVE/Cluster.pm line 617
    eval {...} called at /usr/share/perl5/PVE/Cluster.pm line 583
    PVE::Cluster::__ANON__("storage-fast", undef, CODE(0x55eaad9a4620)) called at /usr/share/perl5/PVE/Cluster.pm line 662
    PVE::Cluster::cfs_lock_storage("fast", undef, CODE(0x55eaad9a4620)) called at /usr/share/perl5/PVE/Storage/Plugin.pm line 545
    PVE::Storage::Plugin::cluster_lock_storage("PVE::Storage::Custom::LINSTORPlugin", "fast", 1, undef, CODE(0x55eaad9a4620)) called at /usr/share/perl5/PVE/Storage.pm line 983
    PVE::Storage::vdisk_alloc(HASH(0x55eaad99ddd8), "fast", 182, undef, "vm-182-disk-1", 10485760) called at /usr/share/perl5/PVE/API2/Storage/Content.pm line 225
    PVE::API2::Storage::Content::__ANON__(HASH(0x55eaad973150)) called at /usr/share/perl5/PVE/RESTHandler.pm line 451
    PVE::RESTHandler::handle("PVE::API2::Storage::Content", HASH(0x55eaad6277c0), HASH(0x55eaad973150), 1) called at /usr/share/perl5/PVE/RESTHandler.pm line 866
    eval {...} called at /usr/share/perl5/PVE/RESTHandler.pm line 849
    PVE::RESTHandler::cli_handler("PVE::API2::Storage::Content", "pvesm alloc", "create", ARRAY(0x55eaa9bcdf08), ARRAY(0x55eaad997058), HASH(0x55eaad9970e8), CODE(0x55eaad8fc1e0), undef) called at /usr/share/perl5/PVE/CLIHandler.pm line 591
    PVE::CLIHandler::__ANON__(ARRAY(0x55eaa9bd6950), undef, CODE(0x55eaad8fc1e0)) called at /usr/share/perl5/PVE/CLIHandler.pm line 668
    PVE::CLIHandler::run_cli_handler("PVE::CLI::pvesm") called at /usr/sbin/pvesm line 8

real    1m0.508s
user    0m0.498s
sys     0m0.049s

Is there any way to track down these slowdowns? Or at least to increase the timeout? It currently seems to be 1 minute, which is apparently too short.

rck commented 2 years ago

sounds a lot like this one: https://github.com/LINBIT/linstor-proxmox/commit/90c638dfd95cfecf4bc926e7e662c6a47d1f0990

Can you please enable the cache and make sure it is set on all of the drbd entries in your storage.cfg (otherwise it does not work)? You should then see /var/cache/linstor-proxmox/pools being created; if not, you have to reload some Proxmox services or reboot.
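
Since the cache only works when it is set on every drbd entry, a quick scan of storage.cfg can catch an entry that was missed. The following is a minimal Python sketch, not part of the plugin. It assumes the usual storage.cfg layout (a `type: name` header at column 0 followed by indented options) and the option name `statuscache` mentioned in this thread. The storage names, controller address, and resource group names in the sample are hypothetical.

```python
import re

# A section header looks like "drbd: fast" and starts at column 0.
HEADER = re.compile(r"^(\w+):\s*(\S+)\s*$")

def drbd_without_statuscache(cfg_text):
    """Return the names of drbd storages that lack a statuscache option."""
    sections = []  # list of (type, name, set of option keys)
    for line in cfg_text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        m = HEADER.match(line)
        if m:
            sections.append((m.group(1), m.group(2), set()))
        elif sections and line[0].isspace():
            # Indented line: first word is the option key.
            sections[-1][2].add(line.split()[0])
    return [name for typ, name, opts in sections
            if typ == "drbd" and "statuscache" not in opts]

# Hypothetical sample; real files live at /etc/pve/storage.cfg.
SAMPLE = """\
drbd: fast
    content images,rootdir
    controller 192.0.2.10
    resourcegroup rg_fast
    statuscache 60

drbd: big
    content images,rootdir
    controller 192.0.2.10
    resourcegroup rg_big

dir: local
    path /var/lib/vz
    content iso,backup
"""

print(drbd_without_statuscache(SAMPLE))  # ['big']
```

To run it against a real node, read the text from /etc/pve/storage.cfg instead of `SAMPLE`; an empty list means every drbd entry has the option.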

theoratkin commented 2 years ago

This is it, thank you so much! Setting statuscache fixed the problem (I set it to 60 seconds). Closing the issue.

Maybe it makes sense to enable this setting by default? I don't see any downsides in doing so, unless Proxmox is going to change its behavior any time soon.

rck commented 2 years ago

This was a "quick fix" for a customer back then, and since it changes behavior, I hid it behind an extra option.
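
To illustrate what "changes behavior" means for a time-based cache in general: within the configured age, callers get the stored answer instead of a fresh query, so the status can be stale by up to that many seconds. The sketch below is a generic Python illustration under that assumption, not the plugin's actual implementation. `TTLCache`, `slow_pool_query`, and the fake clock are hypothetical names used only for the demo.

```python
import time

class TTLCache:
    """Cache one value and refetch it only after max_age seconds."""

    def __init__(self, fetch, max_age, clock=time.monotonic):
        self._fetch = fetch        # expensive function to call on a miss
        self._max_age = max_age    # seconds a stored value stays valid
        self._clock = clock        # injectable so the demo needs no sleep
        self._value = None
        self._stamp = None         # time of the last successful fetch

    def get(self):
        now = self._clock()
        if self._stamp is None or now - self._stamp >= self._max_age:
            self._value = self._fetch()
            self._stamp = now
        return self._value

# Demo with a fake clock and a counter standing in for a slow status query.
calls = []
fake_now = [0.0]

def slow_pool_query():
    calls.append(1)
    return {"query_number": len(calls)}

cache = TTLCache(slow_pool_query, max_age=60, clock=lambda: fake_now[0])
cache.get()
cache.get()          # served from the cache, no second query
fake_now[0] = 61.0
cache.get()          # older than 60 seconds, so it queries again
print(len(calls))    # 2
```

The trade-off is visible in the demo: the second call is cheap, but it would not notice a pool that changed in the meantime.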

My hope in the - hm - medium run is that maybe LINSTOR gets more efficient and implements such a cache itself. Then the plugins would not need to implement it.