LINBIT / linstor-proxmox

Integration plugin bridging LINSTOR to Proxmox VE

Auto-place fails when creating a VM #1

Closed elonen closed 6 years ago

elonen commented 6 years ago

Linstor-proxmox 2.9.0-1 (from the Debian repository) apparently fails on auto-placement. Output from the Proxmox status window:

SUCCESS:
Description:
    New resource definition 'vm-106-disk-2' created.
Details:
    Resource definition 'vm-106-disk-2' UUID is: 084e388d-95e4-4758-9f57-8579d33e8012
SUCCESS:
Description:
    Resource definition 'vm-106-disk-2' modified.
Details:
    Resource definition 'vm-106-disk-2' UUID is: 084e388d-95e4-4758-9f57-8579d33e8012
SUCCESS:
    New volume definition with number '0' of resource definition 'vm-106-disk-2' created.
ERROR:
Description:
    Registration of auto-placing resource: 'vm-106-disk-2' failed due to an unknown exception.
Details:
    Auto-placing resource: vm-106-disk-2
TASK ERROR: unable to create VM 106 - error with cfs lock 'storage-drbdlinstor': Could not place vm-106-disk-2: exit code 10

This might actually be caused by a bug in linstor-client, since manual auto-placement also fails with "unknown exception" if you don't explicitly set the storage pool:

# linstor resource create  --auto-place 3 locatesti

ERROR:
Description:
    Registration of auto-placing resource: 'locatesti' failed due to an unknown exception.
Details:
    Auto-placing resource: locatesti

vs.

# linstor resource create -s DfltStorPool --auto-place 3 locatesti

SUCCESS:
Description:
    Resource 'locatesti' successfully autoplaced on 3 nodes
Details:
    Used storage pool: 'DfltStorPool'
    Used nodes: 'mox-b', 'mox-a', 'mox-c'

(In any case, it would be nice if you could specify the storage pool in /etc/pve/storage.cfg instead of relying on the default.)

Package versions:

ii  linstor-client                       0.6.0-1                                  all          Linstor client command line tool
ii  linstor-common                       0.6.2-1                                  all          DRBD distributed resource management utility
ii  linstor-controller                   0.6.2-1                                  all          DRBD distributed resource management utility
ii  linstor-proxmox                      2.9.0-1                                  all          DRBD distributed resource management utility
ii  linstor-satellite                    0.6.2-1                                  all          DRBD distributed resource management utility
ii  python-linstor                       0.6.0-1                                  all          Linstor python api library

ghernadi commented 6 years ago

Can you please attach the corresponding ErrorReports from /var/lib/linstor-controller/logs/ on the controller and (if any) from /var/lib/linstor-satellite/logs/ on the satellite(s)?

elonen commented 6 years ago

Looks like the Proxmox plugin and the manual attempt without specifying a pool both result in a similar error ("Access to deleted volume" at Method 'checkDeleted', Source file 'VolumeData.java', Line #416):

ErrorReport-5B8EE735-000001.log ErrorReport-5B8EE735-000000.log

No errors on satellites apparently.

ghernadi commented 6 years ago

Sorry, I forgot about this issue. The error was found yesterday and I'm quite positive that the fix should work (it's a bit tricky :) ).

The fix will be included in the next release.

Here is what happens: the issue occurs whenever you try to deploy a resource (it doesn't matter whether auto-place is involved or not) and that resource creation fails, for whatever reason. All of the resource's volumes remain registered in a so-called FreeSpaceMgr. If you see that the resource creation failed and clean the resource up (delete it), its volumes are deleted as well, but unfortunately the FreeSpaceMgr still references them. The next time the FreeSpaceMgr is asked for the estimated size of the remote storage, it tries to access the already deleted volume, which throws the exception.
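
For readers who want to see the mechanism in isolation, here is a minimal Python sketch of the stale-reference problem described above. It is not LINSTOR code: `Volume`, `FreeSpaceTracker`, `delete_resource`, and the `fixed` flag are hypothetical names invented for illustration, and the "fixed" path shows only one plausible way to avoid the stale reference, not necessarily what the actual fix does.

```python
class Volume:
    """Toy stand-in for a volume; NOT the real VolumeData class."""

    def __init__(self, name, size_kib):
        self.name = name
        self._size_kib = size_kib
        self._deleted = False

    def delete(self):
        self._deleted = True

    def size_kib(self):
        # Same idea as a checkDeleted() guard: deleted objects refuse access.
        if self._deleted:
            raise RuntimeError(f"Access to deleted volume '{self.name}'")
        return self._size_kib


class FreeSpaceTracker:
    """Hypothetical tracker that remembers volumes awaiting deployment."""

    def __init__(self, capacity_kib):
        self.capacity_kib = capacity_kib
        self.pending = []

    def reserve(self, volume):
        self.pending.append(volume)

    def release(self, volume):
        self.pending.remove(volume)

    def estimated_free_kib(self):
        # Touches every pending volume, including any stale ones.
        return self.capacity_kib - sum(v.size_kib() for v in self.pending)


def delete_resource(volume, tracker, fixed):
    # Assumption: with fixed=True the tracker drops its reference before
    # the volume is deleted; with fixed=False the reference goes stale.
    if fixed:
        tracker.release(volume)
    volume.delete()


def demo(fixed):
    tracker = FreeSpaceTracker(capacity_kib=1000)
    vol = Volume("vm-106-disk-2", 100)
    tracker.reserve(vol)                   # creation attempt registers the volume
    delete_resource(vol, tracker, fixed)   # creation failed, user cleans up
    try:
        return tracker.estimated_free_kib()
    except RuntimeError as exc:
        return str(exc)
```

Calling `demo(False)` returns the "Access to deleted volume" message because the tracker still holds the deleted volume, while `demo(True)` returns the full capacity. Restarting the process discards the tracker's in-memory list, which is consistent with a controller restart clearing the problem.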

Until the next release, whenever you run into such a situation, a simple restart of the controller should resolve the problem (just Linstor, not the whole machine :) ).

rck commented 6 years ago

Looks like this is fixed by @ghernadi. I'm closing this issue; it was unrelated to the plugin itself anyway.