Open tsinik-dw opened 2 years ago
I should also mention that volume migration from non-managed to managed storage was functional in ACS 4.13.1, following the steps described by Mike Tutkowski in https://youtu.be/lkVMb6elvz4 (the actual migration is performed at 31:25).
Hi @tsinik-dw, for case number 1) can you try the migrate volume API, specifying the target storage pool UUID as a parameter, instead of going through the UI? (This seems like a UI bug.) For case number 2) it is not supported.
Hi @nvazquez,
I just tried the volume migration for case number 1 using cmk, but it didn't work.
The cmk command and output are:
(noc-dev) π± > migrate volume storageid=2514b65e-b231-4b2e-932c-c897f2df7c79 volumeid=09243386-b5f2-4920-afc3-3505d2ee311c livemigrate=true newdiskofferingid=5b764ddb-ea60-40b1-8ff5-586953266e92
{
"accountid": "d0987ed7-8031-11ec-9ad0-ba21ccf13580",
"cmd": "org.apache.cloudstack.api.command.admin.volume.MigrateVolumeCmdByAdmin",
"completed": "2022-03-03T10:08:59+0200",
"created": "2022-03-03T10:07:25+0200",
"jobid": "d1d2f226-d195-4a8b-970a-1570173a1d76",
"jobprocstatus": 0,
"jobresult": {
"errorcode": 530,
"errortext": "Resource [StoragePool:2] is unreachable: Migrate volume failed: com.cloud.utils.exception.CloudRuntimeException: Migration operation failed in 'StorageSystemDataMotionStrategy.handleVolumeMigrationFromNonManagedStorageToManagedStorage': Failed to migrate volume with ID 132 to storage pool with ID 2"
},
"jobresultcode": 530,
"jobresulttype": "object",
"jobstatus": 2,
"userid": "d09ca276-8031-11ec-9ad0-ba21ccf13580"
}
π Error: async API failed for job d1d2f226-d195-4a8b-970a-1570173a1d76
I also attach the management log and storage_pool table (in CSV): nv_vol_migr_to_managed_cmk.txt storage_pool.txt
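The failed job output above can be checked programmatically. The following is a minimal sketch, not part of cmk itself: it assumes the JSON printed by cmk has been captured as a string, and that `jobstatus` 2 marks a failed async job (as in the output above). The helper name `summarize_job` is hypothetical, and the `errortext` is shortened here.

```python
import json

# Trimmed copy of the job output shown above (only the fields used here;
# the errortext is shortened).
RAW_JOB = """
{
  "jobid": "d1d2f226-d195-4a8b-970a-1570173a1d76",
  "jobresult": {
    "errorcode": 530,
    "errortext": "Resource [StoragePool:2] is unreachable: Migrate volume failed"
  },
  "jobresultcode": 530,
  "jobstatus": 2
}
"""

JOB_FAILED = 2  # assumption: jobstatus 2 marks a failed async job


def summarize_job(raw: str) -> str:
    """Return a one-line summary of an async job result (hypothetical helper)."""
    job = json.loads(raw)
    if job.get("jobstatus") != JOB_FAILED:
        return f"job {job.get('jobid')}: not failed"
    result = job.get("jobresult", {})
    return (
        f"job {job.get('jobid')}: "
        f"error {result.get('errorcode')}: {result.get('errortext')}"
    )


print(summarize_job(RAW_JOB))
```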
Thanks for the logs @tsinik-dw! It seems the pool is simply out of space, according to the error thrown:
2022-03-03 10:08:57,148 ERROR [c.c.h.x.r.w.x.XenServer610MigrateVolumeCommandWrapper] (DirectAgent-242:ctx-0c966ba9) (logid:d1d2f226) Caught exception com.xensource.xenapi.Types$BadAsyncResult due to the following: Task failed! Task record: uuid: 0d12e02c-7f8f-6c11-603c-f04e3b8e1dc1
nameLabel: Async.VDI.pool_migrate
nameDescription:
allowedOperations: []
currentOperations: {}
created: Thu Mar 03 10:08:15 EET 2022
finished: Thu Mar 03 10:08:43 EET 2022
status: failure
residentOn: com.xensource.xenapi.Host@6634ea40
progress: 1.0
type: <none/>
result:
errorInfo: [SR_BACKEND_FAILURE_44, , There is insufficient space]
otherConfig: {}
subtaskOf: com.xensource.xenapi.Task@aaf13f6f
subtasks: []
Task failed! Task record: uuid: 0d12e02c-7f8f-6c11-603c-f04e3b8e1dc1
nameLabel: Async.VDI.pool_migrate
nameDescription:
allowedOperations: []
currentOperations: {}
created: Thu Mar 03 10:08:15 EET 2022
finished: Thu Mar 03 10:08:43 EET 2022
status: failure
residentOn: com.xensource.xenapi.Host@6634ea40
progress: 1.0
type: <none/>
result:
errorInfo: [SR_BACKEND_FAILURE_44, , There is insufficient space]
otherConfig: {}
subtaskOf: com.xensource.xenapi.Task@aaf13f6f
subtasks: []
Hi @nvazquez,
This error message is a little weird. After repeating the same test today with a 2GB DATA volume and digging deeper into the logs, I came across the following error in the SMlog of the pool master:
Mar 4 13:39:30 xen8-c1 SM: [5541] vdi_create {'sr_uuid': 'e71497cb-a0b7-ac0d-f836-f363811663b6', 'subtask_of': 'DummyRef:|4cb983a2-be5a-0b1b-296e-08bbb8e53a57|VDI.create', 'vdi_type': 'user', 'args': ['2147483648', 'DATA-1815', '', '', 'false', '19700101T00:00:00Z', '', 'false'], 'host_ref': 'OpaqueRef:517abf46-0e48-9ed9-ba7a-188f8260a820', 'session_ref': 'OpaqueRef:607e2408-0bc7-8a3e-0306-8693e6fc0657', 'device_config': {'target': '192.168.70.233', 'multihomelist': '192.168.70.233:3260', 'targetIQN': 'iqn.2010-01.com.solidfire:slwz.data-1815.331', 'SRmaster': 'true', 'device': '/dev/disk/mpInuse/36f47acc100000000736c777a0000014b', 'SCSIid': '36f47acc100000000736c777a0000014b'}, 'command': 'vdi_create', 'sr_ref': 'OpaqueRef:9b3d9125-9a98-ebf1-6677-944554ad2c71', 'vdi_sm_config': {'base_mirror': '07d9936d-1b5f-e16b-080e-f41a33d452d2/06c8e955-2404-4fd7-86e1-8d65fba194f0'}}
Mar 4 13:39:30 xen8-c1 SM: [5541] LVHDVDI.create for 0bb897ed-de6a-4c70-8fd3-aaaa23b4548c
Mar 4 13:39:30 xen8-c1 SM: [5541] LVHDVDI.create: type = vhd, /dev/VG_XenStorage-e71497cb-a0b7-ac0d-f836-f363811663b6/VHD-0bb897ed-de6a-4c70-8fd3-aaaa23b4548c (size=2147483648)
Mar 4 13:39:30 xen8-c1 SM: [5541] ['/sbin/vgs', '--noheadings', '--nosuffix', '--units', 'b', 'VG_XenStorage-e71497cb-a0b7-ac0d-f836-f363811663b6']
Mar 4 13:39:30 xen8-c1 SM: [5541] pread SUCCESS
Mar 4 13:39:30 xen8-c1 SM: [5541] Not enough space! free space: 167772160, need: 2160066560
Mar 4 13:39:30 xen8-c1 SM: [5541] Raising exception [44, There is insufficient space]
Mar 4 13:39:30 xen8-c1 SM: [5541] lock: released /var/lock/sm/e71497cb-a0b7-ac0d-f836-f363811663b6/sr
Mar 4 13:39:30 xen8-c1 SM: [5541] ***** generic exception: vdi_create: EXCEPTION <class 'SR.SROSError'>, There is insufficient space
The SR created on the destination pool was 2.2GB. It seems that it does not respect the hv_ss_reserve value of the disk offering, so at this size it cannot create a snapshot of the VDI for live migration. The hv_ss_reserve value in our case is 200.
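The numbers in the `Not enough space!` line above can be worked through with a short sketch. It only parses the SMlog line quoted in this comment; the helper name `parse_space_line` is hypothetical, and attributing the extra bytes beyond the 2 GiB volume to VHD/LVM overhead is an assumption.

```python
import re

# The SMlog line quoted above.
SMLOG_LINE = (
    "Mar 4 13:39:30 xen8-c1 SM: [5541] Not enough space! "
    "free space: 167772160, need: 2160066560"
)

VOLUME_SIZE = 2147483648  # the 2 GiB DATA volume, from the vdi_create args


def parse_space_line(line: str):
    """Extract (free, need) in bytes from a 'Not enough space!' line."""
    match = re.search(r"free space: (\d+), need: (\d+)", line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


free, need = parse_space_line(SMLOG_LINE)
shortfall = need - free          # bytes missing on the destination SR
overhead = need - VOLUME_SIZE    # bytes requested beyond the raw volume size

print(f"free:      {free} bytes ({free / 2**20:.0f} MiB)")
print(f"need:      {need} bytes")
print(f"shortfall: {shortfall} bytes")
print(f"overhead:  {overhead} bytes ({overhead / 2**20:.0f} MiB)")
```

With only about 160 MiB free on the SR at that point, a second 2 GiB VDI cannot be created there, which matches the failure.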
@tsinik-dw thanks, I could validate that in the logs:
2022-03-03 10:07:25,351 DEBUG [c.c.s.StorageManagerImpl] (API-Job-Executor-34:ctx-95b16032 job-473 ctx-9a984c53) (logid:d1d2f226) Destination pool id: 2
2022-03-03 10:07:25,360 DEBUG [c.c.s.StorageManagerImpl] (API-Job-Executor-34:ctx-95b16032 job-473 ctx-9a984c53) (logid:d1d2f226) Pool ID for the volume with ID 132 is 1
2022-03-03 10:07:25,365 DEBUG [c.c.s.StorageManagerImpl] (API-Job-Executor-34:ctx-95b16032 job-473 ctx-9a984c53) (logid:d1d2f226) Found storage pool SOLIDFIRE of type Iscsi
2022-03-03 10:07:25,366 DEBUG [c.c.s.StorageManagerImpl] (API-Job-Executor-34:ctx-95b16032 job-473 ctx-9a984c53) (logid:d1d2f226) Total capacity of the pool SOLIDFIRE with ID 2 is (60.00 GB) 64424476455
2022-03-03 10:07:25,370 DEBUG [c.c.s.StorageManagerImpl] (API-Job-Executor-34:ctx-95b16032 job-473 ctx-9a984c53) (logid:d1d2f226) Checking pool: 2 for storage allocation , maxSize : (60.00 GB) 64424476455, totalAllocatedSize : (23.00 GB) 24696061952, askingSize : (2.20 GB) 2362232064, allocated disable threshold: 0.85
2022-03-03 10:07:25,433 DEBUG [c.c.u.AccountManagerImpl] (API-Job-Executor-34:ctx-95b16032 job-473 ctx-9a984c53) (logid:d1d2f226) Access granted to Acct[d0987ed7-8031-11ec-9ad0-ba21ccf13580-admin] to com.cloud.storage.DiskOfferingVO$$EnhancerByCGLIB$$39e596b2@3ec58fe5 by AffinityGroupAccessChecker
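For reference, the values in the `Checking pool` log line can be plugged into a simple threshold check. This is only a sketch under an assumption: that the pool is accepted when (allocated + asking) / max stays at or below the allocated disable threshold. The function name is hypothetical and this is not the actual `StorageManagerImpl` code.

```python
def fits_allocation(max_size: int, total_allocated: int,
                    asking: int, threshold: float) -> bool:
    """Assumed check: projected allocation ratio must not exceed the threshold."""
    projected_ratio = (total_allocated + asking) / max_size
    return projected_ratio <= threshold


# Values copied from the log line above.
MAX_SIZE = 64424476455
TOTAL_ALLOCATED = 24696061952
ASKING = 2362232064
THRESHOLD = 0.85

ratio = (TOTAL_ALLOCATED + ASKING) / MAX_SIZE
print(f"projected ratio: {ratio:.2f}")
print("fits:", fits_allocation(MAX_SIZE, TOTAL_ALLOCATED, ASKING, THRESHOLD))
```

Under that assumption the pool-level check passes comfortably, so the failure comes from the size of the SR created for the volume, not from the pool capacity.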
I checked, and the calculation for the asking size in the SolidFire provider uses the volume size and the volume hv_ss_reserve values from the volumes table. To calculate the asking size, CS adds volume size * (hv_ss_reserve / 100) to the volume size in bytes. If hv_ss_reserve = 200 in your case, CS would ask for 3 times the volume size, so the volume size would be around 700MB. Is that the volume size? Reference: https://github.com/apache/cloudstack/blob/4.16/plugins/storage/volume/solidfire/src/main/java/org/apache/cloudstack/storage/datastore/driver/SolidFirePrimaryDataStoreDriver.java#L449
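The calculation described above can be sketched as follows. This is an illustration of the formula as stated in this comment, not a copy of the Java driver code; the function name `asking_size` is hypothetical and integer rounding is an assumption.

```python
def asking_size(volume_size: int, hv_ss_reserve: int) -> int:
    """volume size + volume size * (hv_ss_reserve / 100), in bytes."""
    return volume_size + volume_size * hv_ss_reserve // 100


TWO_GIB = 2 * 1024**3

# hv_ss_reserve = 200 asks for three times the volume size.
print(asking_size(TWO_GIB, 200))
# hv_ss_reserve = 0 asks for exactly the volume size, leaving no snapshot room.
print(asking_size(TWO_GIB, 0))
```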
Can you please share the DB output of:
Hi @nvazquez,
here is the output of the SQL queries:
select * from volumes where id = 132;
"id","account_id","domain_id","pool_id","last_pool_id","instance_id","device_id","name","uuid","size","folder","path","pod_id","data_center_id","iscsi_name","host_ip","volume_type","pool_type","disk_offering_id","template_id","first_snapshot_backup_uuid","recreatable","created","attached","updated","removed","state","chain_info","update_count","disk_type","vm_snapshot_chain_size","iso_id","display_volume","format","min_iops","max_iops","hv_ss_reserve","provisioning_type"
132,2,1,1,,,,DATAVOL-1,"09243386-b5f2-4920-afc3-3505d2ee311c",2147483648,,"0f7024ca-e21f-4398-a236-16536f978755",,1,,,DATADISK,IscsiLUN,21,,,0,2022-03-03 07:52:32,,2022-03-03 12:32:20,2022-03-03 12:32:20,Expunged,,14,,,,1,VHD,,,0,thin
Please note that this volume is now in `Expunged` state because I tried to delete it after the unsuccessful migration and it didn't work.
Also, the disk offering output is currently as follows, but there have been manual changes to the values in several fields.
select * from disk_offering where uuid = "5b764ddb-ea60-40b1-8ff5-586953266e92";
"id","name","uuid","display_text","disk_size","type","tags","recreatable","use_local_storage","unique_name","system_use","customized","removed","created","sort_key","display_offering","customized_iops","min_iops","max_iops","bytes_read_rate","bytes_read_rate_max","bytes_read_rate_max_length","bytes_write_rate","bytes_write_rate_max","bytes_write_rate_max_length","iops_read_rate","iops_read_rate_max","iops_read_rate_max_length","iops_write_rate","iops_write_rate_max","iops_write_rate_max_length","state","hv_ss_reserve","cache_mode","provisioning_type"
22,SF DO 2 (2 GB) 2222-4444,"5b764ddb-ea60-40b1-8ff5-586953266e92",SF DO 2 (2 GB) 2222-4444,2147483648,Disk,sf,0,0,,0,0,,2022-03-03 08:01:30,0,1,0,2222,4444,,,,,,,,,,,,,Active,100,none,thin
However, I did try similar volume migrations with the following volume and disk offering records. As you can see, the volume size is 2GB and the Disk Offering is 12GB (I tried with 7GB too).
VOLUME
"id","account_id","domain_id","pool_id","last_pool_id","instance_id","device_id","name","uuid","size","folder","path","pod_id","data_center_id","iscsi_name","host_ip","volume_type","pool_type","disk_offering_id","template_id","first_snapshot_backup_uuid","recreatable","created","attached","updated","removed","state","chain_info","update_count","disk_type","vm_snapshot_chain_size","iso_id","display_volume","format","min_iops","max_iops","hv_ss_reserve","provisioning_type"
134,2,1,1,,131,1,DATAVOL-2,feab49ab-7381-4d55-a86a-6d2c51faa8dc,2147483648,,"03394dc4-d95a-47cc-8fb3-9950e92a2b44",,1,,,DATADISK,IscsiLUN,21,,,0,2022-03-03 12:32:59,2022-03-03 12:34:34,2022-03-03 12:41:32,,Ready,,8,,,,1,VHD,,,0,thin
DISK OFFERING
"id","name","uuid","display_text","disk_size","type","tags","recreatable","use_local_storage","unique_name","system_use","customized","removed","created","sort_key","display_offering","customized_iops","min_iops","max_iops","bytes_read_rate","bytes_read_rate_max","bytes_read_rate_max_length","bytes_write_rate","bytes_write_rate_max","bytes_write_rate_max_length","iops_read_rate","iops_read_rate_max","iops_read_rate_max_length","iops_write_rate","iops_write_rate_max","iops_write_rate_max_length","state","hv_ss_reserve","cache_mode","provisioning_type"
25,SF DO 12 GB,deea8a00-3024-4ccd-ae95-0df023c4e76d,SF DO 12 GB,12884901888,Disk,sf,0,0,,0,0,,2022-03-03 12:31:32,0,1,0,1111,2222,,,,,,,,,,,,,Active,200,none,thin
Thanks @tsinik-dw, unfortunately I cannot test this, but I do see the volume has hv_ss_reserve = 0, which may be the reason why it's not reserving more space. Have you tried without specifying the new disk offering on the API method? Do you get the same error? For the sake of testing, can you manually update the hv_ss_reserve value for a test volume and attempt the migration again?
Hi @nvazquez,
it worked!... I made multiple tests and got several error messages which I am going to list at the end of this report. Please excuse my gigantic post :-) but the information may be helpful.
First, the steps that led to a successful volume migration.
`disk_offering` table (set min_iops, max_iops, hv_ss_reserve)
`volumes` table (set min_iops, max_iops, hv_ss_reserve)
At this point the UI doesn't offer the Solidfire storage as an available option for volume migration, so I ran the migration with cmk:
migrate volume storageid=2514b65e-b231-4b2e-932c-c897f2df7c79 volumeid=83bc0063-b73c-4581-8bd0-c7e59d34263e livemigrate=true
and the volume gets migrated, giving the following output:
{
"volume": {
"account": "admin",
"created": "2022-03-09T10:14:51+0200",
"destroyed": false,
"deviceid": 0,
"diskioread": 0,
"diskiowrite": 0,
"diskkbsread": 0,
"diskkbswrite": 0,
"displayvolume": true,
"domain": "ROOT",
"domainid": "747b2d33-8031-11ec-9ad0-ba21ccf13580",
"hypervisor": "XenServer",
"id": "83bc0063-b73c-4581-8bd0-c7e59d34263e",
"isextractable": false,
"maxiops": 4000,
"miniops": 1000,
"name": "ROOT-133",
"path": "c71023ba-d338-4290-8b1c-8a314a662bc2",
"provisioningtype": "thin",
"quiescevm": false,
"serviceofferingdisplaytext": "CO 1 Desc",
"serviceofferingid": "a1c6f819-c653-406e-8a86-4b39ed4b744a",
"serviceofferingname": "CO 1",
"size": 5368709120,
"state": "Ready",
"storage": "SOLIDFIRE",
"storageid": "2514b65e-b231-4b2e-932c-c897f2df7c79",
"storagetype": "shared",
"tags": [],
"templatedisplaytext": "Centos 7",
"templateid": "7bdc9edc-3afd-42e3-a23c-150ec4b58afa",
"templatename": "Centos 7",
"type": "ROOT",
"virtualmachineid": "3e28031a-5019-4459-a19b-6fa0b73c4373",
"vmdisplayname": "VM-XCP-2",
"vmname": "VM-XCP-2",
"vmstate": "Running",
"zoneid": "97299276-7257-4b79-a1df-51cf89c402e2",
"zonename": "ZONE1"
}
}
For the sake of completeness, I attach the corresponding management log entries: nv_vol_migr_to_managed_cmk_success.txt
Now, some remarks and errors:
1. With the `newdiskofferingid` option the migration does not work, even if the old and new disk offerings are exactly the same and have no tags.
Error message:
(noc-dev) π± > migrate volume storageid=2514b65e-b231-4b2e-932c-c897f2df7c79 volumeid=58a087e5-eab5-4588-acd2-b48d27895e8b livemigrate=true newdiskofferingid=b89a1d02-c775-470c-bd13-2a655d74cd49
{
"accountid": "d0987ed7-8031-11ec-9ad0-ba21ccf13580",
"cmd": "org.apache.cloudstack.api.command.admin.volume.MigrateVolumeCmdByAdmin",
"completed": "2022-03-09T11:37:41+0200",
"created": "2022-03-09T11:37:41+0200",
"jobid": "3dc0fe30-8e9a-4911-8bcc-a428adda4fcf",
"jobprocstatus": 0,
"jobresult": {
"errorcode": 431,
"errortext": "The disk offering informed is not valid [id=b89a1d02-c775-470c-bd13-2a655d74cd49]."
},
"jobresultcode": 530,
"jobresulttype": "object",
"jobstatus": 2,
"userid": "d09ca276-8031-11ec-9ad0-ba21ccf13580"
}
π Error: async API failed for job 3dc0fe30-8e9a-4911-8bcc-a428adda4fcf
2. No matter the values and tag combinations in the disk offerings, if the `newdiskofferingid` option is used the error message is the same as above.
3. If the initial compute offering uses a disk offering with a tag, then I have to manually add this tag to the managed storage in `storage_pool_tags`. Otherwise, I get the following:
(noc-dev) π± > migrate volume storageid=2514b65e-b231-4b2e-932c-c897f2df7c79 volumeid=c925d774-1d1a-40b9-af62-ba21ad001f08 livemigrate=true
{
"accountid": "d0987ed7-8031-11ec-9ad0-ba21ccf13580",
"cmd": "org.apache.cloudstack.api.command.admin.volume.MigrateVolumeCmdByAdmin",
"completed": "2022-03-09T11:56:59+0200",
"created": "2022-03-09T11:56:59+0200",
"jobid": "b8535cbb-e996-452d-bd37-97e1057653b1",
"jobprocstatus": 0,
"jobresult": {
"errorcode": 530,
"errortext": "Migration target pool [null, tags:sf,solidfire] has no matching tags for volume [ROOT-135, uuid:c925d774-1d1a-40b9-af62-ba21ad001f08, tags:nfsPrimaryXCP]"
},
"jobresultcode": 530,
"jobresulttype": "object",
"jobstatus": 2,
"userid": "d09ca276-8031-11ec-9ad0-ba21ccf13580"
}
π Error: async API failed for job b8535cbb-e996-452d-bd37-97e1057653b1
I guess that, in a production environment, a `proxy` compute offering with e.g. a `migration` tag must be used, and the same tag added to the destination storage, so as not to change the normal initial compute offering.
4. Finally, after migration, I changed the Compute Offering to the correct one (with the appropriate tag) and everything went smoothly.
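The tag mismatch in remark 3 can be illustrated with a small sketch. It assumes the rule is that every tag on the volume's offering must also be present on the target pool; that is an inference from the error text above, not a copy of the CloudStack check, and the function name `pool_matches_volume_tags` is hypothetical.

```python
def pool_matches_volume_tags(pool_tags: str, volume_tags: str) -> bool:
    """Assumed rule: all volume/offering tags must exist on the target pool."""
    pool = {tag.strip() for tag in pool_tags.split(",") if tag.strip()}
    volume = {tag.strip() for tag in volume_tags.split(",") if tag.strip()}
    return volume <= pool


# Tags as reported in the error message above.
print(pool_matches_volume_tags("sf,solidfire", "nfsPrimaryXCP"))
# After manually adding the volume's tag to the managed storage.
print(pool_matches_volume_tags("sf,solidfire,nfsPrimaryXCP", "nfsPrimaryXCP"))
```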
Great @tsinik-dw - we're working on a fix for the volume hv_ss_reserve value, but it's still not ready. Once it's produced, I would ask if you could test it. I've also been checking the failures you shared when setting the newdiskofferingid param; it seems like the offering passed is removed?
Hi @nvazquez,
I would be happy to test the fix when ready.
Regarding the `The disk offering informed is not valid` error message, it was hard for me to trace the root cause, but I am sure that all the service offerings that are mentioned exist in my setup. Below are the management log for one such message and the apilog for several of them from my tests.
nv_mgmtlog_migr_vol_offering_not_valid.txt nv_apilog_migr_vol_offering_not_valid.txt
cc @pdion891 requires your input as it's related to xs/solidfire cc @shwstppr
ISSUE TYPE
COMPONENT NAME
CLOUDSTACK VERSION
CONFIGURATION
ACS 4.16.0
1 Zone
Cluster A: Two Xenserver 7.0 hosts
Cluster B: Two XCP-NG 8.2 hosts
Each Cluster has its own NFS primary storage (non-managed storage)
There is a Zone-wide Solidfire Storage (managed storage)
OS / ENVIRONMENT
VM-A1, on Cluster A, has 1 ROOT disk and 1 DATA disk. Both disks on NFS Primary storage
VM-A2, on Cluster A, has 1 ROOT disk and 1 DATA disk. Both disks on Solidfire storage
VM-B1, on Cluster B, has 1 ROOT disk and 1 DATA disk. Both disks on NFS Primary storage
VM-B2, on Cluster B, has 1 ROOT disk and 1 DATA disk. Both disks on Solidfire storage
SUMMARY
We want to migrate VM DATA volumes between NFS primary storage (non-managed storage) and Solidfire storage (managed storage), both ways.
Trying to migrate VM-A1, VM-B1 Volumes from non-managed --> Managed (TRIED WITH VMs IN RUNNING AND STOPPED STATE, SAME RESULT)
The UI does not offer any available storage choice and we get the following message:
Trying to migrate VM-A2, VM-B2 Volumes from Managed --> Unmanaged: (TRIED WITH VMs IN RUNNING AND STOPPED STATE, SAME RESULT)
We get the following message:
It turns out that this feature is only supported on KVM
STEPS TO REPRODUCE
EXPECTED RESULTS
ACTUAL RESULTS