apache / cloudstack

Apache CloudStack is an opensource Infrastructure as a Service (IaaS) cloud computing platform
https://cloudstack.apache.org/
Apache License 2.0
1.97k stars 1.09k forks source link

VMware: System VMs can't be started when no secondary storage with enough capacity + wrong messages around secondary storage #8390

Closed andrijapanicsb closed 2 months ago

andrijapanicsb commented 8 months ago

An environment with 2 Secondary Storages - both over 90% full (global setting set's the treshold to 0.9)

While ACS tries to start a VM (SSVM in this case) - it fails to start it with the following log messages - note that it also says that Secondary Storage is not of NFS type, then it looks for staging (assumes Object based Secondary Storage, obviously), which is not true.

Note the Secondary Storages are NOT deleted, ACS reports something like that, due to what seems to be yet another bug in the code, or an awful logging message at least.

  1. There should be ZERO dependency during staring a VM on the space available on Secondary Storage(s) - zero data is written to it, this is design bug. Even worst, if SSVMs are destroyed, ACS can't start them due to no SS with enough space, so even if you have increased the Secondary Storage space on the backedn storage, it's can't be read/updated inside ACS while SSVMs are down/not existing.
  2. Image store message needs to be corrected (there are no object-based SS in this env) - "Secondary storage is not NFS, we need to use staging storage" and related messages
  3. "failed to get message for exception: secondary storage for dc 5 is not ready yet?" should be properly rewriten to send a clear message that there are no NFS-based secondary storage with enough capacity, under 90% full by default, etc
2023-12-20 18:39:01,303 INFO  [c.c.h.v.m.VirtualMachineMO] (DirectAgent-134:ctx-16031b08 vdc-lon8-esx02.esx.lon8.my.domain, cmd: GetVolumeStatsCommand) (logid:5e70aef9) Look for disk device info from volume : ROOT-511897_2.vmdk with base name: ROOT-511897_2
2023-12-20 18:39:01,303 INFO  [c.c.h.v.m.VirtualMachineMO] (DirectAgent-134:ctx-16031b08 vdc-lon8-esx02.esx.lon8.my.domain, cmd: GetVolumeStatsCommand) (logid:5e70aef9) Test against disk device, controller key: 1000, unit number: 0
2023-12-20 18:39:01,303 INFO  [c.c.h.v.m.VirtualMachineMO] (DirectAgent-134:ctx-16031b08 vdc-lon8-esx02.esx.lon8.my.domain, cmd: GetVolumeStatsCommand) (logid:5e70aef9) Test against disk backing : [lon8_EBS_PRM01] r-511897-VM/ROOT-511897_2.vmdk
2023-12-20 18:39:01,303 INFO  [c.c.h.v.m.VirtualMachineMO] (DirectAgent-134:ctx-16031b08 vdc-lon8-esx02.esx.lon8.my.domain, cmd: GetVolumeStatsCommand) (logid:5e70aef9) Disk backing : [lon8_EBS_PRM01] r-511897-VM/ROOT-511897_2.vmdk matches ==> scsi0:0
2023-12-20 18:39:01,310 DEBUG [c.c.h.v.m.DatastoreMO] (DirectAgent-134:ctx-16031b08 vdc-lon8-esx02.esx.lon8.my.domain, cmd: GetVolumeStatsCommand) (logid:5e70aef9) Search file ROOT-511897_2.vmdk on [lon8_EBS_PRM01] r-511897-VM
2023-12-20 18:39:01,338 DEBUG [c.c.h.v.m.DatastoreMO] (DirectAgent-134:ctx-16031b08 vdc-lon8-esx02.esx.lon8.my.domain, cmd: GetVolumeStatsCommand) (logid:5e70aef9) File found = ROOT-511897_2.vmdk, size=(1.76 GB) 1891631104
2023-12-20 18:39:01,343 WARN  [c.c.h.v.r.VmwareResource] (DirectAgent-448:ctx-a4188b3c vdc-ams24-esx08.esx.ams24.my.domain, job-13161936/job-13236627, cmd: StartCommand) (logid:3b93f48e) hotadd of memory is not supported, dynamic scaling feature can not be applied to vm: s-512269-VM
2023-12-20 18:39:01,418 WARN  [c.c.h.v.r.VmwareResource] (DirectAgent-448:ctx-a4188b3c vdc-ams24-esx08.esx.ams24.my.domain, job-13161936/job-13236627, cmd: StartCommand) (logid:3b93f48e) hotadd of cpu is not supported, dynamic scaling feature can not be applied to vm: s-512269-VM
2023-12-20 18:39:01,430 DEBUG [c.c.s.StatsCollector] (DirectAgent-448:ctx-a4188b3c vdc-ams24-esx08.esx.ams24.my.domain, job-13161936/job-13236627, cmd: StartCommand) (logid:3b93f48e) Verifying image storage [38]. Capacity: total=[50 TB], used=[46 TB], threshold=[89.99999761581421%].
2023-12-20 18:39:01,430 WARN  [c.c.s.StatsCollector] (DirectAgent-448:ctx-a4188b3c vdc-ams24-esx08.esx.ams24.my.domain, job-13161936/job-13236627, cmd: StartCommand) (logid:3b93f48e) Image storage [38] has not enough capacity. Capacity: total=[50 TB], used=[46 TB], threshold=[89.99999761581421%].
2023-12-20 18:39:01,431 DEBUG [c.c.s.StatsCollector] (DirectAgent-448:ctx-a4188b3c vdc-ams24-esx08.esx.ams24.my.domain, job-13161936/job-13236627, cmd: StartCommand) (logid:3b93f48e) Verifying image storage [28]. Capacity: total=[90 TB], used=[82 TB], threshold=[89.99999761581421%].
2023-12-20 18:39:01,431 WARN  [c.c.s.StatsCollector] (DirectAgent-448:ctx-a4188b3c vdc-ams24-esx08.esx.ams24.my.domain, job-13161936/job-13236627, cmd: StartCommand) (logid:3b93f48e) Image storage [28] has not enough capacity. Capacity: total=[90 TB], used=[82 TB], threshold=[89.99999761581421%].
2023-12-20 18:39:01,431 INFO  [c.c.h.v.m.VmwareManagerImpl] (DirectAgent-448:ctx-a4188b3c vdc-ams24-esx08.esx.ams24.my.domain, job-13161936/job-13236627, cmd: StartCommand) (logid:3b93f48e) Secondary storage is not NFS, we need to use staging storage
2023-12-20 18:39:01,432 INFO  [c.c.h.v.m.VirtualMachineMO] (DirectAgent-134:ctx-16031b08 vdc-lon8-esx02.esx.lon8.my.domain, cmd: GetVolumeStatsCommand) (logid:5e70aef9) Look for disk device info from volume : ROOT-512018_2.vmdk with base name: ROOT-512018_2
2023-12-20 18:39:01,432 INFO  [c.c.h.v.m.VirtualMachineMO] (DirectAgent-134:ctx-16031b08 vdc-lon8-esx02.esx.lon8.my.domain, cmd: GetVolumeStatsCommand) (logid:5e70aef9) Test against disk device, controller key: 1000, unit number: 0
2023-12-20 18:39:01,432 INFO  [c.c.h.v.m.VirtualMachineMO] (DirectAgent-134:ctx-16031b08 vdc-lon8-esx02.esx.lon8.my.domain, cmd: GetVolumeStatsCommand) (logid:5e70aef9) Test against disk backing : [lon8_EBS_PRM01] r-512018-VM/ROOT-512018_2.vmdk
2023-12-20 18:39:01,432 INFO  [c.c.h.v.m.VirtualMachineMO] (DirectAgent-134:ctx-16031b08 vdc-lon8-esx02.esx.lon8.my.domain, cmd: GetVolumeStatsCommand) (logid:5e70aef9) Disk backing : [lon8_EBS_PRM01] r-512018-VM/ROOT-512018_2.vmdk matches ==> scsi0:0
2023-12-20 18:39:01,433 WARN  [c.c.h.v.m.VmwareManagerImpl] (DirectAgent-448:ctx-a4188b3c vdc-ams24-esx08.esx.ams24.my.domain, job-13161936/job-13236627, cmd: StartCommand) (logid:3b93f48e) No staging storage is found when non-NFS secondary storage is used
2023-12-20 18:39:01,433 INFO  [c.c.h.v.u.VmwareHelper] (DirectAgent-448:ctx-a4188b3c vdc-ams24-esx08.esx.ams24.my.domain, job-13161936/job-13236627, cmd: StartCommand) (logid:3b93f48e) [ignored]failed to get message for exception: secondary storage for dc 5 is not ready yet?
2023-12-20 18:39:01,433 ERROR [c.c.h.v.r.VmwareResource] (DirectAgent-448:ctx-a4188b3c vdc-ams24-esx08.esx.ams24.my.domain, job-13161936/job-13236627, cmd: StartCommand) (logid:3b93f48e) StartCommand failed due to [Exception: java.lang.Exception
Message: secondary storage for dc 5 is not ready yet?
].
java.lang.Exception: secondary storage for dc 5 is not ready yet?
    at com.cloud.hypervisor.vmware.resource.VmwareResource.execute(VmwareResource.java:2204)
    at com.cloud.hypervisor.vmware.resource.VmwareResource.executeRequest(VmwareResource.java:560)
    at com.cloud.agent.manager.DirectAgentAttache$Task.runInContext(DirectAgentAttache.java:315)
    at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:48)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:55)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:102)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:52)
    at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:45)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
2023-12-20 18:39:01,434 DEBUG [c.c.a.m.DirectAgentAttache] (DirectAgent-448:ctx-a4188b3c) (logid:3b93f48e) Seq 1006517-5393342028752906404: Cancelling because one of the answers is false and it is stop on error.
2023-12-20 18:39:01,434 DEBUG [c.c.a.m.DirectAgentAttache] (DirectAgent-448:ctx-a4188b3c) (logid:3b93f48e) Seq 1006517-5393342028752906404: Response Received: 
2023-12-20 18:39:01,437 DEBUG [c.c.a.t.Request] (DirectAgent-448:ctx-a4188b3c) (logid:3b93f48e) Seq 1006517-5393342028752906404: Processing:  { Ans: , MgmtId: 345049054097, via: 1006517(vdc-ams24-esx08.esx.ams24.my.domain), Ver: v1, Flags: 110, [{"com.cloud.agent.api.StartAnswer":{"vm":{"id":"512269","name":"s-512269-VM","bootloader":"HVM","state":"Starting","type":"SecondaryStorageVm","cpus":"4","minSpeed":"200","maxSpeed":"2000","minRam":"(5.00 GB) 5368709120","maxRam":"(10.00 GB) 10737418240","hostName":"s-512269-VM","arch":"x86_64","os":"Other Linux (64-bit)","platformEmulator":"otherLinux64Guest","bootArgs":"","enableHA":"false","limitCpuUse":"false","enableDynamicallyScaleVm":"true","vncPassword":"mTeCdwMKZC2HYcUmlH7hhg","params":{"cpuOvercommitRatio":"10.0","vmware.reserve.mem":"false","vmware.reserve.cpu":"false","memoryOvercommitRatio":"2.0","nicAdapter":"Vmxnet3"},"uuid":"13d2510f-e99c-4143-8480-4c8dd05f67d5","enterHardwareSetup":"false","disks":[{"data":{"org.apache.cloudstack.storage.to.VolumeObjectTO":{"uuid":"d1dc2b7b-b010-4d4b-baca-b35a791874ca","volumeType":"ROOT","dataStore":{"org.apache.cloudstack.storage.to.PrimaryDataStoreTO":{"uuid":"5e5c92bb-6db4-3b10-82a1-af460dcc942a","name":"ams24_EBS_PRM03","id":"1760","poolType":"PreSetup","host":"inth1-vdc-a2cpvsp1.amskoo2.my.domain","path":"/amskoo2/ams24_EBS_PRM03","port":"0","url":"PreSetup://inth1-vdc-a2cpvsp1.amskoo2.my.domain/amskoo2/ams24_EBS_PRM03/?ROLE=Primary&STOREUUID=5e5c92bb-6db4-3b10-82a1-af460dcc942a","isManaged":"false","parentPoolType":"DatastoreCluster"}},"name":"ROOT-512269","size":"(4.88 GB) 5242880000","path":"ROOT-512269_2","volumeId":"602841","vmName":"s-512269-VM","accountId":"1","format":"OVA","provisioningType":"THIN","poolId":"1760","id":"602841","deviceId":"0","bytesReadRate":"(0 bytes) 0","bytesWriteRate":"(0 bytes) 0","iopsReadRate":"(0 bytes) 0","iopsWriteRate":"(0 bytes) 0","cacheMode":"NONE","hypervisorType":"VMware","directDownload":"false","deployAsIs":"false"}},"diskSeq":"0","path":"ROOT-512269_2","type":"ROOT","_details":{"storageHost":"inth1-vdc-a2cpvsp1.amskoo2.my.domain","managed":"false","storagePort":"0","protocoltype":"DatastoreCluster","storage.pool.disk.wait":"60","volumeSize":"(4.88 GB) 5242880000"}}],"nics":[{"deviceId":"2","networkRateMbps":"-1","defaultNic":"true","pxeDisable":"true","nicUuid":"be0de784-8b6e-4e12-8c58-3161b2f4ab81","details":{"PromiscuousMode":"false","MacAddressChanges":"true","ForgedTransmits":"true","MacLearning":"false"},"dpdkEnabled":"false","uuid":"40b7c3ca-27c6-4a35-b2fc-938fb6ce9b48","ip":"212.23.60.136","netmask":"255.255.255.128","gateway":"212.23.60.254","mac":"1e:00:07:00:02:cf","dns1":"154.15.249.130","dns2":"154.15.249.134","broadcastType":"Vlan","type":"Public","broadcastUri":"vlan://400","isolationUri":"vlan://400","isSecurityGroupEnabled":"false","name":"dvS-ext,,vmwaredvs"},{"deviceId":"0","networkRateMbps":"-1","defaultNic":"false","pxeDisable":"true","nicUuid":"5422d7f2-04b3-429b-afa6-5ebf7fb66243","details":{"PromiscuousMode":"false","MacAddressChanges":"true","ForgedTransmits":"true","MacLearning":"false"},"dpdkEnabled":"false","uuid":"440ced28-cdbc-4163-93d2-5c876da7a8ba","mac":"02:00:3c:5f:44:e8","broadcastType":"LinkLocal","type":"Control","isSecurityGroupEnabled":"false","name":"dvS-ext,11,vmwaredvs"},{"deviceId":"1","networkRateMbps":"-1","defaultNic":"false","pxeDisable":"true","nicUuid":"718dcb7a-f42d-4208-a8c3-711d5f31fc67","details":{"PromiscuousMode":"false","MacAddressChanges":"true","ForgedTransmits":"true","MacLearning":"false"},"dpdkEnabled":"false","uuid":"3624e70c-b25a-40e2-9e6b-ccdca6b18d0e","ip":"10.208.96.197","netmask":"255.255.252.0","gateway":"10.208.99.254","mac":"1e:00:6a:00:03:c0","broadcastType":"Native","type":"Management","isSecurityGroupEnabled":"false","name":"dvS-ext,11,vmwaredvs"},{"deviceId":"3","networkRateMbps":"-1","defaultNic":"false","pxeDisable":"true","nicUuid":"d3b81df8-33da-4283-bfd1-3c2a2b677433","details":{"PromiscuousMode":"false","MacAddressChanges":"true","ForgedTransmits":"true","MacLearning":"false"},"dpdkEnabled":"false","uuid":"603ec6d8-f711-47ef-a71a-1da2566fd361","ip":"10.209.95.221","netmask":"255.255.252.0","gateway":"10.209.95.254","mac":"1e:00:58:03:72:0c","broadcastType":"Storage","type":"Storage","broadcastUri":"storage://6","isSecurityGroupEnabled":"false","name":"dvS-sto,6,vmwaredvs"}],"configDriveLocation":"SECONDARY","guestOsDetails":{},"extraConfig":{}},"result":"false","details":"StartCommand failed due to [Exception: java.lang.Exception
Message: secondary storage for dc 5 is not ready yet?
sureshanaparti commented 2 months ago

Hi @andrijapanicsb ACS checks for enough capacity on secondary storage while starting non-user VMs (mainly system VMs), to patch systemvm ISO. In case, no NFS secondary storage available or their threshold reached, it fails - but with inappropriate error. I've addressed the error messages / logs part 2, 3 here: #9207 , please check/review and let me know if that's ok. Any functional changes can be discussed, properly defined as an improvement and then we can take it forward. Agree?

sureshanaparti commented 2 months ago

Fixed with #9207