apache / cloudstack

Apache CloudStack is an open source Infrastructure as a Service (IaaS) cloud computing platform
https://cloudstack.apache.org/
Apache License 2.0

System VMs are stuck in the Starting state #7307

Closed Noelantogerorge closed 1 year ago

Noelantogerorge commented 1 year ago

ISSUE TYPE
CONFIGURATION
OS / ENVIRONMENT

Management server on Ubuntu, KVM on CentOS, NFS on Ubuntu

SUMMARY

I am configuring CloudStack for the first time, and this is a new setup. I installed the CloudStack management server on Ubuntu and connected a host running CentOS with KVM installed. I tested KVM by deploying a VM manually, and it worked. I configured two different VMs as the primary and secondary NFS servers. After I attached the primary storage, I can see it mounted on my KVM host, and virsh pool-list shows it. I also added the secondary storage, but the secondary NFS share is not mounted and virsh pool-list does not show it. When I check the system VMs, they show the Starting state. They stay in the Starting state for a long time and then move to the Stopped state. This might be a small configuration issue, but as I am new to this I am unable to figure it out.

STEPS TO REPRODUCE
EXPECTED RESULTS
ACTUAL RESULTS
boring-cyborg[bot] commented 1 year ago

Thanks for opening your first issue here! Be sure to follow the issue template!

weizhouapache commented 1 year ago

@Noelantogerorge the templates (including the system VM template) are stored on secondary storage. If secondary storage is not reachable, the system VMs will not work, because the system VM template needs to be copied from secondary storage to primary storage.

You can find some clues in management-server.log

Noelantogerorge commented 1 year ago

Hi, secondary storage is accessible from the management server and from the KVM host. I tried mounting it manually on both servers and it worked. I also tried copying the template folder from secondary storage to primary storage, but the result is the same. I am attaching the agent log from the KVM host and the management server log. Can you please analyse them? agent.log management-server.log

kiranchavala commented 1 year ago

Hi @Noelantogerorge

In agent.log I see the following exception, and in the management server log I see that the deployment fails due to insufficient server capacity.

2023-03-02 07:18:47,174 DEBUG [kvm.resource.LibvirtConnection] (agentRequest-Handler-4:null) (logid:9b46a076) Looking for libvirtd connection at: qemu:///system
2023-03-02 07:18:47,186 DEBUG [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-4:null) (logid:9b46a076) Successfully refreshed pool f94fafba-7ef6-3741-bd33-bea239780b31 Capacity: (7.2035 TB) 7920348561408 Used: (23.67 GB) 25417482240 Available: (7.1804 TB) 7894931079168
2023-03-02 07:18:47,188 DEBUG [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-4:null) (logid:9b46a076) Could not find volume e0faba80-b353-11ed-b1c6-000c291c8f0b: Storage volume not found: no storage vol with matching name 'e0faba80-b353-11ed-b1c6-000c291c8f0b'
2023-03-02 07:18:47,188 DEBUG [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-4:null) (logid:9b46a076) Refreshing storage pool f94fafba-7ef6-3741-bd33-bea239780b31

management-server.log:

2023-03-02 12:17:33,755 DEBUG [c.c.d.FirstFitPlanner] (Work-Job-Executor-1:ctx-359bb696 job-2873/job-2874 ctx-d477561f) (logid:9b46a076) No clusters found after removing disabled clusters and clusters in avoid list, returning.
2023-03-02 12:17:33,782 DEBUG [c.c.c.CapacityManagerImpl] (Work-Job-Executor-1:ctx-359bb696 job-2873/job-2874 ctx-d477561f) (logid:9b46a076) VM instance {id: "945", name: "s-945-VM", uuid: "9633b949-c650-44a6-b7ba-5a733b7624c4", type="SecondaryStorageVm"} state transited from [Starting] to [Stopped] with event [OperationFailed]. VM's original host: null, new host: null, host before state transition: null

2023-03-02 12:17:33,785 ERROR [c.c.v.VmWorkJobHandlerProxy] (Work-Job-Executor-1:ctx-359bb696 job-2873/job-2874 ctx-d477561f) (logid:9b46a076) Invocation exception, caused by: com.cloud.exception.InsufficientServerCapacityException: Unable to create a deployment for VM instance {id: "945", name: "s-945-VM", uuid: "9633b949-c650-44a6-b7ba-5a733b7624c4", type="SecondaryStorageVm"}Scope=interface com.cloud.dc.DataCenter; id=1

2023-03-02 12:17:33,786 INFO  [c.c.v.VmWorkJobHandlerProxy] (Work-Job-Executor-1:ctx-359bb696 job-2873/job-2874 ctx-d477561f) (logid:9b46a076) Rethrow exception com.cloud.exception.InsufficientServerCapacityException: Unable to create a deployment for VM instance {id: "945", name: "s-945-VM", uuid: "9633b949-c650-44a6-b7ba-5a733b7624c4", type="SecondaryStorageVm"}Scope=interface com.cloud.dc.DataCenter; id=1

2023-03-02 12:17:33,787 DEBUG [c.c.v.VmWorkJobDispatcher] (Work-Job-Executor-1:ctx-359bb696 job-2873/job-2874) (logid:9b46a076) Done with run of VM work job: com.cloud.vm.VmWorkStart for VM 945, job origin: 2873

2023-03-02 12:17:33,788 ERROR [c.c.v.VmWorkJobDispatcher] (Work-Job-Executor-1:ctx-359bb696 job-2873/job-2874) (logid:9b46a076) Unable to complete AsyncJobVO: {id:2874, userId: 1, accountId: 1, instanceType: null, instanceId: null, cmd: com.cloud.vm.VmWorkStart, cmdInfo: 
shwstppr commented 1 year ago

@Noelantogerorge it is failing while trying to find suitable primary storage for the volume. Can you please check whether the primary storage named nfsp1 is in the correct state and has capacity?

2023-03-02 12:17:33,711 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (Work-Job-Executor-1:ctx-359bb696 job-2873/job-2874 ctx-d477561f) (logid:9b46a076) Calling StoragePoolAllocators to find suitable pools
2023-03-02 12:17:33,717 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (Work-Job-Executor-1:ctx-359bb696 job-2873/job-2874 ctx-d477561f) (logid:9b46a076) System VMs will use shared storage for zone id=1
2023-03-02 12:17:33,717 DEBUG [o.a.c.s.a.LocalStoragePoolAllocator] (Work-Job-Executor-1:ctx-359bb696 job-2873/job-2874 ctx-d477561f) (logid:9b46a076) LocalStoragePoolAllocator trying to find storage pool to fit the vm
2023-03-02 12:17:33,717 DEBUG [o.a.c.s.a.ClusterScopeStoragePoolAllocator] (Work-Job-Executor-1:ctx-359bb696 job-2873/job-2874 ctx-d477561f) (logid:9b46a076) ClusterScopeStoragePoolAllocator looking for storage pool
2023-03-02 12:17:33,722 DEBUG [o.a.c.s.a.ClusterScopeStoragePoolAllocator] (Work-Job-Executor-1:ctx-359bb696 job-2873/job-2874 ctx-d477561f) (logid:9b46a076) Looking for pools in dc: 1  pod:1  cluster:1. Disabled pools will be ignored.
2023-03-02 12:17:33,725 DEBUG [o.a.c.s.a.ClusterScopeStoragePoolAllocator] (Work-Job-Executor-1:ctx-359bb696 job-2873/job-2874 ctx-d477561f) (logid:9b46a076) Found pools matching tags: [Pool[1|NetworkFilesystem]]
2023-03-02 12:17:33,728 DEBUG [o.a.c.s.a.AbstractStoragePoolAllocator] (Work-Job-Executor-1:ctx-359bb696 job-2873/job-2874 ctx-d477561f) (logid:9b46a076) Checking if storage pool is suitable, name: nfsp1 ,poolId: 1
2023-03-02 12:17:33,733 DEBUG [o.a.c.s.a.ClusterScopeStoragePoolAllocator] (Work-Job-Executor-1:ctx-359bb696 job-2873/job-2874 ctx-d477561f) (logid:9b46a076) ClusterScopeStoragePoolAllocator returning 0 suitable storage pools
2023-03-02 12:17:33,734 DEBUG [o.a.c.s.a.ZoneWideStoragePoolAllocator] (Work-Job-Executor-1:ctx-359bb696 job-2873/job-2874 ctx-d477561f) (logid:9b46a076) ZoneWideStoragePoolAllocator to find storage pool
2023-03-02 12:17:33,738 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (Work-Job-Executor-1:ctx-359bb696 job-2873/job-2874 ctx-d477561f) (logid:9b46a076) No suitable pools found for volume: Vol[945|vm=945|ROOT] under cluster: 1
2023-03-02 12:17:33,738 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (Work-Job-Executor-1:ctx-359bb696 job-2873/job-2874 ctx-d477561f) (logid:9b46a076) No suitable pools found
2023-03-02 12:17:33,739 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (Work-Job-Executor-1:ctx-359bb696 job-2873/job-2874 ctx-d477561f) (logid:9b46a076) No suitable storagePools found under this Cluster: 1
2023-03-02 12:17:33,746 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (Work-Job-Executor-1:ctx-359bb696 job-2873/job-2874 ctx-d477561f) (logid:9b46a076) Could not find suitable Deployment Destination for this VM under any clusters, returning. 
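
The excerpt above can also be narrowed down mechanically. The following is a minimal sketch, not a CloudStack tool: it assumes the log line layout seen in this thread (timestamp, level, logger in square brackets, thread context, `logid`, message), and the function names `allocator_messages`, `pools_checked` and `empty_allocators` are hypothetical helpers introduced only for illustration. It keeps the storage pool allocator lines for one `logid` and reports which pools were checked and which allocators came back empty.

```python
import re

# Three allocator lines copied from the management-server.log excerpt above.
SAMPLE_LOG = """\
2023-03-02 12:17:33,717 DEBUG [o.a.c.s.a.LocalStoragePoolAllocator] (Work-Job-Executor-1:ctx-359bb696 job-2873/job-2874 ctx-d477561f) (logid:9b46a076) LocalStoragePoolAllocator trying to find storage pool to fit the vm
2023-03-02 12:17:33,728 DEBUG [o.a.c.s.a.AbstractStoragePoolAllocator] (Work-Job-Executor-1:ctx-359bb696 job-2873/job-2874 ctx-d477561f) (logid:9b46a076) Checking if storage pool is suitable, name: nfsp1 ,poolId: 1
2023-03-02 12:17:33,733 DEBUG [o.a.c.s.a.ClusterScopeStoragePoolAllocator] (Work-Job-Executor-1:ctx-359bb696 job-2873/job-2874 ctx-d477561f) (logid:9b46a076) ClusterScopeStoragePoolAllocator returning 0 suitable storage pools
"""

# Assumed layout: "<date> <time> <LEVEL> [<logger>] (<thread ctx>) (logid:<id>) <message>"
LINE_RE = re.compile(
    r"^\S+ \S+ \w+\s+\[(?P<logger>[^\]]+)\] .*?\(logid:(?P<logid>\w+)\) (?P<msg>.*)$"
)


def allocator_messages(log_text, logid):
    """Return (short logger name, message) pairs for allocator lines of one logid."""
    rows = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None or match["logid"] != logid:
            continue
        short_name = match["logger"].rsplit(".", 1)[-1]
        if short_name.endswith("StoragePoolAllocator"):
            rows.append((short_name, match["msg"]))
    return rows


def pools_checked(messages):
    """Pool names mentioned in 'Checking if storage pool is suitable' messages."""
    names = []
    for _, msg in messages:
        found = re.search(r"Checking if storage pool is suitable, name: (\S+)", msg)
        if found:
            names.append(found.group(1))
    return names


def empty_allocators(messages):
    """Allocators that reported zero suitable pools."""
    return [name for name, msg in messages
            if "returning 0 suitable storage pools" in msg]


if __name__ == "__main__":
    msgs = allocator_messages(SAMPLE_LOG, "9b46a076")
    print("pools checked:", pools_checked(msgs))
    print("allocators with no suitable pool:", empty_allocators(msgs))
```

To run it against a real file, read the file contents into a string and pass that instead of `SAMPLE_LOG`. The log does not say why nfsp1 was rejected, so this only locates the step that failed.
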
Noelantogerorge commented 1 year ago

Yes, the primary storage is in the green state and Up. It is accessible from the KVM host and the management server. I also checked the database and can see the primary storage and the host storage in the storage_pool table.

Noelantogerorge commented 1 year ago

The KVM host has sufficient capacity for the VM. As this is a fresh installation, I created a new host for it. But I don't know why CloudStack is logging insufficient capacity. KVM host configuration: OS CentOS, CPU 8, RAM 50 GB, HDD 100 GB.

weizhouapache commented 1 year ago

@Noelantogerorge Can you check the primary storage as @shwstppr said?

shwstppr commented 1 year ago

@Noelantogerorge are you using storage tags for your primary storage? Can you share the output of list storagepoolsmetrics using cmk?

Noelantogerorge commented 1 year ago

During configuration I used the tag primary_storage for the primary storage.

(localcloud) 🐱 > list storagepoolsmetrics
{
  "count": 2,
  "storagepool": [
    {
      "clusterid": "f6343a16-9720-4cfa-9ee4-2b4d023384b6",
      "clustername": "kvm_cluster",
      "created": "2023-02-24T10:27:31+0000",
      "disksizeallocated": 0,
      "disksizeallocatedgb": "0.00 GB",
      "disksizetotal": 53660876800,
      "disksizetotalgb": "49.98 GB (x2.0)",
      "disksizeunallocatedgb": "99.95 GB",
      "hasannotations": false,
      "hypervisor": "KVM",
      "id": "a2f6fd69-14e4-41cb-9299-df2b50dea525",
      "ipaddress": "10.60.0.116",
      "name": "kvmhost1-local-a2f6fd69",
      "overprovisionfactor": "2.0",
      "path": "/var/lib/libvirt/images",
      "podid": "0faa2614-d022-4dfb-bdc5-ee826c8451ad",
      "podname": "POD-1",
      "provider": "DefaultPrimary",
      "scope": "HOST",
      "state": "Up",
      "storageallocatedthreshold": false,
      "storagecapabilities": { "VOLUME_SNAPSHOT_QUIESCEVM": "false" },
      "type": "Filesystem",
      "zoneid": "cd1a1496-f355-4b23-8ce4-f20bcc15f43b",
      "zonename": "Zone-1"
    },
    {
      "clusterid": "f6343a16-9720-4cfa-9ee4-2b4d023384b6",
      "clustername": "kvm_cluster",
      "created": "2023-02-24T08:12:11+0000",
      "disksizeallocated": 0,
      "disksizeallocatedgb": "0.00 GB",
      "disksizetotal": 7920348561408,
      "disksizetotalgb": "7376.40 GB (x2.0)",
      "disksizeunallocatedgb": "14752.80 GB",
      "hasannotations": false,
      "hypervisor": "KVM",
      "id": "f94fafba-7ef6-3741-bd33-bea239780b31",
      "ipaddress": "10.60.0.9",
      "name": "nfsp1",
      "overprovisionfactor": "2.0",
      "path": "/mnt/nfs_share",
      "podid": "0faa2614-d022-4dfb-bdc5-ee826c8451ad",
      "podname": "POD-1",
      "provider": "DefaultPrimary",
      "scope": "CLUSTER",
      "state": "Up",
      "storageallocatedthreshold": false,
      "storagecapabilities": { "VOLUME_SNAPSHOT_QUIESCEVM": "false" },
      "tags": "primary_storage",
      "type": "NetworkFilesystem",
      "zoneid": "cd1a1496-f355-4b23-8ce4-f20bcc15f43b",
      "zonename": "Zone-1"
    }
  ]
}
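
For readers comparing this output with the agent log, the sketch below pulls out the fields that matter for the tag question and re-derives the GB figures. It is an illustration under assumptions: the JSON is a trimmed copy of the output above (only the fields used are kept), `summarize` is a hypothetical helper name, and the reading that the unallocated figure equals total size times `overprovisionfactor` when nothing is allocated is inferred from the numbers shown here, not from CloudStack source code.

```python
import json

# Trimmed copy of the cmk output above; only the fields used below are kept.
CMK_OUTPUT = """
{"count": 2, "storagepool": [
  {"name": "kvmhost1-local-a2f6fd69", "scope": "HOST", "state": "Up",
   "disksizetotal": 53660876800, "overprovisionfactor": "2.0"},
  {"name": "nfsp1", "scope": "CLUSTER", "state": "Up", "tags": "primary_storage",
   "disksizetotal": 7920348561408, "overprovisionfactor": "2.0"}
]}
"""

BYTES_PER_GB = 1024 ** 3  # the "GB" values in the output match binary gigabytes


def summarize(raw_json):
    """Return one dict per pool with its tags and recomputed capacity strings."""
    rows = []
    for pool in json.loads(raw_json)["storagepool"]:
        total_gb = pool["disksizetotal"] / BYTES_PER_GB
        factor = float(pool["overprovisionfactor"])
        # "tags" is absent for untagged pools and comma-separated otherwise.
        tags = [t for t in pool.get("tags", "").split(",") if t]
        rows.append({
            "name": pool["name"],
            "scope": pool["scope"],
            "tags": tags,
            "total_gb": f"{total_gb:.2f}",
            "overprovisioned_gb": f"{total_gb * factor:.2f}",
        })
    return rows


if __name__ == "__main__":
    for row in summarize(CMK_OUTPUT):
        print(row)
```

The recomputed values line up with `disksizetotalgb` and `disksizeunallocatedgb` in the output, and with the 7.2035 TB capacity the agent reported for pool f94fafba-7ef6-3741-bd33-bea239780b31. So nfsp1 is not short of space; the `primary_storage` tag is the only visible difference from an untagged pool.
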

shwstppr commented 1 year ago

@Noelantogerorge can you please update your primary storage and remove that tag? Otherwise you will have to create new system service offerings that specify the same tag in order to use that primary storage.

shwstppr commented 1 year ago

@Noelantogerorge You may refer to https://docs.cloudstack.apache.org/en/latest/adminguide/storage.html#storage-tags

Noelantogerorge commented 1 year ago

@shwstppr Thanks for your response. I removed the tag from the primary storage and tried again, but the system VMs stayed in the same state. I then tried some changes on the KVM host, but nothing worked as expected. So I recreated a simpler setup: one server for management, primary and secondary storage, and one server for KVM. It works, and the system VMs are showing as Running. In my first setup I had configured two management servers, a load balancer in front of them, a database server, two servers for primary and secondary storage, and one KVM server. That should work too, right?

shwstppr commented 1 year ago

@Noelantogerorge looks fine to me. So, have you already destroyed your first setup?

Noelantogerorge commented 1 year ago

Not destroyed. I think I made a configuration error in the first setup. I will try recreating it and check. Thank you so much for the response and help.