apache / cloudstack

Apache CloudStack is an opensource Infrastructure as a Service (IaaS) cloud computing platform
https://cloudstack.apache.org/
Apache License 2.0

systemVM doesn't start on linbit storage #7330

Open bashrusakh opened 1 year ago

bashrusakh commented 1 year ago

Hello guys, need help here.

ISSUE TYPE
COMPONENT NAME

systemVM

CLOUDSTACK VERSION

4.18rc2

CONFIGURATION

virtual lab: cat /etc/hosts on each node:
192.168.43.133 host.cloud.local
192.168.43.135 linstor.cloud.local - linstor combined 1node setup
192.168.43.134 mgmt.cloud.local
192.168.43.131 nfs.cloud.local

OS / ENVIRONMENT

linstor, mgmt, nfs - rockylinux 9.1
host - centos7

SUMMARY

systemVM doesn't start. The linstor node says:
Mar 11 16:50:35 linstor.cloud.local Controller[1007]: 16:50:35.215 [grizzly-http-server-3] ERROR LINSTOR/Controller - SYSTEM - Node 'host.cloud.local' not found. [Report number 640C092C-00000-000221]

host:
Mar 11 17:00:35 host.cloud.local java[1324]: Caused by: com.linbit.linstor.api.ApiException: [{"ret_code":-4611686018407202516,"message":"Node 'host.cloud.local' not found.","cause":"The specified node 'host.cloud.local' could not be found in the database","correction":"Create a node with the name 'host.cloud.local' first.","details":"Node: host.cloud.local, Resource: 'cs-7c9c3890-9371-4d29-b35a-07c7ccc2a9b1'","error_report_ids":["640C092C-00000-000263"],"obj_refs":{"RscDfn":"cs-7c9c3890-9371-4d29-b35a-07c7ccc2a9b1","Node":"host.cloud.local"}}]
Mar 11 17:00:35 host.cloud.local java[1324]: at com.linbit.linstor.api.ApiClient.invokeAPI(ApiClient.java:742)
Mar 11 17:00:35 host.cloud.local java[1324]: at com.linbit.linstor.api.DevelopersApi.resourceMakeAvailableOnNode(DevelopersApi.java:2909)
Mar 11 17:00:35 host.cloud.local java[1324]: at com.cloud.hypervisor.kvm.storage.LinstorStorageAdaptor.connectPhysicalDisk(LinstorStorageAdaptor.java:263)
Mar 11 17:00:35 host.cloud.local java[1324]: ... 13 more
Mar 11 17:00:38 host.cloud.local java[1324]: libvirt: QEMU Driver error : Domain not found: no domain with matching name 'v-1-VM'
Mar 11 17:00:38 host.cloud.local java[1324]: libvirt: LXC Driver error : Domain not found: No domain with matching name 'v-1-VM'
Mar 11 17:00:38 host.cloud.local java[1324]: WARN [kvm.resource.LibvirtConnection] (agentRequest-Handler-5:) (logid:4dc700db) Can not find a connection for Instance v-1-VM. Assuming the default connection.
Mar 11 17:00:38 host.cloud.local java[1324]: libvirt: QEMU Driver error : Domain not found: no domain with matching name 'v-1-VM'
Mar 11 17:00:38 host.cloud.local java[1324]: libvirt: QEMU Driver error : Domain not found: no domain with matching name 'v-1-VM'
Mar 11 17:00:38 host.cloud.local java[1324]: libvirt: QEMU Driver error : Domain not found: no domain with matching name 'v-1-VM'
Mar 11 17:00:38 host.cloud.local java[1324]: libvirt: QEMU Driver error : Domain not found: no domain with matching name 'v-1-VM'
Mar 11 17:00:38 host.cloud.local java[1324]: libvirt: QEMU Driver error : Domain not found: no domain with matching name 'v-1-VM'
Mar 11 17:00:38 host.cloud.local java[1324]: WARN [kvm.resource.LibvirtKvmAgentHook] (agentRequest-Handler-5:) (logid:4dc700db) Groovy script '/etc/cloudstack/agent/hooks/libvirt-vm-state-change.groovy' is not available. Transformations will not be applied.
Mar 11 17:00:38 host.cloud.local java[1324]: WARN [kvm.resource.LibvirtKvmAgentHook] (agentRequest-Handler-5:) (logid:4dc700db) Groovy scripting engine is not initialized. Data transformation skipped.
Mar 11 17:00:38 host.cloud.local java[1324]: libvirt: QEMU Driver error : Domain not found: no domain with matching name 's-2-VM'
Mar 11 17:00:38 host.cloud.local java[1324]: libvirt: LXC Driver error : Domain not found: No domain with matching name 's-2-VM'
Mar 11 17:00:38 host.cloud.local java[1324]: WARN [kvm.resource.LibvirtConnection] (agentRequest-Handler-4:) (logid:914e961f) Can not find a connection for Instance s-2-VM. Assuming the default connection.
Mar 11 17:00:38 host.cloud.local java[1324]: libvirt: QEMU Driver error : Domain not found: no domain with matching name 's-2-VM'
Mar 11 17:00:38 host.cloud.local java[1324]: libvirt: QEMU Driver error : Domain not found: no domain with matching name 's-2-VM'
Mar 11 17:00:38 host.cloud.local java[1324]: libvirt: QEMU Driver error : Domain not found: no domain with matching name 's-2-VM'
Mar 11 17:00:38 host.cloud.local java[1324]: libvirt: QEMU Driver error : Domain not found: no domain with matching name 's-2-VM'
Mar 11 17:00:38 host.cloud.local java[1324]: libvirt: QEMU Driver error : Domain not found: no domain with matching name 's-2-VM'
Mar 11 17:00:38 host.cloud.local java[1324]: WARN [kvm.resource.LibvirtKvmAgentHook] (agentRequest-Handler-4:) (logid:914e961f) Groovy script '/etc/cloudstack/agent/hooks/libvirt-vm-state-change.groovy' is not available. Transformations will not be applied.
Mar 11 17:00:38 host.cloud.local java[1324]: WARN [kvm.resource.LibvirtKvmAgentHook] (agentRequest-Handler-4:) (logid:914e961f) Groovy scripting engine is not initialized. Data transformation skipped.
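The `ApiException` payload in the host log is a JSON array, and its `correction` field already names the fix LINSTOR suggests. As a rough sketch (the helper name `summarize_api_error` is made up for illustration and is not part of CloudStack or LINSTOR), the fields that matter for triage can be pulled out like this:

```python
import json

# Payload copied verbatim from the "Caused by: ...ApiException:" log line in this issue.
RAW = '''[{"ret_code":-4611686018407202516,"message":"Node 'host.cloud.local' not found.","cause":"The specified node 'host.cloud.local' could not be found in the database","correction":"Create a node with the name 'host.cloud.local' first.","details":"Node: host.cloud.local, Resource: 'cs-7c9c3890-9371-4d29-b35a-07c7ccc2a9b1'","error_report_ids":["640C092C-00000-000263"],"obj_refs":{"RscDfn":"cs-7c9c3890-9371-4d29-b35a-07c7ccc2a9b1","Node":"host.cloud.local"}}]'''


def summarize_api_error(raw: str) -> list[dict]:
    """Hypothetical helper: keep only the fields useful for triage."""
    summary = []
    for entry in json.loads(raw):
        refs = entry.get("obj_refs", {})
        summary.append({
            "node": refs.get("Node"),          # node the agent asked for
            "resource": refs.get("RscDfn"),    # resource definition involved
            "message": entry.get("message"),
            "correction": entry.get("correction"),
        })
    return summary


for item in summarize_api_error(RAW):
    print(f"{item['node']}: {item['message']} -> {item['correction']}")
```

Read this way, the host log and the controller log say the same thing: the KVM agent asked LINSTOR to make a resource available on `host.cloud.local`, and the controller has no node registered under that name.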

management console:
Mar 11 16:52:02 mgmt.cloud.local java[1019]: INFO [c.c.c.ConsoleProxyManagerImpl] (consoleproxy-1:ctx-2c5448d0) (logid:86064cb4) Found a stopped console proxy, starting it. Vm id : 1
Mar 11 16:52:02 mgmt.cloud.local java[1019]: INFO [o.a.c.s.PremiumSecondaryStorageManagerImpl] (secstorage-1:ctx-35180346) (logid:6fdfc298) No running secondary storage vms found in datacenter id=1, starting one
Mar 11 16:52:02 mgmt.cloud.local java[1019]: INFO [o.a.c.s.SecondaryStorageManagerImpl] (secstorage-1:ctx-35180346) (logid:6fdfc298) Found a stopped secondary storage VM instance {"id":2,"instanceName":"s-2-VM","type":"SecondaryStorageVm","uuid":"2aef38dd-dbbf-454d-b21e-838607129317"}, starting it.
Mar 11 16:52:03 mgmt.cloud.local java[1019]: INFO [o.a.c.f.j.i.AsyncJobMonitor] (Work-Job-Executor-35:ctx-fa192bca job-258/job-296) (logid:bbe68cc6) Add job-296 into job monitoring
Mar 11 16:52:03 mgmt.cloud.local java[1019]: INFO [o.a.c.f.j.i.AsyncJobMonitor] (Work-Job-Executor-36:ctx-1853bf10 job-260/job-297) (logid:3902b2b4) Add job-297 into job monitoring
Mar 11 16:52:03 mgmt.cloud.local java[1019]: INFO [c.c.a.m.a.i.FirstFitAllocator] (Work-Job-Executor-35:ctx-fa192bca job-258/job-296 ctx-86062aa4 FirstFitRoutingAllocator) (logid:4dc700db) Guest VM is requested with Custom[UEFI] Boot Type false
Mar 11 16:52:03 mgmt.cloud.local java[1019]: INFO [c.c.a.m.a.i.FirstFitAllocator] (Work-Job-Executor-36:ctx-1853bf10 job-260/job-297 ctx-35673300 FirstFitRoutingAllocator) (logid:914e961f) Guest VM is requested with Custom[UEFI] Boot Type false
Mar 11 16:52:03 mgmt.cloud.local java[1019]: INFO [c.c.d.DeploymentPlanningManagerImpl] (Work-Job-Executor-35:ctx-fa192bca job-258/job-296 ctx-86062aa4) (logid:4dc700db) Re-ordering hosts [Host {"id":1,"name":"host.cloud.local","type":"Routing","uuid":"5ec47dc3-a71b-437f-b777-3ff556b12677"}] by priorities {}
Mar 11 16:52:03 mgmt.cloud.local java[1019]: INFO [c.c.d.DeploymentPlanningManagerImpl] (Work-Job-Executor-35:ctx-fa192bca job-258/job-296 ctx-86062aa4) (logid:4dc700db) Hosts after re-ordering are: [Host {"id":1,"name":"host.cloud.local","type":"Routing","uuid":"5ec47dc3-a71b-437f-b777-3ff556b12677"}]
Mar 11 16:52:03 mgmt.cloud.local java[1019]: INFO [c.c.d.DeploymentPlanningManagerImpl] (Work-Job-Executor-36:ctx-1853bf10 job-260/job-297 ctx-35673300) (logid:914e961f) Re-ordering hosts [Host {"id":1,"name":"host.cloud.local","type":"Routing","uuid":"5ec47dc3-a71b-437f-b777-3ff556b12677"}] by priorities {}
Mar 11 16:52:03 mgmt.cloud.local java[1019]: INFO [c.c.d.DeploymentPlanningManagerImpl] (Work-Job-Executor-36:ctx-1853bf10 job-260/job-297 ctx-35673300) (logid:914e961f) Hosts after re-ordering are: [Host {"id":1,"name":"host.cloud.local","type":"Routing","uuid":"5ec47dc3-a71b-437f-b777-3ff556b12677"}]
Mar 11 16:52:05 mgmt.cloud.local java[1019]: INFO [o.a.c.s.SecondaryStorageManagerImpl] (Work-Job-Executor-36:ctx-1853bf10 job-260/job-297 ctx-35673300) (logid:914e961f) Using [192.168.43.131] as address of secondary storage of SSVM [s-2-VM].
Mar 11 16:52:07 mgmt.cloud.local java[1019]: WARN [c.c.v.VirtualMachineManagerImpl] (Work-Job-Executor-35:ctx-fa192bca job-258/job-296 ctx-86062aa4) (logid:4dc700db) Unable to orchestrate start VM instance {"id":1,"instanceName":"v-1-VM","type":"ConsoleProxy","uuid":"742df044-d834-42cf-ab72-5f0592db398d"} due to [Unable to get answer that is of class com.cloud.agent.api.StartAnswer].

linstor node:
[root@linstor ~]# linstor rd l
╭────────────────────────────────────────────────────────────────────────╮
┊ ResourceName                            ┊ Port ┊ ResourceGroup ┊ State ┊
╞════════════════════════════════════════════════════════════════════════╡
┊ cs-3407d5c3-fa94-4715-8d65-968afdcfb72e ┊ 7000 ┊ cloudstack    ┊ ok    ┊
┊ cs-7a1d3c4e-b394-4f9c-8e84-51fed273938d ┊ 7002 ┊ cloudstack    ┊ ok    ┊
┊ cs-7c9c3890-9371-4d29-b35a-07c7ccc2a9b1 ┊ 7001 ┊ cloudstack    ┊ ok    ┊
╰────────────────────────────────────────────────────────────────────────╯

boring-cyborg[bot] commented 1 year ago

Thanks for opening your first issue here! Be sure to follow the issue template!

bashrusakh commented 1 year ago

Some comments: I read the manuals for some other IaaS software and watched LINSTOR tutorials on YouTube many times, but couldn't find a setup like this one: a LINSTOR cluster for storage ONLY and a compute cluster for compute ONLY. The manuals don't say it, but I assume the only currently working configuration is storage and compute on the same nodes. Reading the logs in my first message gave me the idea to include my virtualization host (compute node) in the LINSTOR cluster:

[root@linstor ~]# linstor n l
╭─────────────────────────────────────────────────────────────────────╮
┊ Node             ┊ NodeType  ┊ Addresses                   ┊ State  ┊
╞═════════════════════════════════════════════════════════════════════╡
┊ host.cloud.local ┊ SATELLITE ┊ 192.168.43.133:3366 (PLAIN) ┊ Online ┊
┊ linstor          ┊ COMBINED  ┊ 192.168.43.135:3366 (PLAIN) ┊ Online ┊
╰─────────────────────────────────────────────────────────────────────╯

Then I created a diskless storage pool (the compute node has no storage of its own, no LVM/ZFS/etc.):

[root@linstor ~]# linstor sp l
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ StoragePool          ┊ Node             ┊ Driver   ┊ PoolName ┊ FreeCapacity ┊ TotalCapacity ┊ CanSnapshots ┊ State ┊ SharedName ┊
╞══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ DfltDisklessStorPool ┊ host.cloud.local ┊ DISKLESS ┊          ┊              ┊               ┊ False        ┊ Ok    ┊            ┊
┊ DfltDisklessStorPool ┊ linstor          ┊ DISKLESS ┊          ┊              ┊               ┊ False        ┊ Ok    ┊            ┊
┊ cloudstack_pool      ┊ host.cloud.local ┊ DISKLESS ┊          ┊              ┊               ┊ False        ┊ Ok    ┊            ┊
┊ cloudstack_pool      ┊ linstor          ┊ LVM      ┊ linstor  ┊     5.34 GiB ┊     20.00 GiB ┊ False        ┊ Ok    ┊            ┊
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

And the logs:
Mar 12 00:46:08 linstor.cloud.local Controller[1057]: 00:46:08.300 [MainWorkerPool-2] ERROR LINSTOR/Controller - SYSTEM - Autoplacer could not find diskless stor pool on node host.cloud.local matching resource-groups autoplace-settings [Report number 640C729C-00000-000117]

So is my configuration not supported? Does CloudStack work with LINSTOR only when compute and storage are on the same node, and not when they are independent?
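To compare what the autoplacer complains about with what is actually configured, it can help to turn the `linstor sp l` output into data. The following is only a sketch: `parse_linstor_table` and `diskless_pools` are hypothetical helpers, and the parsing assumes the `┊`-separated layout shown in this thread rather than any documented output format. The resource group's autoplace settings are not shown in the thread, so the sketch only lists which diskless pools exist per node.

```python
# Rows copied from the `linstor sp l` output in this thread
# (whitespace collapsed, border lines omitted).
SP_LIST = """
┊ StoragePool ┊ Node ┊ Driver ┊ PoolName ┊ FreeCapacity ┊ TotalCapacity ┊ CanSnapshots ┊ State ┊ SharedName ┊
┊ DfltDisklessStorPool ┊ host.cloud.local ┊ DISKLESS ┊ ┊ ┊ ┊ False ┊ Ok ┊ ┊
┊ DfltDisklessStorPool ┊ linstor ┊ DISKLESS ┊ ┊ ┊ ┊ False ┊ Ok ┊ ┊
┊ cloudstack_pool ┊ host.cloud.local ┊ DISKLESS ┊ ┊ ┊ ┊ False ┊ Ok ┊ ┊
┊ cloudstack_pool ┊ linstor ┊ LVM ┊ linstor ┊ 5.34 GiB ┊ 20.00 GiB ┊ False ┊ Ok ┊ ┊
"""


def parse_linstor_table(text: str) -> list[dict]:
    """Hypothetical helper: parse a box-drawn linstor table into row dicts."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("┊"):
            continue  # skip blank lines and border lines
        # Drop the empty strings before the first and after the last separator.
        rows.append([cell.strip() for cell in line.split("┊")[1:-1]])
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


def diskless_pools(rows: list[dict], node: str) -> list[str]:
    """Names of the DISKLESS storage pools defined on the given node."""
    return [r["StoragePool"] for r in rows
            if r["Node"] == node and r["Driver"] == "DISKLESS"]


pools = parse_linstor_table(SP_LIST)
print(diskless_pools(pools, "host.cloud.local"))
```

The names returned for `host.cloud.local` are the ones to compare against the storage pool the `cloudstack` resource group refers to, which is what the `linstor rg l` output would show.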

rohityadavcloud commented 1 year ago

cc @rp-

rp- commented 1 year ago

Of course the compute node needs to be known to LINSTOR and set up in it, otherwise LINSTOR cannot create diskless assignments on it. Can you also provide the output of `linstor rg l`? The last error message actually specifies what is missing right now: it can't create a diskless resource on the node, because the autoplacer prevents it.

DaanHoogland commented 1 year ago

> Of course the compute node needs to be known to LINSTOR and set up in it, otherwise LINSTOR cannot create diskless assignments on it. Can you also provide the output of `linstor rg l`? The last error message actually specifies what is missing right now: it can't create a diskless resource on the node, because the autoplacer prevents it.

@bashrusakh ^^?

JoaoJandre commented 10 months ago

From @rp-'s answer, this seems like an environmental issue, so I'll be removing it from the 4.18.2.0 milestone.

bashrusakh commented 10 months ago

> From @rp-'s answer, this seems like an environmental issue, so I'll be removing it from the 4.18.2.0 milestone.

Guys, like I said in the first post, it's a combined 1-node setup. Combined means controller + satellite on the same storage-only node (not a compute node). Anyway, if you think it's not a bug, that's up to you. The same config works fine on alternative software.

https://linbit.com/drbd-user-guide/linstor-guide-1_0-en/#ch-cloudstack

192.168.43.133 host.cloud.local - compute node (**linstor satellite installed**)
` host.cloud.local ┊ SATELLITE ┊ 192.168.43.133:3366 (PLAIN) ┊ Online ┊`
192.168.43.135 linstor.cloud.local - linstor combined 1node setup
192.168.43.134 mgmt.cloud.local - mgmt node (no linstor satellite, it's not a storage node)
192.168.43.131 nfs.cloud.local - nfs node (no linstor satellite, it's not a storage node)


rp- commented 10 months ago

JoaoJandre was referring to whether it is a CloudStack bug. You still haven't provided the output of `linstor rg l`; it looks like it might be a misconfiguration on the LINSTOR side.