User from domain can't deploy VM in pod dedicated to that domain

edikevich commented 2 years ago

ISSUE TYPE

Bug Report

CLOUDSTACK VERSION

4.16.0

CONFIGURATION

XCP-NG 8.2 Advanced Zone

OS / ENVIRONMENT

Management server: Oracle Linux 8.4, MySQL 8.0.26. Hypervisor: XCP-ng 8.2.

SUMMARY

A user from domain1 can't deploy a VM in the pod dedicated to domain1.

STEPS TO REPRODUCE

Create a domain.
Create a user in this domain.
Dedicate a pod to this domain.
Create a compute offering with ImplicitDedicationPlanner with the strict policy and choose domain1.
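For anyone trying to reproduce this through the API rather than the UI, the steps might map onto API commands roughly as sketched below. This is only a sketch: the command and parameter names are written from memory of the CloudStack API and should be checked against the API reference for 4.16, the `serviceofferingdetails[0].key`/`.value` form for the planner mode is an assumption, and the user name, password, email and offering sizes are made-up placeholders. The function only builds the parameter sets; it does not contact a management server.

```python
def reproduction_calls(domain_name, pod_id):
    """Return (command, params) pairs for the reproduction steps; nothing is sent."""
    # Placeholder: the real id would come from the createDomain response.
    domain_id = "<id returned by createDomain>"
    return [
        ("createDomain", {"name": domain_name}),
        # Hypothetical user details, for illustration only.
        ("createAccount", {
            "domainid": domain_id,
            "accounttype": 0,
            "username": "user1",
            "password": "<password>",
            "email": "user1@example.com",
            "firstname": "Test",
            "lastname": "User",
        }),
        ("dedicatePod", {"podid": pod_id, "domainid": domain_id}),
        # Illustrative sizes; the planner mode detail format is an assumption.
        ("createServiceOffering", {
            "name": "implicit-strict",
            "displaytext": "implicit-strict",
            "cpunumber": 1,
            "cpuspeed": 2000,
            "memory": 2048,
            "domainid": domain_id,
            "deploymentplanner": "ImplicitDedicationPlanner",
            "serviceofferingdetails[0].key": "ImplicitDedicationMode",
            "serviceofferingdetails[0].value": "Strict",
        }),
    ]


calls = reproduction_calls("domain1", "<pod id>")
for command, params in calls:
    print(command, params)
```

Listing the calls this way makes it easier to compare two environments step by step.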

EXPECTED RESULTS

All users from this domain can deploy VMs only in this Pod

ACTUAL RESULTS

A user from this domain can't deploy a VM in the dedicated pod.

DaanHoogland commented 2 years ago

Is this new? Did you test in 4.15 @edikevich? @sureshanaparti is this one good for 4.16.1?

edikevich commented 2 years ago

Hi. Sorry for the delay. I didn't test this feature in 4.15.2 :( I think the problem is related to the affinity group that is created after dedicating a host/cluster/pod. I tried creating a VM as root admin and then assigning this VM to a user from the domain, but neither the user nor the root admin can start the VM in the dedicated host/cluster/pod :(

DaanHoogland commented 2 years ago

Hey @edikevich do you mean to say that the issue is

Create domain.
Create user in this domain.
Dedicate pod to this domain.
Create Compute offering with ImplicitDedicationPlanner with strict policy and choose domain1.
Add an affinity group ???

and now the user can't deploy a VM in this domain?

edikevich commented 2 years ago

Hi. Not exactly. When a pod is dedicated, the affinity group is created automatically by CS. The other steps are correct.

DaanHoogland commented 2 years ago

@edikevich can you add some error logs and preferably a stack trace? I am trying to reproduce this but haven't succeeded yet, mostly due to the environment build.

edikevich commented 2 years ago

Hi. I don't have any errors in the log. I only see that no suitable hosts were found.

edikevich commented 2 years ago

Sorry, I do have an error:

2022-01-12 12:42:14,559 INFO [o.a.c.a.c.u.v.DeployVMCmd] (API-Job-Executor-17:ctx-10e86d4b job-3810 ctx-ed36cebb) (logid:7ebaa05f) com.cloud.exception.InsufficientServerCapacityException: Unable to create a deployment for VM instance {id: "334", name: "i-34-334-VM", uuid: "88d9976d-0b1a-4c78-a201-b6d153cf0e58", type="User"}Scope=interface com.cloud.dc.DataCenter; id=1
2022-01-12 12:42:14,559 INFO [o.a.c.a.c.u.v.DeployVMCmd] (API-Job-Executor-17:ctx-10e86d4b job-3810 ctx-ed36cebb) (logid:7ebaa05f) Unable to create a deployment for VM instance {id: "334", name: "i-34-334-VM", uuid: "88d9976d-0b1a-4c78-a201-b6d153cf0e58", type="User"}
com.cloud.exception.InsufficientServerCapacityException: Unable to create a deployment for VM instance {id: "334", name: "i-34-334-VM", uuid: "88d9976d-0b1a-4c78-a201-b6d153cf0e58", type="User"}Scope=interface com.cloud.dc.DataCenter; id=1
    at org.apache.cloudstack.engine.cloud.entity.api.VMEntityManagerImpl.reserveVirtualMachine(VMEntityManagerImpl.java:225)
    at org.apache.cloudstack.engine.cloud.entity.api.VirtualMachineEntityImpl.reserve(VirtualMachineEntityImpl.java:202)
    at com.cloud.vm.UserVmManagerImpl.startVirtualMachine(UserVmManagerImpl.java:5207)
    at com.cloud.vm.UserVmManagerImpl.startVirtualMachine(UserVmManagerImpl.java:4693)
    at com.cloud.vm.UserVmManagerImpl.startVirtualMachine(UserVmManagerImpl.java:4682)
    at jdk.internal.reflect.GeneratedMethodAccessor1658.invoke(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:344)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:198)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
    at org.apache.cloudstack.network.contrail.management.EventUtils$EventInterceptor.invoke(EventUtils.java:107)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:175)
    at com.cloud.event.ActionEventInterceptor.invoke(ActionEventInterceptor.java:51)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:175)
    at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:97)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
    at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:215)
    at com.sun.proxy.$Proxy127.startVirtualMachine(Unknown Source)
    at org.apache.cloudstack.api.command.user.vm.DeployVMCmd.execute(DeployVMCmd.java:696)
    at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:156)
    at com.cloud.api.ApiAsyncJobDispatcher.runJob(ApiAsyncJobDispatcher.java:108)
    at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:620)
    at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:48)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:55)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:102)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:52)
    at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:45)
    at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:568)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
2022-01-12 12:42:14,561 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-17:ctx-10e86d4b job-3810) (logid:7ebaa05f) Complete async job-3810, jobStatus: FAILED, resultCode: 530, result: org.apache.cloudstack.api.response.ExceptionResponse/null/{"uuidList":[],"errorcode":"533","errortext":"Unable to create a deployment for VM instance {id: "334", name: "i-34-334-VM", uuid: "88d9976d-0b1a-4c78-a201-b6d153cf0e58", type="User"}"}

DaanHoogland commented 2 years ago

This message says that no suitable deployment plan was found, but above this message there should be more info on how this was tried, @edikevich. If you grep for 'job-3810' in the log you should get all the info related to the failure. Can you do that and see if there is more relevant info? What I am looking for is: Is the lack of a host a capacity problem or something else (most likely storage, then)? Did it even try the dedicated domain? And if so, why did it refuse it?
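If grep is not at hand, the same filtering can be sketched in a few lines of Python. The function name and the second sample line are made up for illustration; the first sample line is from the log in this thread. The only subtlety is making sure that `job-3810` does not also match a longer id such as `job-38100`.

```python
import re


def lines_for_job(log_lines, job_id):
    """Return the log lines that mention the given async job, e.g. 'job-3810'."""
    # (?!\d) stops 'job-3810' from also matching 'job-38100'.
    pattern = re.compile(re.escape(job_id) + r"(?!\d)")
    return [line for line in log_lines if pattern.search(line)]


sample = [
    "2022-01-12 12:42:14,354 INFO [o.a.c.f.j.i.AsyncJobMonitor] "
    "(API-Job-Executor-17:ctx-10e86d4b job-3810) (logid:0d4e47c6) "
    "Add job-3810 into job monitoring",
    # Made-up entry for a different job, to show that it is filtered out.
    "2022-01-12 12:42:14,400 DEBUG [example] (API-Job-Executor-18:ctx-00000000 "
    "job-38100) (logid:00000000) unrelated entry",
]
matches = lines_for_job(sample, "job-3810")
for line in matches:
    print(line)
```

To run it against a real log, pass the open file object as `log_lines`; the path of the management server log depends on the installation.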

edikevich commented 2 years ago

2022-01-12 12:42:14,354 INFO [o.a.c.f.j.i.AsyncJobMonitor] (API-Job-Executor-17:ctx-10e86d4b job-3810) (logid:0d4e47c6) Add job-3810 into job monitoring
2022-01-12 12:42:14,358 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (qtp239372207-1931187:ctx-e0fcca0e ctx-b49fa34b) (logid:ccaa1505) submit async job-3810, details: AsyncJobVO {id:3810, userId: 37, accountId: 34, instanceType: VirtualMachine, instanceId: 334, cmd: org.apache.cloudstack.api.command.user.vm.DeployVMCmd, cmdInfo: {"iptonetworklist[0].networkid":"35d851d4-ccf2-4aaa-a372-5946aa1fd432","boottype":"BIOS","httpmethod":"POST","templateid":"445f7bc5-4d1e-4dff-a171-962adf124733","ctxAccountId":"34","uuid":"88d9976d-0b1a-4c78-a201-b6d153cf0e58","cmdEventType":"VM.CREATE","startvm":"true","bootmode":"LEGACY","serviceofferingid":"29710aa1-50be-472d-b76b-3dbcad42fc1b","response":"json","ctxUserId":"37","displayname":"test-vm1","name":"test-vm1","zoneid":"a5a0c8c3-f4ed-48e9-b3bf-3d645c96d252","ctxStartEventId":"13128","id":"334","ctxDetails":"{\"interface com.cloud.offering.ServiceOffering\":\"29710aa1-50be-472d-b76b-3dbcad42fc1b\",\"interface com.cloud.dc.DataCenter\":\"a5a0c8c3-f4ed-48e9-b3bf-3d645c96d252\",\"interface com.cloud.template.VirtualMachineTemplate\":\"445f7bc5-4d1e-4dff-a171-962adf124733\",\"interface com.cloud.vm.VirtualMachine\":\"88d9976d-0b1a-4c78-a201-b6d153cf0e58\"}","affinitygroupids":""}, cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, result: null, initMsid: 172320828, completeMsid: null, lastUpdated: null, lastPolled: null, created: null, removed: null}
2022-01-12 12:42:14,359 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-17:ctx-10e86d4b job-3810) (logid:7ebaa05f) Executing AsyncJobVO {id:3810, userId: 37, accountId: 34, instanceType: VirtualMachine, instanceId: 334, cmd: org.apache.cloudstack.api.command.user.vm.DeployVMCmd, cmdInfo: {"iptonetworklist[0].networkid":"35d851d4-ccf2-4aaa-a372-5946aa1fd432","boottype":"BIOS","httpmethod":"POST","templateid":"445f7bc5-4d1e-4dff-a171-962adf124733","ctxAccountId":"34","uuid":"88d9976d-0b1a-4c78-a201-b6d153cf0e58","cmdEventType":"VM.CREATE","startvm":"true","bootmode":"LEGACY","serviceofferingid":"29710aa1-50be-472d-b76b-3dbcad42fc1b","response":"json","ctxUserId":"37","displayname":"test-vm1","name":"test-vm1","zoneid":"a5a0c8c3-f4ed-48e9-b3bf-3d645c96d252","ctxStartEventId":"13128","id":"334","ctxDetails":"{\"interface com.cloud.offering.ServiceOffering\":\"29710aa1-50be-472d-b76b-3dbcad42fc1b\",\"interface com.cloud.dc.DataCenter\":\"a5a0c8c3-f4ed-48e9-b3bf-3d645c96d252\",\"interface com.cloud.template.VirtualMachineTemplate\":\"445f7bc5-4d1e-4dff-a171-962adf124733\",\"interface com.cloud.vm.VirtualMachine\":\"88d9976d-0b1a-4c78-a201-b6d153cf0e58\"}","affinitygroupids":""}, cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, result: null, initMsid: 172320828, completeMsid: null, lastUpdated: null, lastPolled: null, created: null, removed: null}
2022-01-12 12:42:14,361 DEBUG [o.a.c.a.BaseCmd] (API-Job-Executor-17:ctx-10e86d4b job-3810 ctx-ed36cebb) (logid:7ebaa05f) Ignoring paremeter displayvm as the caller is not authorized to pass it in
2022-01-12 12:42:14,361 DEBUG [o.a.c.a.BaseCmd] (API-Job-Executor-17:ctx-10e86d4b job-3810 ctx-ed36cebb) (logid:7ebaa05f) Ignoring paremeter deploymentplanner as the caller is not authorized to pass it in
2022-01-12 12:42:14,369 DEBUG [c.c.u.AccountManagerImpl] (API-Job-Executor-17:ctx-10e86d4b job-3810 ctx-ed36cebb) (logid:7ebaa05f) Access to Acct[160ae17c-428d-4486-ac32-18c172a02735-esm] -- Account {"id": 34, "name": "esm", "uuid": "160ae17c-428d-4486-ac32-18c172a02735"} granted to Acct[160ae17c-428d-4486-ac32-18c172a02735-esm] -- Account {"id": 34, "name": "esm", "uuid": "160ae17c-428d-4486-ac32-18c172a02735"} by DomainChecker
2022-01-12 12:42:14,374 DEBUG [c.c.u.AccountManagerImpl] (API-Job-Executor-17:ctx-10e86d4b job-3810 ctx-ed36cebb) (logid:7ebaa05f) Access to Acct[160ae17c-428d-4486-ac32-18c172a02735-esm] -- Account {"id": 34, "name": "esm", "uuid": "160ae17c-428d-4486-ac32-18c172a02735"} granted to Acct[160ae17c-428d-4486-ac32-18c172a02735-esm] -- Account {"id": 34, "name": "esm", "uuid": "160ae17c-428d-4486-ac32-18c172a02735"} by DomainChecker
2022-01-12 12:42:14,380 DEBUG [c.c.u.AccountManagerImpl] (API-Job-Executor-17:ctx-10e86d4b job-3810 ctx-ed36cebb) (logid:7ebaa05f) Access to Tmpl[228-VHD-0219c594-238a-4cba-af52-65609364d5f7 granted to Acct[160ae17c-428d-4486-ac32-18c172a02735-esm] -- Account {"id": 34, "name": "esm", "uuid": "160ae17c-428d-4486-ac32-18c172a02735"} by DomainChecker
2022-01-12 12:42:14,380 DEBUG [o.a.c.a.BaseCmd] (API-Job-Executor-17:ctx-10e86d4b job-3810 ctx-ed36cebb) (logid:7ebaa05f) Ignoring paremeter displayvm as the caller is not authorized to pass it in
2022-01-12 12:42:14,380 DEBUG [o.a.c.a.BaseCmd] (API-Job-Executor-17:ctx-10e86d4b job-3810 ctx-ed36cebb) (logid:7ebaa05f) Ignoring paremeter deploymentplanner as the caller is not authorized to pass it in
2022-01-12 12:42:14,394 DEBUG [c.c.u.AccountManagerImpl] (API-Job-Executor-17:ctx-10e86d4b job-3810 ctx-ed36cebb) (logid:7ebaa05f) Access to VM instance {id: "334", name: "i-34-334-VM", uuid: "88d9976d-0b1a-4c78-a201-b6d153cf0e58", type="User"} granted to Acct[160ae17c-428d-4486-ac32-18c172a02735-esm] -- Account {"id": 34, "name": "esm", "uuid": "160ae17c-428d-4486-ac32-18c172a02735"} by DomainChecker
2022-01-12 12:42:14,412 DEBUG [c.c.n.NetworkModelImpl] (API-Job-Executor-17:ctx-10e86d4b job-3810 ctx-ed36cebb) (logid:7ebaa05f) Service SecurityGroup is not supported in the network id=260
2022-01-12 12:42:14,418 DEBUG [c.c.n.NetworkModelImpl] (API-Job-Executor-17:ctx-10e86d4b job-3810 ctx-ed36cebb) (logid:7ebaa05f) Service SecurityGroup is not supported in the network id=260
2022-01-12 12:42:14,433 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (API-Job-Executor-17:ctx-10e86d4b job-3810 ctx-ed36cebb) (logid:7ebaa05f) DeploymentPlanner allocation algorithm: null
2022-01-12 12:42:14,433 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (API-Job-Executor-17:ctx-10e86d4b job-3810 ctx-ed36cebb) (logid:7ebaa05f) Trying to allocate a host and storage pools from dc:1, pod:null,cluster:null, requested cpu: 2000, requested ram: (2.00 GB) 2147483648
2022-01-12 12:42:14,433 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (API-Job-Executor-17:ctx-10e86d4b job-3810 ctx-ed36cebb) (logid:7ebaa05f) Is ROOT volume READY (pool already allocated)?: No
2022-01-12 12:42:14,441 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (API-Job-Executor-17:ctx-10e86d4b job-3810 ctx-ed36cebb) (logid:7ebaa05f) Deploy avoids pods: [2], clusters: [], hosts: []
2022-01-12 12:42:14,442 DEBUG [c.c.d.FirstFitPlanner] (API-Job-Executor-17:ctx-10e86d4b job-3810 ctx-ed36cebb) (logid:7ebaa05f) Searching all possible resources under this Zone: 1
2022-01-12 12:42:14,444 DEBUG [c.c.d.FirstFitPlanner] (API-Job-Executor-17:ctx-10e86d4b job-3810 ctx-ed36cebb) (logid:7ebaa05f) Listing clusters in order of aggregate capacity, that have (at least one host with) enough CPU and RAM capacity under this Zone: 1
2022-01-12 12:42:14,448 DEBUG [c.c.d.FirstFitPlanner] (API-Job-Executor-17:ctx-10e86d4b job-3810 ctx-ed36cebb) (logid:7ebaa05f) Removing from the clusterId list these clusters from avoid set: []
2022-01-12 12:42:14,454 DEBUG [c.c.d.FirstFitPlanner] (API-Job-Executor-17:ctx-10e86d4b job-3810 ctx-ed36cebb) (logid:7ebaa05f) The clusterId list for the given offering tag: []
2022-01-12 12:42:14,454 DEBUG [c.c.d.FirstFitPlanner] (API-Job-Executor-17:ctx-10e86d4b job-3810 ctx-ed36cebb) (logid:7ebaa05f) No clusters found after removing disabled clusters and clusters in avoid list, returning.
2022-01-12 12:42:14,457 DEBUG [c.c.v.UserVmManagerImpl] (API-Job-Executor-17:ctx-10e86d4b job-3810 ctx-ed36cebb) (logid:7ebaa05f) Destroying vm VM instance {id: "334", name: "i-34-334-VM", uuid: "88d9976d-0b1a-4c78-a201-b6d153cf0e58", type="User"} as it failed to create on Host with Id:null
2022-01-12 12:42:14,468 DEBUG [c.c.c.CapacityManagerImpl] (API-Job-Executor-17:ctx-10e86d4b job-3810 ctx-ed36cebb) (logid:7ebaa05f) VM instance {id: "334", name: "i-34-334-VM", uuid: "88d9976d-0b1a-4c78-a201-b6d153cf0e58", type="User"} state transited from [Stopped] to [Error] with event [OperationFailedToError]. VM's original host: null, new host: null, host before state transition: null
2022-01-12 12:42:14,492 DEBUG [c.c.r.ResourceLimitManagerImpl] (API-Job-Executor-17:ctx-10e86d4b job-3810 ctx-ed36cebb) (logid:7ebaa05f) Updating resource Type = volume count for Account = 34 Operation = decreasing Amount = 1
2022-01-12 12:42:14,502 DEBUG [c.c.r.ResourceLimitManagerImpl] (API-Job-Executor-17:ctx-10e86d4b job-3810 ctx-ed36cebb) (logid:7ebaa05f) Updating resource Type = primary_storage count for Account = 34 Operation = decreasing Amount = (10.00 GB) 10737418240
2022-01-12 12:42:14,517 WARN [c.c.a.AlertManagerImpl] (API-Job-Executor-17:ctx-10e86d4b job-3810 ctx-ed36cebb) (logid:7ebaa05f) alertType=[8] dataCenterId=[1] podId=[null] clusterId=[null] message=[Failed to deploy Vm with Id: 334, on Host with Id: null].
2022-01-12 12:42:14,523 WARN [c.c.a.AlertManagerImpl] (API-Job-Executor-17:ctx-10e86d4b job-3810 ctx-ed36cebb) (logid:7ebaa05f) No recipients set in 'alert.email.addresses', skipping sending alert with subject: Failed to deploy Vm with Id: 334, on Host with Id: null and content: Failed to deploy Vm with Id: 334, on Host with Id: null
2022-01-12 12:42:14,525 DEBUG [c.c.r.ResourceLimitManagerImpl] (API-Job-Executor-17:ctx-10e86d4b job-3810 ctx-ed36cebb) (logid:7ebaa05f) Updating resource Type = user_vm count for Account = 34 Operation = decreasing Amount = 1
2022-01-12 12:42:14,535 DEBUG [c.c.r.ResourceLimitManagerImpl] (API-Job-Executor-17:ctx-10e86d4b job-3810 ctx-ed36cebb) (logid:7ebaa05f) Updating resource Type = cpu count for Account = 34 Operation = decreasing Amount = 1
2022-01-12 12:42:14,544 DEBUG [c.c.r.ResourceLimitManagerImpl] (API-Job-Executor-17:ctx-10e86d4b job-3810 ctx-ed36cebb) (logid:7ebaa05f) Updating resource Type = memory count for Account = 34 Operation = decreasing Amount = 2048
2022-01-12 12:42:14,559 INFO [o.a.c.a.c.u.v.DeployVMCmd] (API-Job-Executor-17:ctx-10e86d4b job-3810 ctx-ed36cebb) (logid:7ebaa05f) com.cloud.exception.InsufficientServerCapacityException: Unable to create a deployment for VM instance {id: "334", name: "i-34-334-VM", uuid: "88d9976d-0b1a-4c78-a201-b6d153cf0e58", type="User"}Scope=interface com.cloud.dc.DataCenter; id=1
2022-01-12 12:42:14,559 INFO [o.a.c.a.c.u.v.DeployVMCmd] (API-Job-Executor-17:ctx-10e86d4b job-3810 ctx-ed36cebb) (logid:7ebaa05f) Unable to create a deployment for VM instance {id: "334", name: "i-34-334-VM", uuid: "88d9976d-0b1a-4c78-a201-b6d153cf0e58", type="User"}
2022-01-12 12:42:14,561 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-17:ctx-10e86d4b job-3810) (logid:7ebaa05f) Complete async job-3810, jobStatus: FAILED, resultCode: 530, result: org.apache.cloudstack.api.response.ExceptionResponse/null/{"uuidList":[],"errorcode":"533","errortext":"Unable to create a deployment for VM instance {id: "334", name: "i-34-334-VM", uuid: "88d9976d-0b1a-4c78-a201-b6d153cf0e58", type="User"}"}
2022-01-12 12:42:14,562 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-17:ctx-10e86d4b job-3810) (logid:7ebaa05f) Publish async job-3810 complete on message bus
2022-01-12 12:42:14,562 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-17:ctx-10e86d4b job-3810) (logid:7ebaa05f) Wake up jobs related to job-3810
2022-01-12 12:42:14,562 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-17:ctx-10e86d4b job-3810) (logid:7ebaa05f) Update db status for job-3810
2022-01-12 12:42:14,563 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-17:ctx-10e86d4b job-3810) (logid:7ebaa05f) Wake up jobs joined with job-3810 and disjoin all subjobs created from job- 3810
2022-01-12 12:42:14,570 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-17:ctx-10e86d4b job-3810) (logid:7ebaa05f) Done executing org.apache.cloudstack.api.command.user.vm.DeployVMCmd for job-3810
2022-01-12 12:42:14,570 INFO [o.a.c.f.j.i.AsyncJobMonitor] (API-Job-Executor-17:ctx-10e86d4b job-3810) (logid:7ebaa05f) Remove job-3810 from job monitoring

Or do you want the whole log from before the job started?
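Log output pasted into a comment often loses its line breaks, which makes it hard to read entry by entry. A minimal sketch for splitting such a run back into entries, assuming every entry starts with a timestamp of the form `YYYY-MM-DD HH:MM:SS,mmm` followed by a log level, as in the entries in this thread. The function name is made up for illustration. Stack-trace frames carry no timestamp, so they stay attached to the entry they follow.

```python
import re

# Assumption: each entry begins with a timestamp and a level, as in this thread.
ENTRY_START = re.compile(
    r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} +(?:TRACE|DEBUG|INFO|WARN|ERROR)\b"
)


def split_entries(blob):
    """Split a run of concatenated log text into one string per entry."""
    starts = [m.start() for m in ENTRY_START.finditer(blob)]
    if not starts:
        return [blob.strip()] if blob.strip() else []
    # Text before the first timestamp, if any, is dropped.
    bounds = starts + [len(blob)]
    return [blob[a:b].strip() for a, b in zip(bounds, bounds[1:])]


blob = (
    "2022-01-12 12:42:14,433 DEBUG [c.c.d.DeploymentPlanningManagerImpl] "
    "(API-Job-Executor-17:ctx-10e86d4b job-3810 ctx-ed36cebb) (logid:7ebaa05f) "
    "DeploymentPlanner allocation algorithm: null "
    "2022-01-12 12:42:14,441 DEBUG [c.c.d.DeploymentPlanningManagerImpl] "
    "(API-Job-Executor-17:ctx-10e86d4b job-3810 ctx-ed36cebb) (logid:7ebaa05f) "
    "Deploy avoids pods: [2], clusters: [], hosts: []"
)
entries = split_entries(blob)
for entry in entries:
    print(entry)
```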

DaanHoogland commented 2 years ago

No, this is what we need to find the culprit.

DaanHoogland commented 2 years ago

@edikevich, are you expecting the dedicated POD to be chosen automatically? This line says there is no POD chosen:

2022-01-12 12:42:14,433 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (API-Job-Executor-17:ctx-10e86d4b job-3810 ctx-ed36cebb) (logid:7ebaa05f) Trying to allocate a host and storage pools from dc:1, pod:null,cluster:null, requested cpu: 2000, requested ram: (2.00 GB) 2147483648

Are there any non-dedicated PODs in your environment? The following line lists the PODs that will be avoided; the avoid set contains POD id 2.

2022-01-12 12:42:14,441 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (API-Job-Executor-17:ctx-10e86d4b job-3810 ctx-ed36cebb) (logid:7ebaa05f) Deploy avoids pods: [2], clusters: [], hosts: []

The decision not to deploy is made at the cluster level:

2022-01-12 12:42:14,444 DEBUG [c.c.d.FirstFitPlanner] (API-Job-Executor-17:ctx-10e86d4b job-3810 ctx-ed36cebb) (logid:7ebaa05f) Listing clusters in order of aggregate capacity, that have (at least one host with) enough CPU and RAM capacity under this Zone: 1
2022-01-12 12:42:14,448 DEBUG [c.c.d.FirstFitPlanner] (API-Job-Executor-17:ctx-10e86d4b job-3810 ctx-ed36cebb) (logid:7ebaa05f) Removing from the clusterId list these clusters from avoid set: []
2022-01-12 12:42:14,454 DEBUG [c.c.d.FirstFitPlanner] (API-Job-Executor-17:ctx-10e86d4b job-3810 ctx-ed36cebb) (logid:7ebaa05f) The clusterId list for the given offering tag: []
2022-01-12 12:42:14,454 DEBUG [c.c.d.FirstFitPlanner] (API-Job-Executor-17:ctx-10e86d4b job-3810 ctx-ed36cebb) (logid:7ebaa05f) No clusters found after removing disabled clusters and clusters in avoid list, returning.

It looks like there is a tag on the offering that none of the (1) cluster(s) has, so there is a mismatch in tags.

edikevich commented 2 years ago

I have a dedicated pod in my environment.

I created an offering without a tag before and had the same result.

DaanHoogland commented 2 years ago

I have recreated this scenario and the user can deploy a VM in the dedicated pod, but the admin cannot. I think there is some extra environmental issue, @edikevich. That picture shows no netmask or gateway; can you add those? In my environment (which is a lab environment), these have the same values as the main pod in the zone. If you have a real datacentre deployment you'd want to take more care about what you are choosing.

edikevich commented 2 years ago

All our pods have different CIDRs, but all our hypervisors can see the management server. When I release the dedicated pod, I can deploy VMs in this pod.

DaanHoogland commented 2 years ago

OK, then I have no clue left, @edikevich. I recreated it with KVM, so it might be a Xen (XCP-ng 8.2) thing, though I doubt that is the case.

edikevich commented 2 years ago

Log when the compute offering doesn't contain any tags:

2022-01-12 17:14:30,364 INFO  [o.a.c.f.j.i.AsyncJobMonitor] (API-Job-Executor-24:ctx-3fefa540 job-3832) (logid:85ae8d9d) Add job-3832 into job monitoring
2022-01-12 17:14:30,368 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (qtp239372207-1939856:ctx-5eab9574 ctx-c6dff037) (logid:c28fe315) submit async job-3832, details: AsyncJobVO {id:3832, userId: 37, accountId: 34, instanceType: VirtualMachine, instanceId: 337, cmd: org.apache.cloudstack.api.command.user.vm.DeployVMCmd, cmdInfo: {"iptonetworklist[0].networkid":"35d851d4-ccf2-4aaa-a372-5946aa1fd432","boottype":"BIOS","httpmethod":"POST","templateid":"445f7bc5-4d1e-4dff-a171-962adf124733","ctxAccountId":"34","uuid":"0f5625b4-f8f4-4221-9752-d119efd00e6a","cmdEventType":"VM.CREATE","startvm":"true","bootmode":"LEGACY","serviceofferingid":"d5fc1f4e-940e-4b79-ac0a-be61e9792e36","response":"json","ctxUserId":"37","zoneid":"a5a0c8c3-f4ed-48e9-b3bf-3d645c96d252","ctxStartEventId":"13167","id":"337","ctxDetails":"{\"interface com.cloud.offering.ServiceOffering\":\"d5fc1f4e-940e-4b79-ac0a-be61e9792e36\",\"interface com.cloud.dc.DataCenter\":\"a5a0c8c3-f4ed-48e9-b3bf-3d645c96d252\",\"interface com.cloud.template.VirtualMachineTemplate\":\"445f7bc5-4d1e-4dff-a171-962adf124733\",\"interface com.cloud.vm.VirtualMachine\":\"0f5625b4-f8f4-4221-9752-d119efd00e6a\"}","affinitygroupids":""}, cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, result: null, initMsid: 172320828, completeMsid: null, lastUpdated: null, lastPolled: null, created: null, removed: null}
2022-01-12 17:14:30,368 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-24:ctx-3fefa540 job-3832) (logid:c0e092ad) Executing AsyncJobVO {id:3832, userId: 37, accountId: 34, instanceType: VirtualMachine, instanceId: 337, cmd: org.apache.cloudstack.api.command.user.vm.DeployVMCmd, cmdInfo: {"iptonetworklist[0].networkid":"35d851d4-ccf2-4aaa-a372-5946aa1fd432","boottype":"BIOS","httpmethod":"POST","templateid":"445f7bc5-4d1e-4dff-a171-962adf124733","ctxAccountId":"34","uuid":"0f5625b4-f8f4-4221-9752-d119efd00e6a","cmdEventType":"VM.CREATE","startvm":"true","bootmode":"LEGACY","serviceofferingid":"d5fc1f4e-940e-4b79-ac0a-be61e9792e36","response":"json","ctxUserId":"37","zoneid":"a5a0c8c3-f4ed-48e9-b3bf-3d645c96d252","ctxStartEventId":"13167","id":"337","ctxDetails":"{\"interface com.cloud.offering.ServiceOffering\":\"d5fc1f4e-940e-4b79-ac0a-be61e9792e36\",\"interface com.cloud.dc.DataCenter\":\"a5a0c8c3-f4ed-48e9-b3bf-3d645c96d252\",\"interface com.cloud.template.VirtualMachineTemplate\":\"445f7bc5-4d1e-4dff-a171-962adf124733\",\"interface com.cloud.vm.VirtualMachine\":\"0f5625b4-f8f4-4221-9752-d119efd00e6a\"}","affinitygroupids":""}, cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, result: null, initMsid: 172320828, completeMsid: null, lastUpdated: null, lastPolled: null, created: null, removed: null}
2022-01-12 17:14:30,371 DEBUG [o.a.c.a.BaseCmd] (API-Job-Executor-24:ctx-3fefa540 job-3832 ctx-befc08f1) (logid:c0e092ad) Ignoring paremeter displayvm as the caller is not authorized to pass it in
2022-01-12 17:14:30,371 DEBUG [o.a.c.a.BaseCmd] (API-Job-Executor-24:ctx-3fefa540 job-3832 ctx-befc08f1) (logid:c0e092ad) Ignoring paremeter deploymentplanner as the caller is not authorized to pass it in
2022-01-12 17:14:30,382 DEBUG [c.c.u.AccountManagerImpl] (API-Job-Executor-24:ctx-3fefa540 job-3832 ctx-befc08f1) (logid:c0e092ad) Access to Acct[160ae17c-428d-4486-ac32-18c172a02735-esm] -- Account {"id": 34, "name": "esm", "uuid": "160ae17c-428d-4486-ac32-18c172a02735"} granted to Acct[160ae17c-428d-4486-ac32-18c172a02735-esm] -- Account {"id": 34, "name": "esm", "uuid": "160ae17c-428d-4486-ac32-18c172a02735"} by DomainChecker
2022-01-12 17:14:30,388 DEBUG [c.c.u.AccountManagerImpl] (API-Job-Executor-24:ctx-3fefa540 job-3832 ctx-befc08f1) (logid:c0e092ad) Access to Acct[160ae17c-428d-4486-ac32-18c172a02735-esm] -- Account {"id": 34, "name": "esm", "uuid": "160ae17c-428d-4486-ac32-18c172a02735"} granted to Acct[160ae17c-428d-4486-ac32-18c172a02735-esm] -- Account {"id": 34, "name": "esm", "uuid": "160ae17c-428d-4486-ac32-18c172a02735"} by DomainChecker
2022-01-12 17:14:30,393 DEBUG [c.c.u.AccountManagerImpl] (API-Job-Executor-24:ctx-3fefa540 job-3832 ctx-befc08f1) (logid:c0e092ad) Access to Tmpl[228-VHD-0219c594-238a-4cba-af52-65609364d5f7 granted to Acct[160ae17c-428d-4486-ac32-18c172a02735-esm] -- Account {"id": 34, "name": "esm", "uuid": "160ae17c-428d-4486-ac32-18c172a02735"} by DomainChecker
2022-01-12 17:14:30,393 DEBUG [o.a.c.a.BaseCmd] (API-Job-Executor-24:ctx-3fefa540 job-3832 ctx-befc08f1) (logid:c0e092ad) Ignoring paremeter displayvm as the caller is not authorized to pass it in
2022-01-12 17:14:30,393 DEBUG [o.a.c.a.BaseCmd] (API-Job-Executor-24:ctx-3fefa540 job-3832 ctx-befc08f1) (logid:c0e092ad) Ignoring paremeter deploymentplanner as the caller is not authorized to pass it in
2022-01-12 17:14:30,409 DEBUG [c.c.u.AccountManagerImpl] (API-Job-Executor-24:ctx-3fefa540 job-3832 ctx-befc08f1) (logid:c0e092ad) Access to VM instance {id: "337", name: "i-34-337-VM", uuid: "0f5625b4-f8f4-4221-9752-d119efd00e6a", type="User"} granted to Acct[160ae17c-428d-4486-ac32-18c172a02735-esm] -- Account {"id": 34, "name": "esm", "uuid": "160ae17c-428d-4486-ac32-18c172a02735"} by DomainChecker
2022-01-12 17:14:30,424 DEBUG [c.c.n.NetworkModelImpl] (API-Job-Executor-24:ctx-3fefa540 job-3832 ctx-befc08f1) (logid:c0e092ad) Service SecurityGroup is not supported in the network id=260
2022-01-12 17:14:30,429 DEBUG [c.c.n.NetworkModelImpl] (API-Job-Executor-24:ctx-3fefa540 job-3832 ctx-befc08f1) (logid:c0e092ad) Service SecurityGroup is not supported in the network id=260
2022-01-12 17:14:30,442 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (API-Job-Executor-24:ctx-3fefa540 job-3832 ctx-befc08f1) (logid:c0e092ad) DeploymentPlanner allocation algorithm: null
2022-01-12 17:14:30,442 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (API-Job-Executor-24:ctx-3fefa540 job-3832 ctx-befc08f1) (logid:c0e092ad) Trying to allocate a host and storage pools from dc:1, pod:null,cluster:null, requested cpu: 4000, requested ram: (4.00 GB) 4294967296
2022-01-12 17:14:30,442 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (API-Job-Executor-24:ctx-3fefa540 job-3832 ctx-befc08f1) (logid:c0e092ad) Is ROOT volume READY (pool already allocated)?: No
2022-01-12 17:14:30,449 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (API-Job-Executor-24:ctx-3fefa540 job-3832 ctx-befc08f1) (logid:c0e092ad) Deploy avoids pods: [2], clusters: [], hosts: []
2022-01-12 17:14:30,450 DEBUG [c.c.d.FirstFitPlanner] (API-Job-Executor-24:ctx-3fefa540 job-3832 ctx-befc08f1) (logid:c0e092ad) Searching all possible resources under this Zone: 1
2022-01-12 17:14:30,451 DEBUG [c.c.d.FirstFitPlanner] (API-Job-Executor-24:ctx-3fefa540 job-3832 ctx-befc08f1) (logid:c0e092ad) Listing clusters in order of aggregate capacity, that have (at least one host with) enough CPU and RAM capacity under this Zone: 1
2022-01-12 17:14:30,455 DEBUG [c.c.d.FirstFitPlanner] (API-Job-Executor-24:ctx-3fefa540 job-3832 ctx-befc08f1) (logid:c0e092ad) Removing from the clusterId list these clusters from avoid set: []
2022-01-12 17:14:30,514 INFO  [c.c.d.ImplicitDedicationPlanner] (API-Job-Executor-24:ctx-3fefa540 job-3832 ctx-befc08f1) (logid:c0e092ad) Host 1 found to be unsuitable for implicit dedication as it is running instances of another account
2022-01-12 17:14:30,515 INFO  [c.c.d.ImplicitDedicationPlanner] (API-Job-Executor-24:ctx-3fefa540 job-3832 ctx-befc08f1) (logid:c0e092ad) Host 1 found to be running a vm created by a planner other than implicit.
2022-01-12 17:14:30,520 INFO  [c.c.d.ImplicitDedicationPlanner] (API-Job-Executor-24:ctx-3fefa540 job-3832 ctx-befc08f1) (logid:c0e092ad) Host 2 found to be unsuitable for implicit dedication as it is running instances of another account
2022-01-12 17:14:30,521 INFO  [c.c.d.ImplicitDedicationPlanner] (API-Job-Executor-24:ctx-3fefa540 job-3832 ctx-befc08f1) (logid:c0e092ad) Host 2 found to be running a vm created by a planner other than implicit.
2022-01-12 17:14:30,525 INFO  [c.c.d.ImplicitDedicationPlanner] (API-Job-Executor-24:ctx-3fefa540 job-3832 ctx-befc08f1) (logid:c0e092ad) Host 3 found to be unsuitable for implicit dedication as it is running instances of another account
2022-01-12 17:14:30,526 INFO  [c.c.d.ImplicitDedicationPlanner] (API-Job-Executor-24:ctx-3fefa540 job-3832 ctx-befc08f1) (logid:c0e092ad) Host 3 found to be running a vm created by a planner other than implicit.
2022-01-12 17:14:30,529 INFO  [c.c.d.ImplicitDedicationPlanner] (API-Job-Executor-24:ctx-3fefa540 job-3832 ctx-befc08f1) (logid:c0e092ad) Host 4 found to be unsuitable for implicit dedication as it is running instances of another account
2022-01-12 17:14:30,530 INFO  [c.c.d.ImplicitDedicationPlanner] (API-Job-Executor-24:ctx-3fefa540 job-3832 ctx-befc08f1) (logid:c0e092ad) Host 4 found to be running a vm created by a planner other than implicit.
2022-01-12 17:14:30,534 INFO  [c.c.d.ImplicitDedicationPlanner] (API-Job-Executor-24:ctx-3fefa540 job-3832 ctx-befc08f1) (logid:c0e092ad) Host 5 found to be unsuitable for implicit dedication as it is running instances of another account
2022-01-12 17:14:30,535 INFO  [c.c.d.ImplicitDedicationPlanner] (API-Job-Executor-24:ctx-3fefa540 job-3832 ctx-befc08f1) (logid:c0e092ad) Host 5 found to be running a vm created by a planner other than implicit.
2022-01-12 17:14:30,539 INFO  [c.c.d.ImplicitDedicationPlanner] (API-Job-Executor-24:ctx-3fefa540 job-3832 ctx-befc08f1) (logid:c0e092ad) Host 6 found to be unsuitable for implicit dedication as it is running instances of another account
2022-01-12 17:14:30,540 INFO  [c.c.d.ImplicitDedicationPlanner] (API-Job-Executor-24:ctx-3fefa540 job-3832 ctx-befc08f1) (logid:c0e092ad) Host 6 found to be running a vm created by a planner other than implicit.
2022-01-12 17:14:30,544 INFO  [c.c.d.ImplicitDedicationPlanner] (API-Job-Executor-24:ctx-3fefa540 job-3832 ctx-befc08f1) (logid:c0e092ad) Host 7 found to be unsuitable for implicit dedication as it is running instances of another account
2022-01-12 17:14:30,545 INFO  [c.c.d.ImplicitDedicationPlanner] (API-Job-Executor-24:ctx-3fefa540 job-3832 ctx-befc08f1) (logid:c0e092ad) Host 7 found to be running a vm created by a planner other than implicit.
2022-01-12 17:14:30,548 INFO  [c.c.d.ImplicitDedicationPlanner] (API-Job-Executor-24:ctx-3fefa540 job-3832 ctx-befc08f1) (logid:c0e092ad) Host 8 found to be unsuitable for implicit dedication as it is running instances of another account
2022-01-12 17:14:30,549 INFO  [c.c.d.ImplicitDedicationPlanner] (API-Job-Executor-24:ctx-3fefa540 job-3832 ctx-befc08f1) (logid:c0e092ad) Host 8 found to be running a vm created by a planner other than implicit.
2022-01-12 17:14:30,555 INFO  [c.c.d.ImplicitDedicationPlanner] (API-Job-Executor-24:ctx-3fefa540 job-3832 ctx-befc08f1) (logid:c0e092ad) Host 9 found to be unsuitable for implicit dedication as it is running instances of another account
2022-01-12 17:14:30,556 INFO  [c.c.d.ImplicitDedicationPlanner] (API-Job-Executor-24:ctx-3fefa540 job-3832 ctx-befc08f1) (logid:c0e092ad) Host 9 found to be running a vm created by a planner other than implicit.
2022-01-12 17:14:30,560 INFO  [c.c.d.ImplicitDedicationPlanner] (API-Job-Executor-24:ctx-3fefa540 job-3832 ctx-befc08f1) (logid:c0e092ad) Host 10 found to be unsuitable for implicit dedication as it is running instances of another account
2022-01-12 17:14:30,560 INFO  [c.c.d.ImplicitDedicationPlanner] (API-Job-Executor-24:ctx-3fefa540 job-3832 ctx-befc08f1) (logid:c0e092ad) Host 10 found to be running a vm created by a planner other than implicit.
2022-01-12 17:14:30,564 INFO  [c.c.d.ImplicitDedicationPlanner] (API-Job-Executor-24:ctx-3fefa540 job-3832 ctx-befc08f1) (logid:c0e092ad) Host 11 found to be unsuitable for implicit dedication as it is running instances of another account
2022-01-12 17:14:30,565 INFO  [c.c.d.ImplicitDedicationPlanner] (API-Job-Executor-24:ctx-3fefa540 job-3832 ctx-befc08f1) (logid:c0e092ad) Host 11 found to be running a vm created by a planner other than implicit.
2022-01-12 17:14:30,566 DEBUG [c.c.v.UserVmManagerImpl] (API-Job-Executor-24:ctx-3fefa540 job-3832 ctx-befc08f1) (logid:c0e092ad) Destroying vm VM instance {id: "337", name: "i-34-337-VM", uuid: "0f5625b4-f8f4-4221-9752-d119efd00e6a", type="User"} as it failed to create on Host with Id:null
2022-01-12 17:14:30,576 DEBUG [c.c.c.CapacityManagerImpl] (API-Job-Executor-24:ctx-3fefa540 job-3832 ctx-befc08f1) (logid:c0e092ad) VM instance {id: "337", name: "i-34-337-VM", uuid: "0f5625b4-f8f4-4221-9752-d119efd00e6a", type="User"} state transited from [Stopped] to [Error] with event [OperationFailedToError]. VM's original host: null, new host: null, host before state transition: null
2022-01-12 17:14:30,596 DEBUG [c.c.r.ResourceLimitManagerImpl] (API-Job-Executor-24:ctx-3fefa540 job-3832 ctx-befc08f1) (logid:c0e092ad) Updating resource Type = volume count for Account = 34 Operation = decreasing Amount = 1
2022-01-12 17:14:30,604 DEBUG [c.c.r.ResourceLimitManagerImpl] (API-Job-Executor-24:ctx-3fefa540 job-3832 ctx-befc08f1) (logid:c0e092ad) Updating resource Type = primary_storage count for Account = 34 Operation = decreasing Amount = (10.00 GB) 10737418240
2022-01-12 17:14:30,618 WARN  [c.c.a.AlertManagerImpl] (API-Job-Executor-24:ctx-3fefa540 job-3832 ctx-befc08f1) (logid:c0e092ad) alertType=[8] dataCenterId=[1] podId=[null] clusterId=[null] message=[Failed to deploy Vm with Id: 337, on Host with Id: null].
2022-01-12 17:14:30,623 WARN  [c.c.a.AlertManagerImpl] (API-Job-Executor-24:ctx-3fefa540 job-3832 ctx-befc08f1) (logid:c0e092ad) No recipients set in 'alert.email.addresses', skipping sending alert with subject: Failed to deploy Vm with Id: 337, on Host with Id: null and content: Failed to deploy Vm with Id: 337, on Host with Id: null
2022-01-12 17:14:30,625 DEBUG [c.c.r.ResourceLimitManagerImpl] (API-Job-Executor-24:ctx-3fefa540 job-3832 ctx-befc08f1) (logid:c0e092ad) Updating resource Type = user_vm count for Account = 34 Operation = decreasing Amount = 1
2022-01-12 17:14:30,633 DEBUG [c.c.r.ResourceLimitManagerImpl] (API-Job-Executor-24:ctx-3fefa540 job-3832 ctx-befc08f1) (logid:c0e092ad) Updating resource Type = cpu count for Account = 34 Operation = decreasing Amount = 2
2022-01-12 17:14:30,641 DEBUG [c.c.r.ResourceLimitManagerImpl] (API-Job-Executor-24:ctx-3fefa540 job-3832 ctx-befc08f1) (logid:c0e092ad) Updating resource Type = memory count for Account = 34 Operation = decreasing Amount = 4096
2022-01-12 17:14:30,654 INFO  [o.a.c.a.c.u.v.DeployVMCmd] (API-Job-Executor-24:ctx-3fefa540 job-3832 ctx-befc08f1) (logid:c0e092ad) com.cloud.exception.InsufficientServerCapacityException: Unable to create a deployment for VM instance {id: "337", name: "i-34-337-VM", uuid: "0f5625b4-f8f4-4221-9752-d119efd00e6a", type="User"}Scope=interface com.cloud.dc.DataCenter; id=1
2022-01-12 17:14:30,655 INFO  [o.a.c.a.c.u.v.DeployVMCmd] (API-Job-Executor-24:ctx-3fefa540 job-3832 ctx-befc08f1) (logid:c0e092ad) Unable to create a deployment for VM instance {id: "337", name: "i-34-337-VM", uuid: "0f5625b4-f8f4-4221-9752-d119efd00e6a", type="User"}
2022-01-12 17:14:30,656 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-24:ctx-3fefa540 job-3832) (logid:c0e092ad) Complete async job-3832, jobStatus: FAILED, resultCode: 530, result: org.apache.cloudstack.api.response.ExceptionResponse/null/{"uuidList":[],"errorcode":"533","errortext":"Unable to create a deployment for VM instance {id: "337", name: "i-34-337-VM", uuid: "0f5625b4-f8f4-4221-9752-d119efd00e6a", type="User"}"}
2022-01-12 17:14:30,657 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-24:ctx-3fefa540 job-3832) (logid:c0e092ad) Publish async job-3832 complete on message bus
2022-01-12 17:14:30,657 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-24:ctx-3fefa540 job-3832) (logid:c0e092ad) Wake up jobs related to job-3832
2022-01-12 17:14:30,657 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-24:ctx-3fefa540 job-3832) (logid:c0e092ad) Update db status for job-3832
2022-01-12 17:14:30,658 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-24:ctx-3fefa540 job-3832) (logid:c0e092ad) Wake up jobs joined with job-3832 and disjoin all subjobs created from job- 3832
2022-01-12 17:14:30,663 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-24:ctx-3fefa540 job-3832) (logid:c0e092ad) Done executing org.apache.cloudstack.api.command.user.vm.DeployVMCmd for job-3832
2022-01-12 17:14:30,663 INFO  [o.a.c.f.j.i.AsyncJobMonitor] (API-Job-Executor-24:ctx-3fefa540 job-3832) (logid:c0e092ad) Remove job-3832 from job monitoring

edikevich commented 2 years ago

Can I ask you something? When we dedicate a pod, does anything change in host_pod_ref?

DaanHoogland commented 2 years ago

Can I ask you something? When we dedicate a pod, does anything change in host_pod_ref?

I don't think it does.

edikevich commented 2 years ago

Where in the DB can I see the dedication?

DaanHoogland commented 2 years ago

in the above you have

2022-01-12 17:14:30,449 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (API-Job-Executor-24:ctx-3fefa540 job-3832 ctx-befc08f1) (logid:c0e092ad) Deploy avoids pods: [2], clusters: [], hosts: []

which basically means it's going to avoid both your pods. I have not discovered why.

DaanHoogland commented 2 years ago

why do you want to see it in the DB? (i'd have to query, but what is the use?)

edikevich commented 2 years ago

Maybe I can spot some anomaly there :)

DaanHoogland commented 2 years ago

there is a table 'dedicated_resources'

edikevich commented 2 years ago

DaanHoogland commented 2 years ago

ah, in my case it does contain the account id, that might be the culprit

edikevich commented 2 years ago

But in one domain we can have many accounts

DaanHoogland commented 2 years ago

yes. i'll try that as well

DaanHoogland commented 2 years ago

I could still deploy in the pod if it was only dedicated to the domain.

edikevich commented 2 years ago

What should I check now? Do you need some more logs?

edikevich commented 2 years ago

I found the answer, and I think it should not work like this :) When a user deploys the VM, they must select the affinity group for this domain in advanced mode! BUT I think that a compute offering whose deployment planner is ImplicitDedicationPlanner with strict policy must deploy the VM on dedicated resources without this affinity group.

DaanHoogland commented 2 years ago

Hi @edikevich , I'm not sure if I follow. Are you saying this is not an issue?

edikevich commented 2 years ago

I don't know :) If a user from the domain tries to deploy an instance with a CO whose deployment planner is ImplicitDedicationPlanner with strict policy and DOESN'T choose the affinity group, the deployment fails. If a user from the domain tries to deploy an instance with a CO whose deployment planner is FirstFitPlanner and chooses the affinity group, the deployment succeeds. If a user from the domain tries to deploy an instance with a CO whose deployment planner is FirstFitPlanner and DOESN'T choose the affinity group, the deployment succeeds.

I think the main thing here is "choose Affinity Group". For me, a VM with a CO whose deployment planner is ImplicitDedicationPlanner with strict policy should deploy on dedicated resources regardless of the affinity group (it shouldn't matter whether it is chosen or not).

DaanHoogland commented 2 years ago

I finally reproduced the issue @edikevich \o/, looking for a fix.

DaanHoogland commented 2 years ago

@edikevich I did a deep code dive and think I need to move this to 4.17, if we change this behaviour at all.

The reason:

1. What if there are two dedicated PODs or even two dedicated resource pools, maybe even at different levels? This is not something that allows for the implicit choice of an affinity group on an explicitly dedicated resource group. We wouldn't know if both are valid or even if either is valid.

2. The behaviour of the strict mode is to strictly deploy on the resources that belong to the affinity group, and if none is known, no deployment is possible.

These two points combined make your expectations hard to achieve.

Let's think on how to address your need.

* One possibility would be in the UI: make sure that the planner is retrieved and scanned for the requirement of a dedication, and force the user to pick one.
* Another is in the backend: scan the dedicated resources for this user and allow all of them if none is chosen.
* A third is to make a default choice.

In any case I am moving this to 4.17, sorry.
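
As a rough illustration of the second (backend) option above, here is a minimal Python sketch. It is not CloudStack code: the `DedicatedResource` record, the `allowed_resources` function, and the domain ids are hypothetical names and values made up for this example; only account 34 comes from the logs earlier in this thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DedicatedResource:
    """Hypothetical stand-in for one dedication record (not the real schema)."""
    resource: str                      # e.g. "pod-2"
    domain_id: int
    account_id: Optional[int] = None   # None: dedicated to the whole domain


def allowed_resources(dedications, domain_id, account_id, chosen=None):
    """Return the dedicated resources a deployment may use.

    If a dedication was chosen explicitly, only that one is allowed (and only
    if the caller is entitled to it). If nothing was chosen, fall back to every
    dedication visible to the caller's account or domain.
    """
    visible = [
        d.resource
        for d in dedications
        if d.domain_id == domain_id and d.account_id in (None, account_id)
    ]
    if chosen is None:
        return visible
    return [r for r in visible if r == chosen]


dedications = [
    DedicatedResource("pod-2", domain_id=5),
    DedicatedResource("cluster-7", domain_id=5, account_id=34),
    DedicatedResource("pod-3", domain_id=9),
]

# Nothing chosen: every dedication visible to account 34 in domain 5 is allowed.
print(allowed_resources(dedications, domain_id=5, account_id=34))
# Explicit choice: only the chosen dedication remains.
print(allowed_resources(dedications, domain_id=5, account_id=34, chosen="pod-2"))
```

Note that the first call returns two resources at different levels (a pod and a cluster), which is exactly the ambiguity raised in point 1: the backend would still have to decide whether both are valid targets.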

edikevich commented 2 years ago

Hi @DaanHoogland, can I ask you to look at one more thing? Root Admin can't migrate a VM between hosts in a dedicated pod. I think this is a very useful feature for host maintenance.

DaanHoogland commented 2 years ago

@edikevich I don't understand; could you please describe it in more detail in a new ticket?

edikevich commented 2 years ago

@DaanHoogland I think it is better to keep it here because it seems related to this issue.

Dedicate a pod. The pod contains a cluster with, for example, 3 hosts. Root Admin wants to put one host into maintenance mode (hardware or software issue). Root Admin wants to migrate all VMs from the "broken" host to some specific host. Root Admin can't migrate a VM because the VM migration wizard is empty. Root Admin can only place the host into maintenance mode, and then the VMs migrate to random hosts.

If you want, I can create a new issue :)

nvazquez commented 2 years ago

Hi @edikevich @DaanHoogland sorry I tried following up but I'm still not clear - in case there are 2 different issues please create a new ticket for the migration issue. Is the migration issue related to the original issue?

Hudratronium commented 2 years ago

@edikevich I did a deep code dive and think I need to move this to 4.17, if we change this behaviour at all.

The reason:

1. What if there are two dedicated PODs or even two dedicated resource pools, maybe even at different levels? This is not something that allows for the implicit choice of an affinity group on an explicitly dedicated resource group. We wouldn't know if both are valid or even if either is valid.

2. The behaviour of the strict mode is to strictly deploy on the resources that belong to the affinity group, and if none is known, no deployment is possible.

These two points combined make your expectations hard to achieve.

Let's think on how to address your need.

* One possibility would be in the UI: make sure that the planner is retrieved and scanned for the requirement of a dedication, and force the user to pick one.

* Another is in the backend: scan the dedicated resources for this user and allow all of them if none is chosen.

* A third is to make a default choice.

In any case I am moving this to 4.17, sorry.

Just as a quick thought reading through this: maybe the whole topic of "dedication" of resources, at least at host level, could be "solved" by using the tagging feature for this purpose? Currently we use tags to determine and find suitable hosts for VM deployment. Why not consider "ownership" or something similar as another "characteristic" of the host which can be changed on the fly? I guess we would already have all the logic for deployment decisions in the backend... At least that could be a workaround for a use case where an "admin" predefines a service offering aimed at a "special" host group which the user should not be able to change.

DaanHoogland commented 1 year ago

@edikevich @Hudratronium with one week before code freeze I have no clear idea yet of what to do on this. I will move this to "unplanned". Please let's define the required functionality on this subject clearly.

Hudratronium commented 1 year ago

As far as I understand, @edikevich would expect that if a host is strictly dedicated to a specific domain, an account within that domain should automatically make use of this resource without choosing the affinity group during deployment.

I was only suggesting that it might be useful to shift from using affinity groups to using the 'tagging' functionality for realizing the whole dedication strictness. Tags are quite flexible to handle.

Without going to too much trouble, the behaviour @edikevich intends might be achieved without "dedication and strictness", using tagging and service offerings that make use of the tags.

DaanHoogland commented 1 year ago

@edikevich We are postponing this yet again as requirements are not clear yet. cc @alexandremattioli @rajujith

edikevich commented 1 year ago

Hi all! As @Hudratronium said above, I expect that if a host/cluster is dedicated to a domain, an account within that domain should automatically make use of this host/cluster without choosing the affinity group.

rajujith commented 1 year ago

@edikevich does VM deployment using the implicit dedication strict offering succeed when you choose the affinity group?

edikevich commented 1 year ago

@rajujith If I remember correctly, yes: VMs are deployed on the dedicated host/cluster.

rajujith commented 1 year ago

@edikevich I believe what you are observing is expected behavior as per the current design. I believe this is your scenario:

* Zone with multiple pods.
* At least one pod is shared, but all of its hosts are running VMs belonging to other domain/s.
* Compute offering uses strict Implicit Dedication.
* DeployVirtualMachine API call doesn't have an affinity group.

In this scenario, the dedicated pod won't be selected for deployment since the related affinity group is not selected. Subsequently, the 'strict Implicit Dedication' avoids the shared pod since it has VMs belonging to other domain/s, leaving no host to deploy on in the given zone. To the best of my knowledge, the VM deployment error in this scenario is expected. You could use the affinity group for VM deployment to avoid this failure. CC @DaanHoogland
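
The two filters described in this scenario can be sketched in a few lines of Python. This is a simplified model of the explanation above, not the CloudStack planner: the `Host` record, the `candidate_hosts` function, and all ids and names are hypothetical, and the real planner applies further rules that are not modelled here.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Host:
    """Hypothetical, simplified host record used only for this sketch."""
    host_id: int
    pod_id: int
    dedicated_domain: Optional[str] = None               # set when the pod is explicitly dedicated
    vm_owner_accounts: set = field(default_factory=set)  # accounts with VMs on this host


def candidate_hosts(hosts, account, domain, affinity_group_selected):
    """Return the host ids that survive both filters for a strict implicit offering."""
    survivors = []
    for h in hosts:
        if h.dedicated_domain is not None:
            # Dedicated resources are only considered when the matching
            # affinity group was passed with the deployment.
            if not (affinity_group_selected and h.dedicated_domain == domain):
                continue
        elif affinity_group_selected:
            # With the affinity group, deployment is confined to dedicated resources.
            continue
        # Strict implicit dedication: no VMs of any other account on the host.
        if h.vm_owner_accounts - {account}:
            continue
        survivors.append(h.host_id)
    return survivors


hosts = [
    Host(2, pod_id=1, vm_owner_accounts={"other"}),   # shared pod, busy
    Host(3, pod_id=1, vm_owner_accounts={"other"}),   # shared pod, busy
    Host(20, pod_id=2, dedicated_domain="domain1"),   # dedicated pod, empty
]

print(candidate_hosts(hosts, "user1", "domain1", affinity_group_selected=False))
print(candidate_hosts(hosts, "user1", "domain1", affinity_group_selected=True))
```

Without the affinity group no host is left, which matches the "Unable to create a deployment" error; with it, only the host in the dedicated pod remains.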

Note: My verifications are based on CS v 4.18.

Explicit Dedication Design

New Admin APIs to dedicate Zones/Pods/Clusters/Hosts to a domain or account (these APIs will come in a separate plugin)
list affinity types: Add a new type: explicit dedication
Affinity group of type ExplicitDedication is created when the root admin dedicates a resource to any  account/domain.
User can associate above affinity group to VM during deployment.

[1] https://cwiki.apache.org/confluence/display/CLOUDSTACK/Dedicated+Resources+-+Private+zone%2C+pod%2C+cluster%2C+host+Functional+Spec

Hudratronium commented 1 year ago

I don't know :) If a user from the domain tries to deploy an instance with a CO whose deployment planner is ImplicitDedicationPlanner with strict policy and DOESN'T choose the affinity group, the deployment fails. If a user from the domain tries to deploy an instance with a CO whose deployment planner is FirstFitPlanner and chooses the affinity group, the deployment succeeds. If a user from the domain tries to deploy an instance with a CO whose deployment planner is FirstFitPlanner and DOESN'T choose the affinity group, the deployment succeeds.

I think the main thing here is "choose Affinity Group". For me, a VM with a CO whose deployment planner is ImplicitDedicationPlanner with strict policy should deploy on dedicated resources regardless of the affinity group (it shouldn't matter whether it is chosen or not).

As @edikevich wrote above, the problem is not that he can't deploy anything at all. From my understanding, the problem is that a user who wants to deploy a VM needs to actively select the correct affinity group.

The intention when creating a service offering with "Deployment Planner = Implicit dedication & Planner mode = strict" seems to be that a VM using this service offering is only deployed on dedicated hosts; nothing else should be possible. This raises the question: why does a user of the service offering need to select the correct affinity group while deploying a VM? The decision on which affinity group is the correct one (at least for deployment) has already been made. One would expect this affinity group to be the only available one and therefore the 'standard' / preselected one.

Looking in the docs one can get a bit confused regarding the objective of the options:

Planner Mode: Used when ImplicitDedicationPlanner is selected in the previous field. The planner mode determines how VMs will be deployed on private infrastructure that is dedicated to a single domain or account.

Strict: A host will not be shared across multiple accounts. For example, strict implicit dedication is useful for deployment of certain types of applications, such as desktops, where no host can be shared between different accounts without violating the desktop software’s terms of license.

The point is "deployed on private infrastructure that is dedicated to a single domain or account". To illustrate with an example: a host is dedicated to a domain. Multiple accounts can make use of the host. Now we choose strictness to deploy only on hosts where NO other accounts currently have VMs running.

So strictness is all about "one account alone is using a host", but that is not directly strictness with respect to the affinity group we defined earlier with the dedication. Based on the docs one could also 'mix': a host dedicated to a domain, but strictness aimed at accounts. So you might end up in a situation where you won't find an "empty" host to use within the affinity group.
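
A small Python sketch of that mix, under the reading of the docs given above. The host and account names are hypothetical, and `strict_eligible` / `usable_hosts` are illustrative helpers, not CloudStack functions.

```python
def strict_eligible(host_accounts, account):
    """A host passes the strict check only if no *other* account has VMs on it."""
    return all(a == account for a in host_accounts)


def usable_hosts(hosts, account):
    """Names of the dedicated hosts that the given account could still use."""
    return [name for name, accounts in hosts.items() if strict_eligible(accounts, account)]


# Hypothetical hosts, all dedicated to the same domain; the values list the
# accounts that currently run VMs on each host.
dedicated_hosts = {
    "host-a": ["account1"],
    "host-b": ["account2"],
}

print(usable_hosts(dedicated_hosts, "account1"))  # account1 can reuse its own host
print(usable_hosts(dedicated_hosts, "account3"))  # a third account of the domain finds nothing
```

All three accounts belong to the domain the hosts are dedicated to, yet the third one cannot deploy because every host is already "owned" by another account.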
