apache / cloudstack

Apache CloudStack is an opensource Infrastructure as a Service (IaaS) cloud computing platform
https://cloudstack.apache.org/
Apache License 2.0

XCP-NG VM/Kubernetes node dynamic scale failed. #8334

Open furyflash777 opened 9 months ago

furyflash777 commented 9 months ago
ISSUE TYPE
COMPONENT NAME
UI
CLOUDSTACK VERSION
4.18.0 and 4.18.1
CONFIGURATION
OS / ENVIRONMENT

XCP-NG 8.2.1

Current setup: cpu.overprovisioning.factor = 2, mem.overprovisioning.factor = 2

Same result with: cpu.overprovisioning.factor = 1, mem.overprovisioning.factor = 1

Test VM: 1 CPU x 0.50 GHz and 512 MB memory


SUMMARY

Unable to dynamically scale VMs created from different templates (Debian, Ubuntu, CentOS).

GUI error: Unhandled exception

Log error: 2023-12-08 04:13:57,203 ERROR [c.c.v.VmWorkJobHandlerProxy] (Work-Job-Executor-27:ctx-ff0ece9e job-1340/job-1342 ctx-23ed9399) (logid:15e9aadd) Invocation exception, caused by: com.cloud.utils.exception.CloudRuntimeException: Unable to scale vm due to Catch exception com.cloud.utils.exception.CloudRuntimeException when scaling VM:i-8-254-VM due to com.cloud.utils.exception.CloudRuntimeException: Cannot scale up the vm because of memory constraint violation: 0 <= memory-static-min(268435456) <= memory-dynamic-min(2147483648) <= memory-dynamic-max(4294967296) <= memory-static-max(536870912)

Cannot scale up the vm because of memory constraint violation: 0 <= memory-static-min(268435456) <= memory-dynamic-min(2147483648) <= memory-dynamic-max(4294967296) <= memory-static-max(536870912)
    at com.cloud.vm.VirtualMachineManagerImpl.orchestrateReConfigureVm(VirtualMachineManagerImpl.java:4648)
    at com.cloud.vm.VirtualMachineManagerImpl.reConfigureVm(VirtualMachineManagerImpl.java:4579)
    at com.cloud.vm.VirtualMachineManagerImpl.orchestrateReconfigure(VirtualMachineManagerImpl.java:5522)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at com.cloud.vm.VmWorkJobHandlerProxy.handleVmWorkJob(VmWorkJobHandlerProxy.java:107)
    at com.cloud.vm.VirtualMachineManagerImpl.handleVmWorkJob(VirtualMachineManagerImpl.java:5536)
    at com.cloud.vm.VmWorkJobDispatcher.runJob(VmWorkJobDispatcher.java:102)
    at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:620)
    at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:48)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:55)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:102)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:52)
    at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:45)
    at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:568)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
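
To make the failing comparison in the log error above easier to see, here is a minimal sketch that walks the constraint chain `0 <= static-min <= dynamic-min <= dynamic-max <= static-max` with the byte values copied from that message. The function name `first_violation` is hypothetical and this is only the arithmetic, not the check as CloudStack or XAPI implements it.

```python
from typing import List, Optional, Tuple


def first_violation(limits: List[Tuple[str, int]]) -> Optional[Tuple[str, str]]:
    """Return the first adjacent pair (a, b) for which a <= b does not hold."""
    chain = [("0", 0)] + limits
    for (name_a, a), (name_b, b) in zip(chain, chain[1:]):
        if a > b:
            return (name_a, name_b)
    return None


# Values in bytes, copied from the 2023-12-08 log error.
limits = [
    ("memory-static-min", 268435456),    # 256 MiB
    ("memory-dynamic-min", 2147483648),  # 2 GiB
    ("memory-dynamic-max", 4294967296),  # 4 GiB
    ("memory-static-max", 536870912),    # 512 MiB
]

print(first_violation(limits))
# → ('memory-dynamic-max', 'memory-static-max')
```

The only pair that fails is the last one: the requested memory-dynamic-max (4 GiB) is larger than memory-static-max, which still equals the 512 MB the test VM was created with.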

STEPS TO REPRODUCE
Create a VM with 1 or more CPUs and enable dynamic scaling in the VM, template and global settings.
Try to scale the running VM.
EXPECTED RESULTS
VM scaled successfully
ACTUAL RESULTS
VM scaling fails
DaanHoogland commented 7 months ago

@furyflash777, can you have a look at your report, please? Something is off: the message "Cannot scale up the vm because of memory constraint violation:" is created in a class called CitrixResourceBase, which does not occur in your stack trace. The message I would expect is "Unable to scale vm due to", and I think it would be a valid response from the hypervisor.

Can you also explain what your intentions are? In Kubernetes the number of nodes would scale, not the size of the VM. Is this what you are trying to do?

furyflash777 commented 7 months ago

I am trying to scale the VM size (change the compute offering). Dynamic scaling is also not working for instances.


044: Sending { Cmd , MgmtId: 20156575640318, via: 6, Ver: v1, Flags: 100111, [{"com.cloud.agent.api.ScaleVmCommand":{"vm":{"id":"1656","name":"i-6-1656-VM","type":"User","cpus":"4","minSpeed":"725","maxSpeed":"2900","minRam":"(4.00 GB) 4294967296","maxRam":"(8.00 GB) 8589934592","enableHA":"false","limitCpuUse":"false","enableDynamicallyScaleVm":"false","uuid":"9ad5863b-d170-4d7b-8ea1-04f7efcc8fa0","enterHardwareSetup":"false","configDriveLocation":"SECONDARY","guestOsDetails":{},"extraConfig":{}},"vmName":"i-6-1656-VM","cpus":"4","minSpeed":"725","maxSpeed":"2900","minRam":"(4.00 GB) 4294967296","maxRam":"(8.00 GB) 8589934592","wait":"0","bypassHostMaintenance":"false"}}] }

2024-01-27 03:11:00,466 DEBUG [c.c.a.t.Request] (AgentManager-Handler-14:null) (logid:) Seq 6-5214042468588252044: Executing: { Cmd , MgmtId: 20156575640318, via: 6, Ver: v1, Flags: 100111, [{"com.cloud.agent.api.ScaleVmCommand":{"vm":{"id":"1656","name":"i-6-1656-VM","type":"User","cpus":"4","minSpeed":"725","maxSpeed":"2900","minRam":"(4.00 GB) 4294967296","maxRam":"(8.00 GB) 8589934592","enableHA":"false","limitCpuUse":"false","enableDynamicallyScaleVm":"false","uuid":"9ad5863b-d170-4d7b-8ea1-04f7efcc8fa0","enterHardwareSetup":"false","configDriveLocation":"SECONDARY","guestOsDetails":{},"extraConfig":{}},"vmName":"i-6-1656-VM","cpus":"4","minSpeed":"725","maxSpeed":"2900","minRam":"(4.00 GB) 4294967296","maxRam":"(8.00 GB) 8589934592","wait":"0","bypassHostMaintenance":"false"}}] }

2024-01-27 03:11:00,535 DEBUG [c.c.h.x.r.w.x.CitrixScaleVmCommandWrapper] (DirectAgent-27:ctx-310bdcf2) (logid:d2eeb120) Catch exception com.cloud.utils.exception.CloudRuntimeException when scaling VM:i-6-1656-VM due to com.cloud.utils.exception.CloudRuntimeException: Cannot scale up the vm because of memory constraint violation: 0 <= memory-static-min(2147483648) <= memory-dynamic-min(4294967296) <= memory-dynamic-max(8589934592) <= memory-static-max(4294967296)

It's very strange. In the log I see "enableDynamicallyScaleVm":"false"

but it is enabled in the VM, the template and the compute offering.
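
For anyone comparing these values by hand, here is a small sketch that pulls the relevant fields out of the ScaleVmCommand shown in the log. The payload is a trimmed copy of the "vm" object, and the helper name `ram_bytes` is made up for illustration; nothing here is CloudStack code.

```python
import json
import re

# Trimmed copy of the "vm" object from the ScaleVmCommand log entry.
payload = (
    '{"name":"i-6-1656-VM",'
    '"minRam":"(4.00 GB) 4294967296",'
    '"maxRam":"(8.00 GB) 8589934592",'
    '"enableDynamicallyScaleVm":"false"}'
)


def ram_bytes(field: str) -> int:
    """Extract the trailing byte count from a value like '(4.00 GB) 4294967296'."""
    match = re.search(r"(\d+)\s*$", field)
    if match is None:
        raise ValueError(f"no byte count in {field!r}")
    return int(match.group(1))


vm = json.loads(payload)
min_ram = ram_bytes(vm["minRam"])
max_ram = ram_bytes(vm["maxRam"])
# The flag is logged as a string, so compare against "true" explicitly.
dynamic_flag = vm["enableDynamicallyScaleVm"] == "true"

# memory-static-max as reported in the CitrixScaleVmCommandWrapper log line.
static_max = 4294967296

print(dynamic_flag)          # → False
print(max_ram > static_max)  # → True
```

So the command asks for a maxRam of 8 GiB while the hypervisor reports a memory-static-max of 4 GiB, and the command itself carries the dynamic-scaling flag as false even though it is enabled in the UI.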


Other logs:

2024-01-27 03:11:00,538 DEBUG [c.c.a.t.Request] (AgentManager-Handler-11:null) (logid:) Seq 6-5214042468588252044: Processing: { Ans: , MgmtId: 20156575640318, via: 6, Ver: v1, Flags: 110, [{"com.cloud.agent.api.ScaleVmAnswer":{"result":"false","details":"Catch exception com.cloud.utils.exception.CloudRuntimeException when scaling VM:i-6-1656-VM due to com.cloud.utils.exception.CloudRuntimeException: Cannot scale up the vm because of memory constraint violation: 0 <= memory-static-min(2147483648) <= memory-dynamic-min(4294967296) <= memory-dynamic-max(8589934592) <= memory-static-max(4294967296)","wait":"0","bypassHostMaintenance":"false"}}] }

2024-01-27 03:11:00,539 ERROR [c.c.v.VirtualMachineManagerImpl] (Work-Job-Executor-103:ctx-636242c2 job-29223/job-29224 ctx-7eebd461) (logid:d2eeb120) Unable to scale vm due to Catch exception com.cloud.utils.exception.CloudRuntimeException when scaling VM:i-6-1656-VM due to com.cloud.utils.exception.CloudRuntimeException: Cannot scale up the vm because of memory constraint violation: 0 <= memory-static-min(2147483648) <= memory-dynamic-min(4294967296) <= memory-dynamic-max(8589934592) <= memory-static-max(4294967296)

2024-01-27 03:11:00,552 ERROR [c.c.v.VmWorkJobHandlerProxy] (Work-Job-Executor-103:ctx-636242c2 job-29223/job-29224 ctx-7eebd461) (logid:d2eeb120) Invocation exception, caused by: com.cloud.utils.exception.CloudRuntimeException: Unable to scale vm due to Catch exception com.cloud.utils.exception.CloudRuntimeException when scaling VM:i-6-1656-VM due to com.cloud.utils.exception.CloudRuntimeException: Cannot scale up the vm because of memory constraint violation: 0 <= memory-static-min(2147483648) <= memory-dynamic-min(4294967296) <= memory-dynamic-max(8589934592) <= memory-static-max(4294967296)

2024-01-27 03:11:00,552 INFO [c.c.v.VmWorkJobHandlerProxy] (Work-Job-Executor-103:ctx-636242c2 job-29223/job-29224 ctx-7eebd461) (logid:d2eeb120) Rethrow exception com.cloud.utils.exception.CloudRuntimeException: Unable to scale vm due to Catch exception com.cloud.utils.exception.CloudRuntimeException when scaling VM:i-6-1656-VM due to com.cloud.utils.exception.CloudRuntimeException: Cannot scale up the vm because of memory constraint violation: 0 <= memory-static-min(2147483648) <= memory-dynamic-min(4294967296) <= memory-dynamic-max(8589934592) <= memory-static-max(4294967296)

2024-01-27 03:11:00,553 ERROR [c.c.v.VmWorkJobDispatcher] (Work-Job-Executor-103:ctx-636242c2 job-29223/job-29224) (logid:d2eeb120) Unable to complete AsyncJobVO: {id:29224, userId: 2, accountId: 2, instanceType: null, instanceId: null, cmd: com.cloud.vm.VmWorkReconfigure, cmdInfo: xxxxxxxxxxx, cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, result: null, initMsid: 20156575640318, completeMsid: null, lastUpdated: null, lastPolled: null, created: Sat Jan 27 03:10:59 MSK 2024, removed: null}, job origin:29223
com.cloud.utils.exception.CloudRuntimeException: Unable to scale vm due to Catch exception com.cloud.utils.exception.CloudRuntimeException when scaling VM:i-6-1656-VM due to com.cloud.utils.exception.CloudRuntimeException: Cannot scale up the vm because of memory constraint violation: 0 <= memory-static-min(2147483648) <= memory-dynamic-min(4294967296) <= memory-dynamic-max(8589934592) <= memory-static-max(4294967296)
    at com.cloud.vm.VirtualMachineManagerImpl.orchestrateReConfigureVm(VirtualMachineManagerImpl.java:4648)
    at com.cloud.vm.VirtualMachineManagerImpl.reConfigureVm(VirtualMachineManagerImpl.java:4579)
    at com.cloud.vm.VirtualMachineManagerImpl.orchestrateReconfigure(VirtualMachineManagerImpl.java:5522)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at com.cloud.vm.VmWorkJobHandlerProxy.handleVmWorkJob(VmWorkJobHandlerProxy.java:107)
    at com.cloud.vm.VirtualMachineManagerImpl.handleVmWorkJob(VirtualMachineManagerImpl.java:5536)
    at com.cloud.vm.VmWorkJobDispatcher.runJob(VmWorkJobDispatcher.java:102)
    at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:620)
    at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:48)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:55)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:102)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:52)
    at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:45)
    at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:568)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)

2024-01-27 03:11:00,668 ERROR [c.c.a.ApiAsyncJobDispatcher] (API-Job-Executor-72:ctx-3d19bf6d job-29223) (logid:d2eeb120) Unexpected exception while executing org.apache.cloudstack.api.command.user.kubernetes.cluster.ScaleKubernetesClusterCmd
java.lang.RuntimeException: Unhandled exception
    at com.cloud.vm.VirtualMachineManagerImpl.reConfigureVm(VirtualMachineManagerImpl.java:4594)
    at com.cloud.vm.VirtualMachineManagerImpl.reConfigureVm(VirtualMachineManagerImpl.java:271)
    at com.cloud.vm.UserVmManagerImpl.upgradeRunningVirtualMachine(UserVmManagerImpl.java:2044)
    at com.cloud.vm.UserVmManagerImpl.upgradeVirtualMachine(UserVmManagerImpl.java:1900)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:344)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:198)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
    at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:97)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
    at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:215)
    at com.sun.proxy.$Proxy185.upgradeVirtualMachine(Unknown Source)
    at com.cloud.kubernetes.cluster.actionworkers.KubernetesClusterScaleWorker.scaleKubernetesClusterOffering(KubernetesClusterScaleWorker.java:301)
    at com.cloud.kubernetes.cluster.actionworkers.KubernetesClusterScaleWorker.scaleCluster(KubernetesClusterScaleWorker.java:464)
    at com.cloud.kubernetes.cluster.KubernetesClusterManagerImpl.scaleKubernetesCluster(KubernetesClusterManagerImpl.java:1328)
    at org.apache.cloudstack.api.command.user.kubernetes.cluster.ScaleKubernetesClusterCmd.execute(ScaleKubernetesClusterCmd.java:156)
    at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:163)
    at com.cloud.api.ApiAsyncJobDispatcher.runJob(ApiAsyncJobDispatcher.java:112)
    at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:620)
    at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:48)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:55)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:102)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:52)
    at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:45)
    at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:568)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: com.cloud.utils.exception.CloudRuntimeException: Unable to scale vm due to Catch exception com.cloud.utils.exception.CloudRuntimeException when scaling VM:i-6-1656-VM due to com.cloud.utils.exception.CloudRuntimeException: Cannot scale up the vm because of memory constraint violation: 0 <= memory-static-min(2147483648) <= memory-dynamic-min(4294967296) <= memory-dynamic-max(8589934592) <= memory-static-max(4294967296)
    at com.cloud.vm.VirtualMachineManagerImpl.orchestrateReConfigureVm(VirtualMachineManagerImpl.java:4648)
    at com.cloud.vm.VirtualMachineManagerImpl.reConfigureVm(VirtualMachineManagerImpl.java:4579)
    at com.cloud.vm.VirtualMachineManagerImpl.orchestrateReconfigure(VirtualMachineManagerImpl.java:5522)
    ... 19 more

2024-01-27 03:11:00,674 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-72:ctx-3d19bf6d job-29223) (logid:d2eeb120) Complete async job-29223, jobStatus: FAILED, resultCode: 530, result: org.apache.cloudstack.api.response.ExceptionResponse/null/{"uuidList":[],"errorcode":"530","errortext":"Unhandled exception"}

furyflash777 commented 7 months ago

When the k8s cluster is in the stopped state, I can change the compute offering for the cluster successfully. It seems to be a problem with dynamic scaling for XCP-NG VMs.

DaanHoogland commented 7 months ago

OK @furyflash777, got it. I wonder if you really want to scale up the machines, as you can easily add and remove worker nodes. That said, you may have hit an XCP-NG limitation. Can you check whether you can autoscale a separate VM on your setup?

furyflash777 commented 7 months ago

Dynamic scaling isn't working for any XCP-NG VMs started with CloudStack.

furyflash777 commented 7 months ago

@DaanHoogland maybe I should set memory-dynamic-max and memory-dynamic-min in the VM settings or the template.

I found https://cwiki.apache.org/confluence/display/CLOUDSTACK/Enable+additional+configuration+metadata+to+virtual+machines

but I don't understand what I should put into the VM settings.

I tried putting it into platform and into separate metadata, but nothing changed.


DaanHoogland commented 7 months ago

Maybe an obvious one, but do you have xentools installed in the VM?

furyflash777 commented 6 months ago

Yes I have