Open hari1822 opened 1 month ago
Hey @hari1822, my team faced this issue before, and it resulted in 30K Autoscale VMs being created.
Our finding was that this happens when the VM cannot be created successfully from the VM Template. CloudStack tries really hard to start up a VM, so it keeps retrying in a loop, forever.
I created a ticket here reporting the issue, suggesting better handling to avoid infinite loops: https://github.com/apache/cloudstack/issues/9318
Anyways, in our case, what was causing the issue was either:
We were using Linstor as the SDS Storage, but Linbit managed to resolve the issue for us and we were able to create Autoscale VMs without issues ever since.
What storage are you using?
When this problem arises, we are unable to delete or disable the auto scale group, and the scaling occurs within the given interval.
@hari1822, okay, that's new for us.
When we encountered this issue, we were able to disable the Autoscale Group.
We then did one of these 2 options:
But note, in Option 1, we encountered the UI crashing a few times and DB going 100%.
In Option 2, we felt it was okay because the VM itself had not been created yet. There was just a record of the attempt.
I have 2 Questions
NFS is used for storage.
While trying to delete the Autoscale Group: "Failed to remove the load balancer rule." If we try to delete the load balancer rule: "Unable to remove the loadbalancer rule."
2024-08-07 12:39:48,283 DEBUG [o.a.c.n.t.BasicNetworkTopology] (API-Job-Executor-23:ctx-7e771bbf job-7704 ctx-cd0c46cb) (logid:4ed65a97) Router r-817-VM is in Stopped, so not sending apply ip association commands to the backend
2024-08-07 12:39:48,292 DEBUG [o.a.c.n.t.BasicNetworkTopology] (API-Job-Executor-23:ctx-7e771bbf job-7704 ctx-cd0c46cb) (logid:4ed65a97) APPLYING LOAD BALANCING RULES
2024-08-07 12:39:48,293 DEBUG [o.a.c.n.t.BasicNetworkTopology] (API-Job-Executor-23:ctx-7e771bbf job-7704 ctx-cd0c46cb) (logid:4ed65a97) Router r-817-VM is in Stopped, so not sending apply loadbalancing rules commands to the backend
2024-08-08 00:03:19,922 DEBUG [c.c.h.x.r.CitrixResourceBase] (DirectAgent-139:ctx-669704e5) (logid:6bee0f59) Trying to connect to 169.254.233.95 attempt 68 of 100
2024-08-08 00:03:20,817 ERROR [c.c.u.s.SshHelper] (DirectAgent-298:ctx-a647b11e) (logid:cd57d95b) SSH execution of command /opt/cloud/bin/router_proxy.sh update_config.py 169.254.0.49 vm_dhcp_entry.json.48497187-69e7-48cd-b6a7-f3214d3d0015 has an error status code in return. Result output:
2024-08-08 00:03:20,818 DEBUG [c.c.a.r.v.VirtualRoutingResource] (DirectAgent-298:ctx-a647b11e) (logid:cd57d95b) Processing ScriptConfigItem, executing update_config.py vm_dhcp_entry.json.48497187-69e7-48cd-b6a7-f3214d3d0015 took 7114ms
2024-08-08 00:03:20,818 DEBUG [c.c.a.m.DirectAgentAttache] (DirectAgent-298:ctx-a647b11e) (logid:cd57d95b) Seq 1-4266597696980161048: Response Received:
2024-08-08 00:03:20,818 DEBUG [c.c.a.t.Request] (DirectAgent-298:ctx-a647b11e) (logid:cd57d95b) Seq 1-4266597696980161048: Processing: { Ans: , MgmtId: 275890944841813, via: 1(wolfapp2-xen), Ver: v1, Flags: 10, [{"com.cloud.agent.api.routing.GroupAnswer":{"results":["null - failed: ","null - failed: "],"result":"false","wait":"0","bypassHostMaintenance":"false"}}] }
2024-08-08 00:03:20,818 DEBUG [c.c.a.t.Request] (Work-Job-Executor-112:ctx-d48372fe job-7705/job-8169 ctx-772376bc) (logid:cd57d95b) Seq 1-4266597696980161048: Received: { Ans: , MgmtId: 275890944841813, via: 1(wolfapp2-xen), Ver: v1, Flags: 10, { GroupAnswer } }
2024-08-08 00:03:20,818 WARN [c.c.v.VirtualMachineManagerImpl] (Work-Job-Executor-112:ctx-d48372fe job-7705/job-8169 ctx-772376bc) (logid:cd57d95b) Unable to contact resource.
com.cloud.exception.ResourceUnavailableException: Resource [DataCenter:1] is unreachable: Unable to apply dhcp entry on router
@hari1822 I see.
I'm not very good at reading logs, but it looks like your Virtual Router is stopped? If so, I think that's a bigger issue you should look into first.
Do you have other VPCs and Virtual Routers working okay?
And for Autoscale, remember you need to disable the autoscale group first before being able to delete it.
If you try to delete an Autoscale Group that is still enabled, it will throw an error.
Only when an Autoscale Group is disabled can you delete it or make changes to the load balancer, etc.
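As a rough illustration of that ordering, here is a minimal Python sketch. It only builds the API request parameters in the order they would need to be sent, and does not contact a management server or sign anything. `disableAutoScaleVmGroup` and `deleteAutoScaleVmGroup` are the CloudStack API command names; the helper function and the example group ID are hypothetical, made up for this example.

```python
from typing import Dict, List


def autoscale_group_teardown_calls(group_id: str, currently_enabled: bool) -> List[Dict[str, str]]:
    """Return the API calls, in order, for removing an autoscale group.

    Hypothetical helper: it only prepares parameter dicts. Sending and
    signing the requests is left to whatever API client you already use.
    """
    calls: List[Dict[str, str]] = []
    if currently_enabled:
        # An enabled group cannot be deleted, so disable it first.
        calls.append({"command": "disableAutoScaleVmGroup", "id": group_id})
    # Once the group is disabled, it can be deleted.
    calls.append({"command": "deleteAutoScaleVmGroup", "id": group_id})
    return calls


if __name__ == "__main__":
    # "example-group-id" is a placeholder, not a real group ID.
    for call in autoscale_group_teardown_calls("example-group-id", currently_enabled=True):
        print(call)
```

If the disable call itself fails, as reported earlier in this thread, the delete call should not be attempted until that is sorted out.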
@btzq Will look into it
ISSUE TYPE
COMPONENT NAME
CLOUDSTACK VERSION
CONFIGURATION
OS / ENVIRONMENT
SUMMARY
When creating an autoscale group with the maximum number of instances set to 3, more instances are created than the configured maximum. The created instances are in the Stopped state, and some are in the Error state. Nearly 1000 instances were created. The issue occurs when the created instances end up in the Stopped or Error state.
STEPS TO REPRODUCE
EXPECTED RESULTS
ACTUAL RESULTS