Azure / acs-engine

WE HAVE MOVED: Please join us at Azure/aks-engine!
https://github.com/Azure/aks-engine
MIT License
1.03k stars 560 forks source link

acs-engine scale up/down issue with loadbalancer OF SKU: STANDARD #4328

Closed mo-saeed closed 5 years ago

mo-saeed commented 5 years ago

Is this a request for help?:

yes Is this an ISSUE or FEATURE REQUEST? (choose one):

Issue

What version of acs-engine?:

v0.26.0


Orchestrator and version (e.g. Kubernetes, DC/OS, Swarm) Kuberenetes 12.1

What happened: Here's the

FATA[0819] Code="DeploymentFailed" Message="At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/arm-debug for usage details." Details=[{"code":"Conflict","message":"{\r\n \"status\": \"Failed\",\r\n \"error\": {\r\n \"code\": \"ResourceDeploymentFailure\",\r\n \"message\": \"The resource operation completed with terminal provisioning state 'Failed'.\",\r\n \"details\": [\r\n {\r\n \"code\": \"VMExtensionProvisioningError\",\r\n \"message\": \"VM has reported a failure when processing extension 'vmssCSE'. Error message: \\\"Enable failed: failed to execute command: command terminated with exit status=50\\n[stdout]\\n\\n[stderr]\\n\\\".\"\r\n }\r\n ]\r\n }\r\n}"}] .

exit status 50 based on the document means: outbound traffic failure.

the new node is created successfully but without kubernetes installed and it didn't join the loadbalancer backend pool.

Without loadbalancer scaling up and down works perfectly !

What you expected to happen: Scaling up/down should be succeeded in the existence of the loadbalancer.

How to reproduce it (as minimally and precisely as possible): As described above

Anything else we need to know:

mo-saeed commented 5 years ago

Just to mention this issue happens only with Loadbalancer of SKU standard: "loadBalancerSku": "standard"

vijaygos commented 5 years ago

I have tried this with ACS 0.24.2 and Kubernetes 1.12.1 and did not run into this problem. I must also add that I excluded the Master from the Standard LoadBalancer SKU but I am not certain if that has any impact here. I was actually able to scale to over 100+ nodes with SinglePlacementGroup set to False and also was able to successfully deploy applications after scaling.

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contribution. Note that acs-engine is deprecated--see https://github.com/Azure/aks-engine instead.

palmerabollo commented 5 years ago

I am facing the same issue with aks-engine 0.32.x and kubernetes 1.11.8. Cluster autoscaler v1.3.9 Also using "loadBalancerSku": "standard" and "singlePlacementGroup": false

This is the event I see in a kubernetes service that expects a load balancer to be created:

Warning CreatingLoadBalancerFailed 4m41s service-controller Error creating load balancer (will retry): failed to ensure load balancer for service mynamespace/traefik: Long running operation terminated with status 'Failed': Code="VMExtensionProvisioningError" Message="VM has reported a failure when processing extension 'vmssCSE'. Error message: \"Enable failed: failed to execute command: command terminated with exit status=50\n[stdout]\n\n[stderr]\n\"."

I think the issue should be reopened.

@mo-saeed did you find a workaround until it's fixed?