Azure / AKS

Azure Kubernetes Service
https://azure.github.io/AKS/

Unable to connect to the server: net/http: TLS handshake timeout #14

Closed davidjsanders closed 5 years ago

davidjsanders commented 6 years ago

Hi, when I create an AKS cluster, I'm receiving a timeout on the TLS handshake. The cluster creates okay with the following commands:

az group create --name dsK8S --location westus2

az aks create \
  --resource-group dsK8S \
  --name dsK8SCluster \
  --generate-ssh-keys \
  --dns-name-prefix dasanderk8 \
  --kubernetes-version 1.8.1 \
  --agent-count 2 \
  --agent-vm-size Standard_A2

az aks get-credentials --resource-group dsK8S --name dsK8SCluster

The response from the create command is a JSON object:

  {
    "id": "/subscriptions/OBFUSCATED/resourcegroups/dsK8S/providers/Microsoft.ContainerService/managedClusters/dsK8SCluster",
    "location": "westus2",
    "name": "dsK8SCluster",
    "properties": {
      "accessProfiles": {
        "clusterAdmin": { "kubeConfig": "OBFUSCATED" },
        "clusterUser": { "kubeConfig": "OBFUSCATED" }
      },
      "agentPoolProfiles": [
        {
          "count": 2,
          "dnsPrefix": null,
          "fqdn": null,
          "name": "agentpool1",
          "osDiskSizeGb": null,
          "osType": "Linux",
          "ports": null,
          "storageProfile": "ManagedDisks",
          "vmSize": "Standard_A2",
          "vnetSubnetId": null
        }
      ],
      "dnsPrefix": "dasanderk8",
      "fqdn": "dasanderk8-d55f0987.hcp.westus2.azmk8s.io",
      "kubernetesVersion": "1.8.1",
      "linuxProfile": {
        "adminUsername": "azureuser",
        "ssh": { "publicKeys": [ { "keyData": "OBFUSCATED" } ] }
      },
      "provisioningState": "Succeeded",
      "servicePrincipalProfile": {
        "clientId": "OBFUSCATED",
        "keyVaultSecretRef": null,
        "secret": null
      }
    },
    "resourceGroup": "dsK8S",
    "tags": null,
    "type": "Microsoft.ContainerService/ManagedClusters"
  }

I've now torn down this cluster but this has happened three times today.

Any help?

David

brendandburns commented 6 years ago

Please send the details of your cluster (subscription, resource group, cluster name) in via Azure support or to me (bburns [at] microsoft).

The TLS disconnect indicates a problem with the API Server, which can have multiple different causes. The oncall team can investigate and mitigate the problems if you send the relevant information.

Thanks --brendan

drewsmith commented 6 years ago

+1

Having this issue as well and I can't log a support request without subscribing to the $100/mo plan.

JunSun17 commented 6 years ago

@dsandersAzure @drewsmith I have mitigated your issues and it should work now.

ziXet commented 6 years ago

We are running AKS (westus2) in production. Yesterday (Friday at 22:00 GMT) the Kubernetes API stopped working; we are getting "Unable to connect to the server: net/http: TLS handshake timeout". The agents are working properly, so we do not have downtime. What should we do? Our Kubernetes version is 1.8.1.

ziXet commented 6 years ago

@brendandburns Can you please also take a look at my case?

brendandburns commented 6 years ago

Can you send your subscription, resource group and cluster name to my Microsoft email?

Thanks --brendan

novitoll commented 6 years ago

+1, same here. It started a few hours ago. I cannot access pods, and kubectl proxy does not work either.

ziXet commented 6 years ago

@brendandburns I already contacted support and they are investigating the issue. I hope we can get it back since it's our production cluster! I'm sure this incident (losing the kube API) is not limited to us; do you have any further update to share with us?

Karreg commented 6 years ago

Pretty frustrating to see this issue not solved.

@ziXet it was a bit (very) audacious to go to production on a system that is still in preview :open_mouth:

ziXet commented 6 years ago

@Karreg ACS only offered Kubernetes 1.7 at the time I was releasing, but I wanted 1.8, which was only available in Preview. To be honest, I never expected this kind of massive failure, even in Preview. For example, many people are using Azure Managed Postgres even though it's still in Preview.

Karreg commented 6 years ago

The issue of "previews" that take ages to go live...

tslavik commented 6 years ago

I had the same issue. Then I tried to upgrade the cluster and ended up with this error:

Deployment failed. Correlation ID: c69f4d4e-87b7-4136-9a9d-b05c93670515. Timeout while polling for control plane provisioning status

Interestingly, my cluster is now available again through "kubectl proxy --port=XXXX". In the portal I see Kubernetes version 1.9.1, but "kubectl get nodes" returns 1.8.1, and the portal shows the error message "This container service is in a failed state".

ziXet commented 6 years ago

I did not run the upgrade command; I thought it might make things worse. I'm still waiting for support...

tslavik commented 6 years ago

I understand. It depends on what happened behind the scenes. For me it is a test environment; fortunately, production is still OK. Maybe this will help with investigating the issue.

SurushS commented 6 years ago

This issue forced me to create a new cluster last week. Now, after a week, my new cluster is suddenly not responding to kubectl create, apply, or delete commands, and the az aks show command says my new cluster is in a failed provisioning state. So again I'm disappointed in the state of AKS, even in preview. Today I tried to create a new cluster, and it has been stuck creating since this morning in two different subscriptions. I'm at a loss for words....

brendandburns commented 6 years ago

Many apologies. We are having an ongoing issue in some production environments related to cluster creation. Our oncall engineers are working to restore service.

I'll keep this thread updated as we have more information.

--brendan

alonisser commented 6 years ago

@brendandburns I believe updating your status page https://azure.microsoft.com/en-us/status/ would have been helpful and would have led to less noise here.

SurushS commented 6 years ago

Any update on the matter? I still have clusters in the Creating state that also cannot be deleted. I'm hesitant to create a new cluster since they all count against resource quota limits.

brendandburns commented 6 years ago

You can delete the underlying resource group MC_* to free resources.

--brendan
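
A minimal sketch of the cleanup step above, assuming the default convention in which AKS puts a cluster's node resources (VMs, disks, networking) in a group named MC_<resourceGroup>_<clusterName>_<location>. The helper name `mc_group_name` is hypothetical, and a cluster created with a custom node resource group will not match this pattern. On current Azure CLI versions, `az aks show --resource-group <rg> --name <cluster> --query nodeResourceGroup -o tsv` reports the actual name, which is safer than reconstructing it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: builds the default AKS node resource group name,
# MC_<resourceGroup>_<clusterName>_<location>. Only valid when the cluster
# was created without a custom node resource group.
mc_group_name() {
  local resource_group="$1" cluster_name="$2" location="$3"
  printf 'MC_%s_%s_%s\n' "$resource_group" "$cluster_name" "$location"
}

# Print the name for the cluster from the original report.
mc_group_name dsK8S dsK8SCluster westus2

# Deleting the group removes the node VMs, disks, and networking in it.
# Left commented out on purpose because it is destructive:
# az group delete --name "$(mc_group_name dsK8S dsK8SCluster westus2)" --yes --no-wait
```

Deleting the MC_ group frees the quota held by the agent VMs, but it does not remove the managed cluster resource itself, which still has to be deleted separately once the control plane allows it.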

ziXet commented 6 years ago

@brendandburns I have not heard back from support!! Do you mind if I send you the subscription ID? At the very least, I need to know whether it's recoverable; if it isn't, I have to create a new cluster and migrate everything manually.

brendandburns commented 6 years ago

Sure, my email is earlier in this thread.

--brendan

SurushS commented 6 years ago

@brendandburns Is it okay if I send you an email? I have some questions regarding Kubernetes options in general, ACS, AKS, and so forth. We are currently looking at our options and deciding whether we want to use Azure or AWS for our PaaS. Hopefully you can give me the insights I need to make a proper decision. I promise it won't be a long, dreadful message. Please let me know if that's okay.

ziXet commented 6 years ago

Hi @brendandburns, do you have any update?

brendandburns commented 6 years ago

Did you send cluster details via email? If so I didn't see it, please resend.

--brendan

ziXet commented 6 years ago

@brendandburns I just resent it.

ziXet commented 6 years ago

I gave up! Azure Support and Brendan did not respond! Already moved everything to ACS. AKS sucks at the moment. (might change my mind later)

tslavik commented 6 years ago

We are interested in AKS. I thought AKS was supposed to reach GA in Q1 2018. Could someone confirm or rule that out?

ziXet commented 6 years ago

@tslavik Speaking of GA, PostgreSQL was supposed to reach GA in November 2017, but it's still in Preview!

giorgited commented 6 years ago

@ziXet have fun with ACS... we had so many issues with it. Azure Support replied once a week with not-very-helpful emails, so we just gave up. We are trying AKS now, and I did hit the TLS handshake timeout issue; it seems to come and go for me.

brendandburns commented 6 years ago

Giorgi, please feel free to email us when you see issues bburns [at] microsoft, with subscription, resource group and cluster name.

We're trying to track down the remaining issues that cause TLS timeouts, and more data is always useful.

Thanks --brendan

jfelten commented 6 years ago

Here's another data point for you:

I am able to reproduce this issue on a bare-metal cluster running CentOS 7 with Kubernetes 1.10.0, on a new node managed via kubeadm. I had never seen this issue before 1.10.0. The issue appears to be that the API server port opens before it is ready to process TLS traffic. Waiting 10 seconds or so makes the problem go away, but that is a pain for automation scripts. It would be helpful if the API server did not accept traffic before it is ready.
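
For automation, the fixed "wait 10 seconds" workaround can be made less fragile by polling until the API server actually answers. A minimal sketch, assuming bash; the function name `retry_cmd` is hypothetical. This only helps when the control plane is slow to become ready; it will not fix an API server that is genuinely unhealthy, as in the AKS incidents elsewhere in this thread.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command up to MAX_ATTEMPTS times, sleeping
# DELAY seconds between failed attempts. Returns 0 on the first success
# and 1 if every attempt fails. Progress messages go to stderr.
retry_cmd() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -lt "$max_attempts" ]; then
      echo "attempt ${attempt}/${max_attempts} failed: $*; retrying in ${delay}s" >&2
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  echo "giving up after ${max_attempts} attempts: $*" >&2
  return 1
}

# Example: try up to 10 times, 3 seconds apart. --request-timeout bounds
# each kubectl call so a single stalled request fails instead of hanging.
# retry_cmd 10 3 kubectl get nodes --request-timeout=10s
```

Retrying a cheap read such as `kubectl get nodes` before the real deployment steps keeps the readiness check separate from the commands that change state.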

kvolkovich-sc commented 6 years ago

Having a similar issue right now.

kubectl get pods

Unable to connect to the server: net/http: TLS handshake timeout

giorgited commented 6 years ago

Any update on this?

brendandburns commented 6 years ago

This is an umbrella symptom caused by many different issues. We continue to fix them, but others crop up.

Please don't take the fact that these issues continue as a sign that we aren't finding and fixing problems (we are), but we're keeping this open for people to report new problems as they occur.

As before please send details to Azure support or to me directly (bburns at Microsoft) and we will have oncall investigate the cause.

Thanks! Brendan

--brendan

alexandruradovici commented 6 years ago

Hello,

We have an AKS service in the East US region. We are not able to connect to it using kubectl. Is there an issue in this region?

az aks get-credentials --resource-group [resource] --name [service name]
kubectl get nodes

we get

Unable to connect to the server: net/http: TLS handshake timeout

What would you advise us to do to be able to connect?

polys commented 6 years ago

Same issue with an AKS cluster in East US: Unable to connect to the server: net/http: TLS handshake timeout

It only started happening today; everything was working just fine yesterday.

I tried connecting from my own machine, a docker container and the Azure Cloud Shell... all unsuccessful. I guess the issue is server side...
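
Trying several vantage points, as above, is a good way to separate a client-side problem from a server-side one. To check whether the TLS handshake itself is what stalls, independently of kubectl and its kubeconfig, one option is a bare handshake against the API server FQDN with `openssl s_client` under a hard time limit. A minimal sketch, assuming `openssl` and GNU coreutils `timeout` are installed; the function name `probe_tls` and the names `myRG`/`myCluster` are hypothetical. A failure here points at the endpoint or the network path rather than at local kubectl configuration, but it does not identify the cause by itself.

```bash
#!/usr/bin/env bash
# Hypothetical helper: attempt a TLS handshake with HOST:PORT, giving up
# after SECONDS. Returns 0 if openssl completes the connection, non-zero
# if it fails, times out (124), or a required tool is missing (127).
# It does not validate the certificate chain; it only checks that a
# handshake completes.
probe_tls() {
  local host="$1" port="${2:-443}" seconds="${3:-10}"
  # </dev/null closes stdin so s_client exits right after the handshake
  # instead of waiting for interactive input.
  timeout "$seconds" openssl s_client -connect "${host}:${port}" \
    -servername "$host" </dev/null >/dev/null 2>&1
}

# Example (substitute your own resource group and cluster name):
# if probe_tls "$(az aks show -g myRG -n myCluster --query fqdn -o tsv)"; then
#   echo "TLS handshake completed"
# else
#   echo "TLS handshake failed or timed out"
# fi
```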

brendandburns commented 6 years ago

Alexandru can you send me your information?

alexandruradovici commented 6 years ago

Brendan, what exact information do you need? The resource group is ibot-studio-production and the AKS service is ibot-studio-production-service.

brendandburns commented 6 years ago

Can you send me the subscription id and region too? send to bburns [at] microsoft if you prefer.

Thanks.

amitshowry commented 6 years ago

Our AKS cluster in East US is getting the "net/http: TLS handshake timeout" error. Deployments are automatically reverting to a previous state: I can see some pods getting killed and recreated with old configurations and images. This has happened four times in the span of one month.

kubectl version

Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.1", GitCommit:"3a1c9449a956b6026f075fa3134ff92f7d55f812", GitTreeState:"clean", BuildDate:"2018-01-04T11:52:23Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}

Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.7", GitCommit:"8e1552342355496b62754e61ad5f802a0f3f1fa7", GitTreeState:"clean", BuildDate:"2017-09-28T23:56:03Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}

Unfortunately AKS support is very poor and unhelpful.

shrutir25 commented 6 years ago

@amitshowry - I have resolved the issue on your cluster through the AKS support case. A couple of pods were stuck in terminating, amongst other things. I have summarized it in the case notes. Feel free to ping me if the issue reoccurs.

c-mccutcheon commented 6 years ago

AKS in West Europe: we just started getting this issue about 15 minutes ago. Everything was fine until then. We were testing a container that was stuck in a termination/backoff loop, but to my knowledge this shouldn't bring down cluster management itself. Containers within the cluster appear to still be running (one of our services is still up, though we cannot inspect it).

It just appears that the proxying/management services are unable to connect. Same issue: TLS handshake failure.

amitshowry commented 6 years ago

@shrutir25 as you mentioned in the notes, it was caused by a bad etcd in the cluster, and the issue was fixed by one of your engineers. However, deployments/services keep automatically reverting to old configurations every day.

Since AKS is in preview, support is unfortunately slow, poor, and unhelpful.

angrox commented 6 years ago

We have the same issue in West Europe with one of our clusters. Is there any way to SSH into the machines when deploying an SSH-forwarding pod is not possible?

qmfrederik commented 6 years ago

@brendandburns Here's another case with TLS handshake failures: 118041117980513

tomasaschan commented 6 years ago

@brendandburns I'm also seeing this at the moment with AKS in EU West: any command that actually talks to the cluster seems to fail (even kubectl version). Is there anything I can do about it, or do I just have to wait for the tech staff at Azure to fix or work around it? Is there any info I can give you that would help you troubleshoot?

Update a couple of days later: most commands work now, but kubectl logs <pod> still doesn't... :/

hokusp commented 6 years ago

@brendandburns same problem here in EU West. It started happening after I upgraded to 1.9.6.

gourlaa commented 6 years ago

Same problem; I just opened a ticket.

aevitas commented 6 years ago

This issue keeps popping up intermittently for me, rendering AKS virtually unusable in a production scenario.

my3sons commented 6 years ago

Looks like our US East cluster is down with this today. Anyone else experiencing same issue there?