Closed: davidjsanders closed this issue 5 years ago
Update 11/03: I'm now able to create clusters successfully in westus2; however, I'm still getting TLS handshake errors:
az aks browse --resource-group *OBFUSCATED* --name *OBFUSCATED*
Merged "*OBFUSCATED*" as current context in /tmp/tmpB988cA
Proxy running on http://127.0.0.1:8001/
Press CTRL+C to close the tunnel...
error: error upgrading connection: error dialing backend: dial tcp 10.240.0.4:10250: getsockopt: connection refused
Are we still in the realm of capacity issues or is there another underlying issue here? This should work, right?
David
Sometimes I should look before I write :)
I see the problem. The proxy is trying to connect to 10.240.0.4 which is the private IP of one of the agents and won't (and shouldn't) be reachable from the Internet. I'm guessing this is the underlying issue here.
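As a quick sanity check, the dial target can be pulled out of the proxy error and classified as private or public. This is a hypothetical snippet, assuming the error text has the shape shown above:

```shell
# Extract the backend address from the proxy error and check whether it
# falls in a private (RFC 1918) range; 10.240.0.x is the AKS agent
# subnet default and is not reachable from the Internet.
err='error: error upgrading connection: error dialing backend: dial tcp 10.240.0.4:10250: getsockopt: connection refused'
addr=$(printf '%s\n' "$err" | sed -n 's/.*dial tcp \([0-9.]*\):[0-9]*.*/\1/p')
case "$addr" in
  10.*|192.168.*|172.1[6-9].*|172.2[0-9].*|172.3[01].*)
    echo "backend $addr is a private node IP (unreachable from the Internet)" ;;
  *)
    echo "backend $addr looks public" ;;
esac
```

For the error above this prints that 10.240.0.4 is a private node IP, confirming the proxy is being pointed at an agent's internal address.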
+1. Originally this worked fine; I noticed the issue today when I deleted the cluster and tried to recreate it.
I get this regardless of using West US 2 or UK West:

~ amazaheri$ az aks browse -n mtcirvk8s -g mtcirvacs-rg
Merged "mtcirvk8s" as current context in /var/folders/sf/p87ql6z9271_1l7cp6hgt2d40000gp/T/tmpHZ_Er0
Proxy running on http://127.0.0.1:8001/
Press CTRL+C to close the tunnel...
error: error upgrading connection: error dialing backend: dial tcp 10.240.0.4:10250: getsockopt: connection refused
Looks like we are good now, thanks for all the work! QQ: I cannot connect to my cluster with the Cabin app using a token. The app shows the cluster as running, but I can't see any of the nodes, namespaces, etc. It looks like the auth fails at some point. Thoughts? https://github.com/bitnami/cabin/issues/75
I'm having the same problem in West US 2 at the moment:
$ kubectl get pods --all-namespaces
Unable to connect to the server: net/http: TLS handshake timeout
same issue here on West US 2
The cluster aks is in West US 2. I have the same issue.
kubectl get nodes
Unable to connect to the server: net/http: TLS handshake timeout
az aks browse --resource-group xxxx-rg --name xxxx
Merged "XXXX" as current context in /tmp/tmpx6o89zj7
Unable to connect to the server: net/http: TLS handshake timeout
Command '['kubectl', 'get', 'pods', '--namespace', 'kube-system', '--output', 'name', '--selector', 'k8s-app=kubernetes-dashboard']' returned non-zero exit status 1.
11/9: I'm still getting issues and have reverted back to an unmanaged cluster using ACS with Kubernetes as the orchestrator. Looking forward to when AKS becomes a little more stable.
I am having these same issues!
@dsandersAzure I did the same; I created it using ACS!
AKS is still in preview. For now it seems West US 2 is not available, but ukwest is OK; we can create AKS clusters in ukwest now.
C:\Users\jason>az group create --name akss --location ukwest
{
"id": "/subscriptions/xxxxxxx-222b-49c3-xxxx-xxxxx1e29a7b15/resourceGroups/akss",
"location": "ukwest",
"managedBy": null,
"name": "akss",
"properties": {
"provisioningState": "Succeeded"
},
"tags": null
}
C:\Users\jason>az aks create --resource-group akss --name myK8sCluster --agent-count 1 --generate-ssh-keys
{
"id": "/subscriptions/xxxxxxxx-222b-49c3-xxxx-0361e29axxxx/resourcegroups/akss/providers/Microsoft.ContainerService/managedClusters/myK8sCluster",
"location": "ukwest",
"name": "myK8sCluster",
"properties": {
"accessProfiles": {
"clusterAdmin": {
"kubeConfig": "YXBpVmVyc2lvbjogdjEKY2x1c3RlcnM6Ci0gY2x1c3RlcjoKICAgIGNlcnRpZmljYXRlLWF1dGhvcml0eS1kYXRhOiBMUzB0TFMxQ1JVZEpUaUJEUlZKVVNVWkpRMEZVUlMwdExTMHRDazFKU1VWNGVrTkRRWEVyWjBGM1NVSkJaMGxSWlhVMGVXRnBOekp3TlhadmNsUjRha2hMTldReGVrRk9RbWRyY1docmFVYzVkekJDUVZGelJrRkVRVTRLVFZGemQwTlJXVVJXVVZGRVJYZEthbGxVUVdWR2R6QjRUbnBGZUUxVVFYZE5WRlV4VFdwS1lVWjNNSGhQVkVWNFRWUkJkMDFVVlRGTmFrcGhUVUV3ZUFwRGVrRktRbWRPVmtKQlRWUkJiVTVvVFVsSlEwbHFRVTVDWjJ0eGFHdHBSemwzTUVKQlVVVkdRVUZQUTBGbk9FRk5TVWxEUTJkTFEwRm5SVUZ6TlRCRENsaGFNSEJCZWtJdlYxWnRjR1ZZTkhwaFRtZzVXRFJIVjIxWWFHTnpaelIyZVRWVGQxaDNVVTB2U1dkMWRGbGFVRzFUTjFCelVUUXJZazluWkZCWGVXSUtaREp6YWxSclJsVXZPRzVMYzJzM0sxcHhPRmxWTURFMFpVWkJXamx2UlRWNUsyRmhLMlZ
I believe the capacity issues in ukwest are ongoing; hoping AKS will expand to other locations in Europe soon. I had a 1.7.7 cluster in ukwest that broke a couple of days ago. I attempted to recreate it today, but it is still in a bad state.
$ kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
heapster-b5ff6c4dd-dkkll 2/2 Running 0 46m
kube-dns-v20-6c8f7f988b-cb4cg 3/3 Running 0 46m
kube-dns-v20-6c8f7f988b-ztn5r 3/3 Running 0 46m
kube-proxy-thz9p 1/1 Running 0 46m
kube-svc-redirect-qhwz6 0/1 CrashLoopBackOff 13 46m
kubernetes-dashboard-7f7d9489fc-d9x7d 0/1 CrashLoopBackOff 12 46m
tunnelfront-xzjq8 0/1 CrashLoopBackOff 13 46m
$ kubectl logs kube-svc-redirect-qhwz6 -n kube-system
Error from server: Get https://aks-agentpool1-28161470-0:10250/containerLogs/kube-system/kube-svc-redirect-qhwz6/redirector: dial tcp 10.240.0.4:10250: getsockopt: connection refused
So, provisioning in westuk gives me a cluster with crashing pods; provisioning in westus2 doesn't work at all:
Azure Container Service is unable to provision an AKS cluster in westus2, due to an operational threshold. Please try again later or use an alternate location. For more details please refer to: https://github.com/Azure/AKS/blob/master/preview_regions.md.
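One workaround implied by that message is to walk a list of preview regions until a create succeeds. A hedged sketch: the region list and resource names are illustrative, and the AZ_CREATE override exists only so the loop can be dry-run without a subscription:

```shell
# Hypothetical helper: try each preview region in turn until
# `az aks create` succeeds. By default it calls the real az CLI.
try_regions() {
  local create="${AZ_CREATE:-az aks create --resource-group myRG --name myK8sCluster --generate-ssh-keys --location}"
  local region
  for region in "$@"; do
    if $create "$region" >/dev/null 2>&1; then
      echo "created in $region"
      return 0
    fi
  done
  echo "all candidate regions at capacity" >&2
  return 1
}
```

Usage would be e.g. `try_regions westus2 ukwest westeurope`; the function stops at the first region whose create succeeds.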
Hi,
Same here today: I created an AKS 1.8.1 cluster in westeurope and it was fine, but an hour later I upgraded to 1.8.2 and since then:
Unable to connect to the server: net/http: TLS handshake timeout
With kubectl 1.8.0 and 1.8.4, same error.
After that I can't create a new AKS cluster in the westeurope location; the CLI returns this:
Command:
az aks create -n saceaks -g saceaks --location westeurope --kubernetes-version 1.8.1 --node-vm-size=Standard_DS1_V2 --node-count=2
CLI error:
Exception in thread AzureOperationPoller(b39cfa6a-a15e-49e4-9684-9cff4a0b579b):
Traceback (most recent call last):
File "/opt/az/lib/python3.6/site-packages/msrestazure/azure_operation.py", line 377, in _start
self._poll(update_cmd)
File "/opt/az/lib/python3.6/site-packages/msrestazure/azure_operation.py", line 464, in _poll
raise OperationFailed("Operation failed or cancelled")
msrestazure.azure_operation.OperationFailed: Operation failed or cancelled
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/az/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/opt/az/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/opt/az/lib/python3.6/site-packages/msrestazure/azure_operation.py", line 388, in _start
self._exception = CloudError(self._response)
File "/opt/az/lib/python3.6/site-packages/msrestazure/azure_exceptions.py", line 148, in __init__
self._build_error_data(response)
File "/opt/az/lib/python3.6/site-packages/msrestazure/azure_exceptions.py", line 164, in _build_error_data
self.error = self.deserializer('CloudErrorRoot', response).error
File "/opt/az/lib/python3.6/site-packages/msrest/serialization.py", line 992, in __call__
value = self.deserialize_data(raw_value, attr_desc['type'])
File "/opt/az/lib/python3.6/site-packages/msrest/serialization.py", line 1143, in deserialize_data
return self(obj_type, data)
File "/opt/az/lib/python3.6/site-packages/msrest/serialization.py", line 998, in __call__
return self._instantiate_model(response, d_attrs)
File "/opt/az/lib/python3.6/site-packages/msrest/serialization.py", line 1090, in _instantiate_model
response_obj = response(**kwargs)
File "/opt/az/lib/python3.6/site-packages/msrestazure/azure_exceptions.py", line 59, in __init__
self.message = kwargs.get('message')
File "/opt/az/lib/python3.6/site-packages/msrestazure/azure_exceptions.py", line 105, in message
value = eval(value)
File "<string>", line 1, in <module>
NameError: name 'resources' is not defined
{
"id": null,
"location": null,
"name": "e0ecdbcf-dffd-6b43-81fa-85f6517448a6",
"properties": null,
"tags": null,
"type": null
}
Having the same issue. I have two clusters, one in East US and the other in Central US.
The Central US one works fine, but when I switch context to East US, it gives the error
Unable to connect to the server: net/http: TLS handshake timeout
I'm having the same issue after downscaling my cluster in East US!
Hi everyone,
Having the same issue today in westeurope. When I try to create a new cluster in this location, it gives an error:
Deployment failed. Correlation ID: <id>. Azure Container Service is unable to provision an AKS cluster in westeurope, due to an operational threshold. Please try again later or use an alternate location. For more details please refer to: https://github.com/Azure/AKS/blob/master/preview_regions.md.
Still an issue. Any resolution? This is the third running cluster I have lost the ability to communicate with, in East US. Doing an upgrade or scaling up the nodes does not work properly - a complete deal breaker when considering AKS. Either of those commands results in "Unable to connect to the server: net/http: TLS handshake timeout". I've tried numerous commands, restarting nodes, etc. Nothing seems to recover cluster access.
Command to create:
az aks create `
--name AKS-Cluster-VoterDemo `
--resource-group RG-EastUS-AKS-VoterDemo `
--node-count 1 `
--generate-ssh-keys `
--kubernetes-version 1.8.2
Perfectly healthy.
Command to scale up:
az aks scale `
--name AKS-Cluster-VoterDemo `
--resource-group RG-EastUS-AKS-VoterDemo `
--node-count 3
Result: Unable to connect to the server: net/http: TLS handshake timeout
I encountered the same TLS handshake timeout issue after manually scaling the node count from 1 to 2! My cluster is in Central US. What's wrong?
Thanks for your patience through our preview.
We've had a few bugs in scale and upgrade paths that prevented the api-server from passing its health check after upgrade and/or scale. A number of bug fixes in this area went out over the last few weeks that have made upgrades more reliable.
Last week, for clusters in East US, we had an operational issue that impacted a number of older customer clusters between 12/11 13:00PST and 12/12 16:01PST.
Health and liveness of the api-server is now much better. If you haven't upgraded recently, I'd recommend issuing az aks upgrade, even to the same kubernetes-version, as that will push the latest configuration to clusters. This rollout step is currently being automated and should be transparent in the future.
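That suggestion can be scripted; the following is a minimal sketch, assuming current `az` flags (`--query`, `--output tsv`, `--yes`). The resource-group and cluster names are placeholders, and the AZ variable exists only so the flow can be exercised without a subscription:

```shell
# Re-issue an upgrade to the cluster's current version to push the
# latest AKS configuration. AZ defaults to the real `az` CLI but can
# be overridden for a dry run.
AZ="${AZ:-az}"
same_version_upgrade() {
  local rg="$1" name="$2" ver
  # Read the version the cluster is already running...
  ver=$($AZ aks show --resource-group "$rg" --name "$name" \
          --query kubernetesVersion --output tsv) || return 1
  # ...and feed it straight back to `az aks upgrade`.
  $AZ aks upgrade --resource-group "$rg" --name "$name" \
      --kubernetes-version "$ver" --yes
}
```

Usage: `same_version_upgrade myRG myCluster`.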
@slack Thank you, it works ;)
@slack Confirmed: upgrading the cluster to 1.8.2 got kubectl connecting again.
@slack Having the same problem still after upgrading to 1.8.2 in westeurope. Is there a problem in that region?
After downgrading to 2.0.23 I was able to create the cluster, but after downloading the credentials I also have the same problem in westeurope...
kubectl get nodes
Unable to connect to the server: net/http: TLS handshake timeout
Doing an az aks upgrade to 1.8.2 failed for me too, incidentally.
Running into the same issue with a cluster in West Europe; the upgrade to 1.8.2 fails with:
Deployment failed. Correlation ID: 858d3cf0-0d4e-417d-a2ee-22f627892e51. Operation failed with status: 200. Details: Resource state Failed
@jakobli I got the error message you describe when I hit my CPU quota. Are you sure you have extra D2 CPUs available? If I'm not wrong, AKS commissions new VMs before taking down the old ones.
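If the quota theory is right, a rough headroom check looks like the sketch below. The numbers are purely illustrative; in practice `current` and `limit` would come from `az vm list-usage --location <region>`:

```shell
# AKS brings up replacement nodes before removing old ones, so an
# upgrade or scale needs spare vCPU quota for at least one extra node.
current=18         # vCPUs currently in use in the region (illustrative)
limit=20           # regional vCPU quota (illustrative)
vcpus_per_node=2   # e.g. a Standard_D2 node

if [ $((limit - current)) -lt "$vcpus_per_node" ]; then
  echo "insufficient quota headroom for a rolling upgrade"
else
  echo "quota headroom ok"
fi
```

With 2 vCPUs of headroom against a 2-vCPU node size, the check passes; drop the limit by one and it would flag the upgrade as likely to fail on quota.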
@aleksen Thanks for the tip, but I checked we have loads of quota left for D2.
So I tried deploying directly with version 1.8.2. The deployment went through without any issues, but kubectl get nodes still gets Unable to connect to the server: net/http: TLS handshake timeout
The bug has been closed on GitHub but the issue is still there. None of the proposed fixes work in westeurope. Can someone reopen this issue? It has been pending for a month and there's no resolution 😞
I'm seeing the same issue too after deploying a new cluster (v1.7.7) in westeurope:
$ kubectl get nodes
Unable to connect to the server: net/http: TLS handshake timeout
Hi, same issue here. I just created a cluster using AKS in the westeurope region and I am unable to connect to it.

kubectl get no
Unable to connect to the server: net/http: TLS handshake timeout
Is someone actively looking into this?
Interesting. I had the same problem with a cluster created yesterday. An hour ago I deleted the old one and created a new one (v1.8.1, westeurope) using the same service principal, and it works.
Problem definitely still exists. Hitting it with a v1.8.6 cluster in eastus. Have seen it across numerous versions, and I can no longer justify simply "recreating a cluster" as a workaround.
$ kubectl get nodes
Unable to connect to the server: net/http: TLS handshake timeout
Can we please get some visibility into the actual status of properly fixing this? Very hard to justify using AKS for production when we don't have any idea when it is going to fail. Worse, there are sufficient functional differences between ACS and AKS to make swapping back and forth a non-starter as well...
I deployed AKS in Azure region West Europe and have the same problem:

.\kubectl get nodes
Unable to connect to the server: net/http: TLS handshake timeout
Having an issue suddenly when trying to use the CLI.
kubectl get pods
Unable to connect to the server: net/http: TLS handshake timeout
The status page (https://azure.microsoft.com/en-us/status/) seems to be OK for the westeurope location.
Yes, the same behavior. I deleted and recreated the cluster and it works fine now.
@rafnijs Recreate the cluster? IMHO that solution seems a bit radical to me.
az login sometimes solves the problem; it seems to be a random issue.
@tiborb Completely agree. The only "solution" that has been touted is delete and recreate. Which would be fine if there was an easy way to transfer everything, but...
(1) The AKS master API being non-responsive means I can't even dump out the current cluster config, so there's no easy "migration" and surely no way to know when it will randomly fail.
(2) AKS uses a managed resource group, which ALSO GETS DELETED if I delete the existing cluster. That means any Managed Disks that were oh so pleasantly created for me get lost on cluster delete unless I manually migrate EACH ONE beforehand.
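For anyone trying to rescue disks first: to the best of my knowledge the auto-created node resource group follows the naming convention MC_&lt;resourceGroup&gt;_&lt;clusterName&gt;_&lt;region&gt;, so it can be computed and its disks inventoried (e.g. with `az disk list --resource-group "$node_rg"`) before deleting the cluster. The names below are the examples used earlier in the thread:

```shell
# Build the managed ("MC_") resource group name so its Managed Disks
# can be listed and migrated before the cluster is deleted.
rg="RG-EastUS-AKS-VoterDemo"
cluster="AKS-Cluster-VoterDemo"
region="eastus"
node_rg="MC_${rg}_${cluster}_${region}"
echo "$node_rg"
```

This is only a naming-convention sketch; verify against what `az group list` actually shows for your subscription.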
It's been about 5 days for me now in this state, with access returning intermittently. For those same 5 days I have had a support ticket open with Azure and have gotten basically nothing back except "I will check with the dev team". It's been nearly 48 hours since their last response, and I am thinking of just ditching the Azure-based implementation, as it clearly isn't ready for real-world use.
I spoke with MS Azure Italia on the phone 2 days ago and they pointed out that AKS is still in preview. When a service is flagged like that, it is still not production ready and the support they offer isn't active. For the moment the best way to spin up a Kubernetes cluster on Azure is ACS with the orchestrator set to kubernetes. We have used it in production for a year and it has never had any problems at all.
My orchestrator is Kubernetes version v1.8.1. I suppose it should be stable.
It is not a Kubernetes problem, even if it seems like it. It is AKS that is not handling cluster creation and networking correctly. As I said, just use ACS with Kubernetes instead of AKS and it will work like a charm.
You can find ACS CLI commands here: https://docs.microsoft.com/en-us/cli/azure/acs?view=azure-cli-latest
az acs create -g MyResourceGroup -n MyContainerService --orchestrator-type kubernetes --generate-ssh-keys
I have had AKS running in West Europe for about 4 months now, on 1.8.1. Somehow this morning I lost all connectivity to the running containers and to the dashboard through az aks browse. After doing an az aks upgrade to 1.8.7, everything is running normally again. It's a shame I don't know what caused this issue. Hopefully this stuff doesn't happen once it goes GA.
Hello guys, I have the same problem. It occurs on Windows 10 Pro, Docker CE Edge 18.02.0-ce-win52 (15372), when adding new services and pods to the cluster. After increasing the memory for the Docker process, the problem was solved temporarily. Would it be possible to add some checks (available memory, CPU, etc.) when adding new pods, deployments, or services to the cluster?
Also happening on my cluster in EUW
Same for me - does not work anymore in EUW
Same for me, suddenly stopped connecting with the TLS error, west europe 1.8.7
Same here; I cannot connect to my AKS cluster anymore. Tried using PowerShell with the Azure CLI and Bash (Ubuntu) with the Azure CLI.
Managed to resolve the issue by running the below:
az aks upgrade --resource-group removed --name removed --kubernetes-version 1.8.1
It is worth noting that I upgraded to the same version my cluster was already on: 1.8.1 -> 1.8.1.
xxx:~$ kubectl config current-context
companyname
xxx:~$ kubectl proxy
Starting to serve on 127.0.0.1:8001
I0308 18:02:14.211741 18892 logs.go:41] http: proxy error: net/http: TLS handshake timeout
xxx:~$ kubectl get nodes
Unable to connect to the server: net/http: TLS handshake timeout
Hi, when I create an AKS cluster, I'm receiving a timeout on the TLS handshake. The cluster creates okay with the following commands:
The response from the create command is a JSON object:

{
  "id": "/subscriptions/OBFUSCATED/resourcegroups/dsK8S/providers/Microsoft.ContainerService/managedClusters/dsK8SCluster",
  "location": "westus2",
  "name": "dsK8SCluster",
  "properties": {
    "accessProfiles": {
      "clusterAdmin": { "kubeConfig": "OBFUSCATED" },
      "clusterUser": { "kubeConfig": "OBFUSCATED" }
    },
    "agentPoolProfiles": [
      {
        "count": 2,
        "dnsPrefix": null,
        "fqdn": null,
        "name": "agentpool1",
        "osDiskSizeGb": null,
        "osType": "Linux",
        "ports": null,
        "storageProfile": "ManagedDisks",
        "vmSize": "Standard_A2",
        "vnetSubnetId": null
      }
    ],
    "dnsPrefix": "dasanderk8",
    "fqdn": "dasanderk8-d55f0987.hcp.westus2.azmk8s.io",
    "kubernetesVersion": "1.8.1",
    "linuxProfile": {
      "adminUsername": "azureuser",
      "ssh": { "publicKeys": [ { "keyData": "OBFUSCATED" } ] }
    },
    "provisioningState": "Succeeded",
    "servicePrincipalProfile": {
      "clientId": "OBFUSCATED",
      "keyVaultSecretRef": null,
      "secret": null
    }
  },
  "resourceGroup": "dsK8S",
  "tags": null,
  "type": "Microsoft.ContainerService/ManagedClusters"
}
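One useful field in that response is properties.fqdn, the api-server endpoint kubectl actually talks to; pulling it out lets you probe the endpoint directly (e.g. `curl -k https://$fqdn/healthz`) when the handshake times out. In the sketch below, a trimmed copy of the JSON stands in for the real response:

```shell
# Extract the api-server FQDN from (a trimmed copy of) the
# `az aks create` response using a simple sed capture.
json='{"properties": {"fqdn": "dasanderk8-d55f0987.hcp.westus2.azmk8s.io"}}'
fqdn=$(printf '%s\n' "$json" | sed -n 's/.*"fqdn": "\([^"]*\)".*/\1/p')
echo "$fqdn"
```

In practice `az aks show -g dsK8S -n dsK8SCluster --query fqdn -o tsv` yields the same value without any JSON scraping.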
I've now torn down this cluster but this has happened three times today.
Any help?
David