Open mfkuntz opened 5 years ago
Hostname in Start-EKSBootstrap.ps1 script should represent internal private DNS name of your EC2 instance. You can verify that by running Invoke-RestMethod 'http://169.254.169.254/latest/meta-data/local-hostname' . This should return private DNS name (This is what Get-EC2Metadata call returns in Register-KubernetesServices). You shouldn't hardcode the Hostname as above. Let us know if you need additional information.
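For anyone comparing the two values, here is a minimal PowerShell sketch of that check. Test-NodeNameMatch is a hypothetical helper, not something in the bootstrap script, and the node name is assumed to be copied by hand from `kubectl get nodes`.

```powershell
# Sketch only: compare the name reported by instance metadata with the name
# the node registered under in the cluster.
function Test-NodeNameMatch {
    param(
        [Parameter(Mandatory)] [string] $MetadataHostName,  # value from the metadata endpoint
        [Parameter(Mandatory)] [string] $NodeName           # name shown by `kubectl get nodes` (assumed input)
    )
    # Exact, case-sensitive comparison: a short name does not match its
    # fully qualified form.
    return ($MetadataHostName -ceq $NodeName)
}

# On the instance itself, the first argument would come from the call above:
# $metadataName = Invoke-RestMethod 'http://169.254.169.254/latest/meta-data/local-hostname'
# Test-NodeNameMatch -MetadataHostName $metadataName -NodeName '<node name from kubectl>'
```

If this returns False, the name handed to kubelet and kube-proxy is probably not the name the node is registered under.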
I agree, I am not sure where the difference in the host name comes from.
The host name returned from the metadata endpoint does not contain the ec2.internal
suffix, so I guess I will scan through the kubelet args and code to see where that comes from.
There is an interesting thing though: in the AWS console UI, the private DNS name does have the correct suffix, but the metadata endpoint does not. I'll look into why that is as well.
The VPC does have the DNS resolution and DNS hostnames flags set.
Edit: yeah, it looks like it uses the cloud controller manager to look up the node name from the API directly, not from the instance endpoint.
Could be related to https://github.com/kubernetes/kubernetes/issues/11543, need to dig more
Interesting. Can you run ipconfig /all on this instance and send it to me?
Could this be related? https://github.com/aws/containers-roadmap/issues/236 I had DNS resolution issues because of this.
"Could this be related? #236 I had DNS resolution issues because of this"
The above-mentioned DNS issue can occur when accessing Kubernetes services from inside a pod. In your case, the metadata service doesn't return the fully qualified name. These are two different issues.
Sorry for the big delay in responding, but I am back on the project now.
It seems that having kube-proxy look up the IP with a different method is a design choice: https://github.com/kubernetes/kubernetes/issues/71851
ipconfig /all
Windows IP Configuration
Host Name . . . . . . . . . . . . : QA-EKSW-0d955d1
Primary Dns Suffix . . . . . . . :
Node Type . . . . . . . . . . . . : Hybrid
IP Routing Enabled. . . . . . . . : No
WINS Proxy Enabled. . . . . . . . : No
DNS Suffix Search List. . . . . . : us-east-1.ec2-utilities.amazonaws.com
                                    us-west-2.ec2-utilities.amazonaws.com
                                    us-west-2.compute.internal
Ethernet adapter vEthernet (Ethernet 3):
Connection-specific DNS Suffix . :
Description . . . . . . . . . . . : Hyper-V Virtual Ethernet Adapter #2
Physical Address. . . . . . . . . : 12-12-16-14-BD-56
DHCP Enabled. . . . . . . . . . . : Yes
Autoconfiguration Enabled . . . . : Yes
Link-local IPv6 Address . . . . . : fe80::3049:d3a:fbad:3f5c%16(Preferred)
IPv4 Address. . . . . . . . . . . : 10.20.202.92(Preferred)
Subnet Mask . . . . . . . . . . . : 255.255.192.0
Lease Obtained. . . . . . . . . . : Wednesday, July 17, 2019 5:25:08 PM
Lease Expires . . . . . . . . . . : Wednesday, July 17, 2019 6:55:10 PM
Default Gateway . . . . . . . . . : 10.20.192.1
DHCP Server . . . . . . . . . . . : 10.20.192.1
DHCPv6 IAID . . . . . . . . . . . : 269619734
DHCPv6 Client DUID. . . . . . . . : 00-01-00-01-24-C1-11-63-0E-64-B2-6D-B6-CC
DNS Servers . . . . . . . . . . . : 10.20.0.2
NetBIOS over Tcpip. . . . . . . . : Enabled
Ethernet adapter vEthernet (nat):
Connection-specific DNS Suffix . :
Description . . . . . . . . . . . : Hyper-V Virtual Ethernet Adapter
Physical Address. . . . . . . . . : 00-15-5D-C6-8D-C6
DHCP Enabled. . . . . . . . . . . : No
Autoconfiguration Enabled . . . . : Yes
Link-local IPv6 Address . . . . . : fe80::18f:f87c:c681:6158%11(Preferred)
IPv4 Address. . . . . . . . . . . : 172.31.208.1(Preferred)
Subnet Mask . . . . . . . . . . . : 255.255.240.0
Default Gateway . . . . . . . . . :
DHCPv6 IAID . . . . . . . . . . . : 184554845
DHCPv6 Client DUID. . . . . . . . : 00-01-00-01-24-C1-11-63-0E-64-B2-6D-B6-CC
DNS Servers . . . . . . . . . . . : fec0:0:0:ffff::1%1
                                    fec0:0:0:ffff::2%1
                                    fec0:0:0:ffff::3%1
NetBIOS over Tcpip. . . . . . . . : Enabled
There seems to be an issue with kube-proxy on Windows containers receiving the wrong hostname override parameter from the bootstrap script. Looking at the kube-proxy logs, it fails to look up the IP (from the Kubernetes API) for the node and falls back to 127.0.0.1, causing issues.
From there, there are errors creating the proxy endpoints. The nodes do appear to work until you create a deployment with a service and more than one replica. With a single replica, everything works, but after adding a second (or more), all pods fail.
I can see what causes the error, and have a patch for my specific cluster configuration, but I am not sure why it is necessary for this cluster when others do not have the issue. One guess: this is an older cluster that has been upgraded from 1.10 all the way to the current 1.11.
Start-EKSBootstrap.ps1 has a $HostName variable from calling the metadata endpoint. The value of that, for an example node, is ip-10-20-66-170. That value is then passed as an arg to kubelet and kube-proxy. The issue comes from the node's actual name, ip-10-20-66-170.ec2.internal.
That value is then used by kube-proxy to look up the IP here. Using ip-10-20-66-170 returns nil, because that lookup is by name. Using ip-10-20-66-170.ec2.internal works as expected, returning the correct id.
As a quick patch, I added this script block to patch the Bootstrap script to use the "correct" hostname for kube-proxy.
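The original script block is not reproduced here. As a rough illustration of the idea only, and not the patch itself, the sketch below qualifies a short host name with a DNS suffix. Get-QualifiedHostName is a hypothetical helper, and the 'ec2.internal' suffix is an assumption taken from the node name above; it depends on the region and the VPC's DHCP options.

```powershell
# Hypothetical helper, not the patch from this issue: append a DNS suffix to a
# short host name unless the name already contains a dot.
function Get-QualifiedHostName {
    param(
        [Parameter(Mandatory)] [string] $HostName,   # e.g. the $HostName value from the metadata call
        [Parameter(Mandatory)] [string] $DnsSuffix   # assumed suffix, e.g. 'ec2.internal'
    )
    if ($HostName.Contains('.')) {
        # Already qualified; leave it untouched.
        return $HostName
    }
    return "$HostName.$DnsSuffix"
}

# Example with the node from this issue:
Get-QualifiedHostName -HostName 'ip-10-20-66-170' -DnsSuffix 'ec2.internal'
```

The qualified value is what would then be handed to kube-proxy as its hostname override, so that the lookup by name finds the node.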
I didn't see the repo for the AMI, so I wasn't sure if there was more documentation for the Bootstrap that would help explain some of this.
AMI - ami-032bdf5292844295a
EKS - 1.11 - eks.2 - Updated from 1.10 to 1.11
Kubelet - v1.11.5
Kube-Proxy - v1.11.5
aws-node - amazon-k8s-cni:v1.3.3
Example windows node: