Ref #220 #221
Can you provide a few more details about your VPC setup?
I am using CNI custom networking because we have a limited number of IP addresses in the 10.50.112.0/24 CIDR, as that range is mapped into our corporate IP space. The 10.50.112.x/26 CIDRs are used by the nodes and the 10.9.0.x/18 IPs are used by the pods.
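For context, custom networking with the VPC CNI is driven by per-AZ ENIConfig resources along the lines of the sketch below; the subnet and security group IDs here are placeholders, not my actual values:
# Illustrative only: placeholder IDs for the pod subnet and pod security group
cat <<EOF | kubectl apply -f -
apiVersion: crd.k8s.amazonaws.com/v1alpha1
kind: ENIConfig
metadata:
  name: eu-west-1a
spec:
  subnet: subnet-0123456789abcdef0    # pod subnet for this AZ (from the 10.9.0.0/18 space)
  securityGroups:
    - sg-0123456789abcdef0            # security group applied to the pod ENIs
EOF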
In the instance metadata, I am attaching a secondary network interface as follows:
#!/bin/bash
set -o xtrace
# Secondary ENI for the underlay, code lifted from
# https://stackoverflow.com/questions/19836854/aws-cloudformation-networkinterfaces-in-autoscaling-launchconfig-group
export AWS_DEFAULT_REGION=eu-west-1
# Get the instance ID
INSTANCE_ID=$(curl -sS http://169.254.169.254/latest/meta-data/instance-id)
# And the AZ
AZ=$(curl -sS http://169.254.169.254/latest/meta-data/placement/availability-zone)
# Find the matching underlay subnet for this AZ
SUBNET_ID=$(aws ec2 describe-subnets --subnet-ids {{ underlay_subnets | join(" ")}} --filters Name=availabilityZone,Values=$AZ --query 'Subnets[0].SubnetId' --output text)
# Create a new network interface
ENI_ID=$(aws ec2 create-network-interface --subnet-id $SUBNET_ID --description 'Secondary ENI' --groups ${UnderlaySecurityGroup} --query 'NetworkInterface.NetworkInterfaceId' --output text)
# and tag it...
aws ec2 create-tags --resources $ENI_ID --tags 'Key=Foo,Value=Bar'
# Disable source dest check
# aws ec2 modify-network-interface-attribute --network-interface-id $ENI_ID --no-source-dest-check --output text
# Attach the interface to the instance
ATTACHMENT_ID=$(aws ec2 attach-network-interface --network-interface-id $ENI_ID --instance-id $INSTANCE_ID --device-index 1 --output text)
# Set the interface to delete upon instance termination
aws ec2 modify-network-interface-attribute --network-interface-id $ENI_ID --attachment AttachmentId=$ATTACHMENT_ID,DeleteOnTermination=true --output text
I am using Ansible and Jinja2 templating to generate the CloudFormation template, which is why there are some {{ variables }} in the code.
Could it be due to hitting a throttle on the metadata service?
https://serverfault.com/questions/774552/aws-ec2-instance-metadata-service-fails-to-respond
I suspect the curl requests are coming back empty. I'll be able to prove it once I deploy a custom AMI that echoes out the responses...
In the event that a call to the metadata service fails (I recall now that I faced this problem before on an unrelated AWS project a few years back), what would be the ideal thing to do? Exit the script with exit 1, or retry the call up to x times with exponential backoff and exit 1 if it reaches x retries and still fails?
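If retrying turns out to be the right answer, something along these lines is what I have in mind (a rough sketch; the helper name and retry counts are arbitrary):
# Hypothetical helper: retry a metadata request with exponential backoff
metadata_get() {
  local path=$1 attempts=5 delay=1 value
  for i in $(seq 1 "$attempts"); do
    if value=$(curl -sS --fail "http://169.254.169.254/latest/meta-data/${path}"); then
      echo "$value"
      return 0
    fi
    echo "metadata request for ${path} failed (attempt ${i}/${attempts}), retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))
  done
  return 1
}
# Fail the whole script if the metadata never comes back
INSTANCE_ID=$(metadata_get instance-id) || exit 1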
I've put together a way to check for this across EC2 instances using k8s:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: clusterdns-test
  namespace: default
spec:
  replicas: 100
  selector:
    matchLabels:
      app: clusterdns-test
  template:
    metadata:
      labels:
        app: clusterdns-test
    spec:
      nodeSelector:
        kubernetes.io/role: test
      containers:
      - name: test
        command:
        - sleep
        - "3600"
        image: busybox
        readinessProbe:
          exec:
            command:
            - grep
            - "172.20.0.10"
            - /etc/resolv.conf
          failureThreshold: 1
          initialDelaySeconds: 10
          timeoutSeconds: 2
        resources:
          requests:
            cpu: "1500m" # ensures no more than one pod gets scheduled onto a t3.small instance; faster scheduling than using podAntiAffinity
      tolerations:
      - operator: Exists
The readinessProbe is used to identify pods on nodes with the incorrect clusterDNS value. So far, I have been able to observe one faulty instance in over 600 EC2 instances. My real-world observation rate was much higher.
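Pulling out the faulty nodes is then just a matter of listing the pods that never become ready (a sketch; it relies on the default table output):
# Pods stuck at 0/1 READY are the ones whose node has the wrong clusterDNS;
# -o wide adds the NODE column so the bad instances can be read off directly
kubectl get pods -l app=clusterdns-test -o wide | grep ' 0/1 '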
It's possible this would be quicker if I were to do some checking on the node using a script in the userdata rather than relying on k8s; will look into that tomorrow...
The first instance I spun up today had the wrong clusterDNS. Interestingly, none of the conditionals I put in my custom bootstrap.sh failed:
ZONE=$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone)
if [ -z "$ZONE" ]; then
echo "ZONE is empty"
exit 1
fi
...
MAC=$(curl -s http://169.254.169.254/latest/meta-data/network/interfaces/macs/ | head -n 1 | sed 's/\/$//')
if [ -z "$MAC" ]; then
echo "MAC is empty"
exit 1
fi
CIDRS=$(curl -s http://169.254.169.254/latest/meta-data/network/interfaces/macs/$MAC/vpc-ipv4-cidr-blocks)
if [ -z "$CIDRS" ]; then
echo "CIDRS is empty"
exit 1
fi
...
INTERNAL_IP=$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4)
if [ -z "$INTERNAL_IP" ]; then
echo "INTERNAL_IP is empty"
exit 1
fi
INSTANCE_TYPE=$(curl -s http://169.254.169.254/latest/meta-data/instance-type)
if [ -z "$INSTANCE_TYPE" ]; then
echo "INSTANCE_TYPE is empty"
exit 1
fi
This would suggest that the curl to get the CIDRS is not failing, or if it is failing, it is returning a non-empty value. Still more debugging required...
If I can't find a solution for this today, I'll have to run a patched AMI that forces the DNS_CLUSTER_IP to 172.20.0.10.
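(If the bootstrap.sh shipped in the AMI is new enough to support it, the value could also be pinned via its --dns-cluster-ip flag rather than patching the AMI; the cluster name below is a placeholder:)
# Assumes a bootstrap.sh version that supports --dns-cluster-ip
/etc/eks/bootstrap.sh my-cluster --dns-cluster-ip 172.20.0.10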
At last!
I found the best way to find instances with incorrect clusterDNS was to add the following to the instance userdata:
# Get the instance ID
INSTANCE_ID=$(curl -sS http://169.254.169.254/latest/meta-data/instance-id)
...
# Check the clusterDNS
if ! grep -q 172.20.0.10 /etc/kubernetes/kubelet/kubelet-config.json; then
  curl -sSX POST -H 'Content-type: application/json' --data "{\"text\":\"Bad clusterDNS on $INSTANCE_ID\"}" https://hooks.slack.com/services/############################
fi
This sends me a Slack notification for any instance where the clusterDNS is not as it should be. Error rate was approximately 1-2 per 50 instances.
The problem is caused by a 404 being returned by the curl to get the CIDRS. This results in the following output:
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>404 - Not Found</title>
</head>
<body>
<h1>404 - Not Found</h1>
</body>
</html>
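That also explains why none of the -z checks fired: the 404 error document is a non-empty response. A status-aware check would have caught it; a minimal sketch:
# --fail makes curl exit non-zero on HTTP errors such as 404
# instead of printing the error document to stdout
if ! CIDRS=$(curl -sS --fail http://169.254.169.254/latest/meta-data/network/interfaces/macs/$MAC/vpc-ipv4-cidr-blocks); then
  echo "failed to fetch vpc-ipv4-cidr-blocks for MAC $MAC"
  exit 1
fi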
The call to list the MAC addresses is clashing with the attachment of the secondary interface in the userdata (a requirement for having separate IP space for worker nodes and pods with custom CNI networking): instead of returning the MAC address for eth0 (or for eth0 and eth1), it sometimes returns only the MAC for eth1, which may not yet have been assigned an IP address. I confirmed this by adding echo "MAC: $MAC" and echo "CIDRS: $CIDRS" to bootstrap.sh to see what the corresponding curl requests return.
If the goal is to get the eth0 MAC address (which the head -n 1 would suggest), it would be more reliable to read it from cat /sys/class/net/eth0/address. It doesn't make much sense to go to the API for information that is statically available inside the instance. If ever the need arises to retrieve multiple MAC addresses, this can be done with cat /sys/class/net/*/address or cat /sys/class/net/eth*/address.
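Roughly what I have in mind for bootstrap.sh (a sketch, not the final patch):
# Read the eth0 MAC locally instead of listing MACs via the metadata service,
# so a half-attached secondary ENI cannot change which interface gets picked up
MAC=$(cat /sys/class/net/eth0/address)
CIDRS=$(curl -sS --fail http://169.254.169.254/latest/meta-data/network/interfaces/macs/$MAC/vpc-ipv4-cidr-blocks) || exit 1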
I will open a new PR...
A workaround is to run bootstrap.sh before adding the secondary interface, but that doesn't actually fix the issue.
Great find, thanks for investigating this issue. This might cause issues for the CNI as well...
This was resolved in #226
What happened: Potentially related to https://github.com/awslabs/amazon-eks-ami/issues/78 and https://github.com/awslabs/amazon-eks-ami/issues/197. /etc/kubernetes/kubelet/kubelet-config.json has the wrong value for clusterDNS, which is set by bootstrap.sh. This results in pods being unable to resolve DNS names.
What you expected to happen: I would expect to see the correct nameserver in /etc/kubernetes/kubelet/kubelet-config.json and in /etc/resolv.conf on all pods running in the cluster.
How to reproduce it (as minimally and precisely as possible): Create the EKS service. My VPC happens to have two subnets (10.50.112.0/26 and 10.9.0.0/18, not sure that is relevant). Create worker instances. Most have the correct nameserver value in /etc/kubernetes/kubelet/kubelet-config.json, but not all. For the worker nodes that have the wrong value, pods cannot resolve DNS queries.
Anything else we need to know?:
Environment:
uname -a: Linux 4.14.104-95.84.amzn2.x86_64
cat /tmp/release (on a node):
This issue still exists with the latest AMI code. An instance I created last week from an AMI built from the master branch (https://github.com/awslabs/amazon-eks-ami/commit/6090f200669ba1f76ce68f23e6496b3df9bc588a) has the wrong nameserver address in /etc/kubernetes/kubelet/kubelet-config.json.
The instance is deployed into a VPC with two subnet CIDRs: 10.50.112.0/26 and 10.9.0.0/18. It does not happen all the time. Other instances created around the same time and from the same AMI have the correct value. It is an intermittent bug that I have encountered on numerous occasions but have yet to find the exact cause.