aws-samples / eks-automated-ipmgmt-multus-pods


Init container not working when using node groups in 2 different AZs #2

Closed Lautreck closed 2 years ago

Lautreck commented 2 years ago

Hello, I've managed to make pods with Multus work with the init container and do many things with them in ONE AZ. Now I'm testing pods with Multus across TWO different AZs.

After many tries I discovered that, when using the CloudFormation template given by the blog post to create nodegroups, only the first nodegroup created works with the init container; in the second nodegroup, created in another AZ, the init container fails to connect to the endpoint.

In my case I work in eu-west-1, and the VPC is 10.0.0.0/16 as in the blog post https://aws.amazon.com/blogs/containers/amazon-eks-now-supports-multus-cni/

When I create my first nodegroup with the CloudFormation template, either in AZ eu-west-1a or AZ eu-west-1b, it works. The AZ has no impact and there is no bug.

After creating a second nodegroup in a different AZ, the init container stops working. It's not my Multus CRDs at fault, because I can create pods in any AZ as long as there is only one nodegroup.


To reproduce the bug: create cluster => create nodegroup 1 => create pods => create nodegroup 2 in another AZ => create pods in AZ 2
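For reference, a quick check (assuming the standard Kubernetes topology labels that EKS workers normally carry) to confirm which AZ each nodegroup's nodes ended up in:

```sh
# Show each worker node with its availability zone so you can confirm
# which AZ the node hosting a failing pod belongs to
kubectl get nodes -L topology.kubernetes.io/zone -o wide
```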


Pod Deployments

I use a nodeSelector to make sure my pods are created on the nodes with the right network.

```
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: alpine-deployment-1
  labels:
    app: alpine
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
  selector:
    matchLabels:
      app: alpine
  template:
    metadata:
      labels:
        app: alpine
      annotations:
        k8s.v1.cni.cncf.io/networks: ipvlan-multus-1, ipvlan-multus-2
    spec:
      initContainers:
```

```
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: alpine-deployment-2
  labels:
    app: alpine
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
  selector:
    matchLabels:
      app: alpine
  template:
    metadata:
      labels:
        app: alpine
      annotations:
        k8s.v1.cni.cncf.io/networks: ipvlan-multus-2
    spec:
      initContainers:
```

```
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: alpine-deployment-3
  labels:
    app: alpine
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
  selector:
    matchLabels:
      app: alpine
  template:
    metadata:
      labels:
        app: alpine
      annotations:
        k8s.v1.cni.cncf.io/networks: ipvlan-multus-3
    spec:
      initContainers:
```

```
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: alpine-deployment-4
  labels:
    app: alpine
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
  selector:
    matchLabels:
      app: alpine
  template:
    metadata:
      labels:
        app: alpine
      annotations:
        k8s.v1.cni.cncf.io/networks: ipvlan-multus-4
    spec:
      initContainers:
```
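The manifests above are cut off at initContainers, so the nodeSelector I mentioned isn't visible. A minimal sketch of where it sits in the pod spec, with a hypothetical label key/value, would be:

```yaml
    # not the exact manifest; the initContainers section is truncated above
    spec:
      initContainers:
        # ... init container from the blog post goes here ...
      containers:
        # ... alpine container ...
      nodeSelector:
        # hypothetical label; in practice this matches whatever label the
        # nodegroup template applies to the workers wired to this AZ's subnets
        multus-az: eu-west-1a
```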


Multus CRD

```
cat <<EOF | kubectl apply -f -
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: ipvlan-multus-1
spec:
  config: '{
      "cniVersion": "0.3.0",
      "type": "ipvlan",
      "master": "eth1",
      "mode": "l2",
      "ipam": {
        "type": "whereabouts",
        "datastore": "kubernetes",
        "kubernetes": { "kubeconfig": "/etc/cni/net.d/whereabouts.d/whereabouts.kubeconfig" },
        "range": "10.0.4.70-10.0.4.80/24",
        "log_file": "/tmp/whereabouts.log",
        "log_level": "debug",
        "gateway": "10.0.4.1"
      }
    }'
EOF
```

```
cat <<EOF | kubectl apply -f -
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: ipvlan-multus-2
spec:
  config: '{
      "cniVersion": "0.3.0",
      "type": "ipvlan",
      "master": "eth2",
      "mode": "l2",
      "ipam": {
        "type": "whereabouts",
        "datastore": "kubernetes",
        "kubernetes": { "kubeconfig": "/etc/cni/net.d/whereabouts.d/whereabouts.kubeconfig" },
        "range": "10.0.6.70-10.0.6.80/24",
        "log_file": "/tmp/whereabouts.log",
        "log_level": "debug",
        "gateway": "10.0.6.1"
      }
    }'
EOF
```

```
cat <<EOF | kubectl apply -f -
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: ipvlan-multus-3
spec:
  config: '{
      "cniVersion": "0.3.0",
      "type": "ipvlan",
      "master": "eth1",
      "mode": "l2",
      "ipam": {
        "type": "whereabouts",
        "datastore": "kubernetes",
        "kubernetes": { "kubeconfig": "/etc/cni/net.d/whereabouts.d/whereabouts.kubeconfig" },
        "range": "10.0.5.70-10.0.5.80/24",
        "log_file": "/tmp/whereabouts.log",
        "log_level": "debug",
        "gateway": "10.0.5.1"
      }
    }'
EOF
```

```
cat <<EOF | kubectl apply -f -
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: ipvlan-multus-4
spec:
  config: '{
      "cniVersion": "0.3.0",
      "type": "ipvlan",
      "master": "eth2",
      "mode": "l2",
      "ipam": {
        "type": "whereabouts",
        "datastore": "kubernetes",
        "kubernetes": { "kubeconfig": "/etc/cni/net.d/whereabouts.d/whereabouts.kubeconfig" },
        "range": "10.0.7.70-10.0.7.80/24",
        "log_file": "/tmp/whereabouts.log",
        "log_level": "debug",
        "gateway": "10.0.7.1"
      }
    }'
EOF
```
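For completeness, a quick sanity check that the attachments exist and that whereabouts is handing out addresses from the expected ranges (net-attach-def is the short name registered by the standard NetworkAttachmentDefinition CRD; the IPPool listing assumes a whereabouts version that stores its reservations as custom resources):

```sh
# Confirm the four attachments exist and inspect one of them
kubectl get net-attach-def
kubectl describe net-attach-def ipvlan-multus-3

# With the Kubernetes datastore, whereabouts records reservations as IPPool
# custom resources; listing them shows which ranges are actually in use
kubectl get ippools.whereabouts.cni.cncf.io -A
```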


Init container output from the first pod (attachment): 395588e3-a530-43cb-97e5-b80986698f8b

Init container output from a pod in the other AZ (attachment): e4d60afb-ad72-44b7-823b-efc630bb9db4

raghs-aws commented 2 years ago

@Lautreck I will check and try to reproduce. From this log, it seems the VPC CNI is not able to reach the EC2 endpoint. Could you please check:

  1. whether you have VPC endpoints in that particular AZ and DNS is enabled for that endpoint;
  2. if you don't have the VPC endpoints, whether you are able to dig the endpoint URL shown above from the worker node in that AZ (see the example commands after this list);
  3. whether the Multus attachment for that pod (10.0.5.0/24) belongs to the same AZ.
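(A sketch of how checks 1 and 2 could be run, assuming the AWS CLI is available and $VPC_ID holds the cluster's VPC ID:)

```sh
# Check 1: list interface endpoints in the cluster VPC and whether private DNS is enabled
aws ec2 describe-vpc-endpoints \
  --filters Name=vpc-id,Values=$VPC_ID \
  --query 'VpcEndpoints[].{Service:ServiceName,PrivateDNS:PrivateDnsEnabled,Subnets:SubnetIds}'

# Check 2: from the worker node in the failing AZ, resolve the regional EC2 endpoint
dig ec2.eu-west-1.amazonaws.com
```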
raghs-aws commented 2 years ago

@Lautreck any feedback? Let me know if you still have the issue.

Lautreck commented 2 years ago

@raghs-aws hello, the issue is still there; I haven't had the time to do what you proposed yet.

Just for point 3, the subnet-to-AZ mapping is:

AZ eu-west-1a: 10.0.0.0, 10.0.2.0, 10.0.4.0, 10.0.6.0
AZ eu-west-1b: 10.0.1.0, 10.0.3.0, 10.0.5.0, 10.0.7.0

So my Multus attachment on 10.0.5.0 is in AZ 1b, and these are 2 different pods.
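(For reference, this mapping can be double-checked from the AWS CLI; $VPC_ID below is a placeholder for the cluster's VPC ID.)

```sh
# List each subnet's CIDR and the AZ it lives in
aws ec2 describe-subnets \
  --filters Name=vpc-id,Values=$VPC_ID \
  --query 'Subnets[].{CIDR:CidrBlock,AZ:AvailabilityZone}' \
  --output table
```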

Lautreck commented 2 years ago

hello @raghs-aws

  1. I don't have a VPC endpoint in any AZ. The template given by https://aws.amazon.com/blogs/containers/amazon-eks-now-supports-multus-cni/ doesn't create any endpoint or DNS entry.

2.1. Here is the result of the dig from a pod that is in eu-west-1a:

```
; <<>> DiG 9.11.4-P2-RedHat-9.11.4-26.P2.amzn2.5.2 <<>> ec2.eu-west-1.amazonaws.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 46227
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;ec2.eu-west-1.amazonaws.com.   IN   A

;; ANSWER SECTION:
ec2.eu-west-1.amazonaws.com.   11   IN   CNAME   eu-west-1.ec2.amazonaws.com.
eu-west-1.ec2.amazonaws.com.   11   IN   A       67.220.226.37

;; Query time: 1 msec
;; SERVER: 172.20.0.10#53(172.20.0.10)
;; WHEN: Fri Nov 04 12:46:27 UTC 2022
;; MSG SIZE  rcvd: 167
```

2.2. Here is the result of the dig from a node that is in eu-west-1b:

```
; <<>> DiG 9.11.4-P2-RedHat-9.11.4-26.P2.amzn2.5.2 <<>> ec2.eu-west-1.amazonaws.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 26574
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;ec2.eu-west-1.amazonaws.com.   IN   A

;; ANSWER SECTION:
ec2.eu-west-1.amazonaws.com.   0   IN   CNAME   eu-west-1.ec2.amazonaws.com.
eu-west-1.ec2.amazonaws.com.   0   IN   A       52.95.121.23

;; Query time: 0 msec
;; SERVER: 10.0.0.2#53(10.0.0.2)
;; WHEN: Fri Nov 04 13:21:33 UTC 2022
;; MSG SIZE  rcvd: 100
```

raghs-aws commented 2 years ago

Thanks, I can see DNS is resolving in both AZs. So basically when you create 1 nodegroup at a time, in 1a or 1b, it works fine (the init container comes up fine); however, if you create 2 nodegroups together (1a and 1b), the init container works only in one of the AZs (1a) and not the other (1b). I will try to reproduce it.

Another question: did you run the dig from the EC2 worker node where this pod failed (the worker node in 1b)?

Lautreck commented 2 years ago

Another question: did you run the dig from the EC2 worker node where this pod failed (the worker node in 1b)?

Yes, exactly.

Just a reminder that creating my first nodegroup in AZ 2, for example, works perfectly. It is not an issue that concerns AZ 1b specifically; it's just that starting my workflow with 1a and then 1b is my usual order.

raghs-aws commented 2 years ago

@Lautreck I could reproduce your problem and noticed this issue; however, it's not a problem with the code or the steps. Below are the 2 issues I had; you can check whether you have the same issue or whether the resolution helps you. I think the root cause is that, since we are adding the 2nd nodegroup in a different AZ, CoreDNS is not yet running there (see point 2 below).

  1. Initially I used the same Deployment name for both AZ deployments, so I saw this issue. However, I don't think you had this problem, as your examples all have unique names for the Deployments; still, check it once.

  2. I noticed that CoreDNS was running only on these AZ-1 workers, and my AZ-2 Multus pods hit the same issue you noticed:

```
coredns-56b458df85-ctc69   1/1   Running   0   4m6s   10.10.12.47    ip-10-10-12-40.us-east-2.compute.internal
coredns-56b458df85-nngv2   1/1   Running   0   10m    10.10.12.123   ip-10-10-12-76.us-east-2.compute.internal
```

Then I restarted CoreDNS (I think you can scale it as well), which caused one of the CoreDNS replicas to be recreated on the 2nd AZ's worker node:

```
coredns-56b458df85-cqkjg   1/1   Running   0   98s   10.0.1.96      ip-10-0-1-132.us-east-2.compute.internal
coredns-56b458df85-nngv2   1/1   Running   0   16m   10.10.12.123   ip-10-10-12-76.us-east-2.compute.internal
```

After this I restarted the Multus pods in AZ-2 and they came up just fine. I believe you are facing the same issue: basically, at launch CoreDNS is running in just a single AZ (AZ 1), and when we add AZ 2, CoreDNS is not present there and is not resolving the DNS queries. Once I restarted it, everything started working fine.
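(A minimal sketch of that restart, assuming CoreDNS runs as the usual coredns Deployment in kube-system:)

```sh
# Roll CoreDNS so the scheduler can place a replica on the new AZ's workers
kubectl -n kube-system rollout restart deployment coredns

# Then restart the Multus workloads that were failing in the second AZ, e.g.
kubectl rollout restart deployment alpine-deployment-3
```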

Please check and let me know if this helps.

raghs-aws commented 2 years ago

Any feedback, @Lautreck?

Lautreck commented 2 years ago

Hello @raghs-aws, I tried your method and it indeed worked. Thank you very much!

It's not exactly optimal in an automated setting to have to manually restart CoreDNS, though. Will there be a fix for this issue, or is there a workaround so we don't have to restart CoreDNS?

raghs-aws commented 2 years ago

@Lautreck, you could also scale CoreDNS so that it has a presence on the newly launched workers in the other AZ. I'm not sure, but if we had created both nodegroups together we might not have seen this. Let me check and follow up on it. Anyway, if you are OK with it, I would close this issue with this thread/solution, as the issue was with CoreDNS resolution.
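(A sketch of the scale-out, again assuming the default coredns Deployment in kube-system:)

```sh
# Scale CoreDNS out so a replica can land on the new AZ's workers as well
# (3 is just an example count; the default EKS deployment runs 2 replicas)
kubectl -n kube-system scale deployment coredns --replicas=3
```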

Lautreck commented 2 years ago

Hello, thanks for the advice. You can close the ticket; just, where can I follow the results of your follow-up?

raghs-aws commented 2 years ago

Thanks @Lautreck, appreciate you working with us. I will close the ticket. I would recommend you open a ticket for your case; I will open one as well, and I will see if I can share the details here afterwards.