Closed: Lautreck closed this issue 2 years ago.
@Lautreck I will check and try to reproduce. From this log, it seems the VPC CNI is not able to reach the EC2 endpoint. Could you please check the following:
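One quick way to run that check, matching the dig output shown later in this thread (a minimal sketch; the pod name and namespace are placeholders, and it assumes the image has dig installed):

```
# Resolve the regional EC2 endpoint from inside an affected pod
kubectl exec -n <namespace> -it <pod-name> -- dig ec2.eu-west-1.amazonaws.com

# And from the worker node itself (via SSH or SSM)
dig ec2.eu-west-1.amazonaws.com
```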
@Lautreck Any feedback? Let me know if you still have the issue.
@raghs-aws Hello, the issue is still there; I haven't had the time to do what you proposed yet.
Just for point 3:
- AZ eu-west-1a: 10.0.0.0, 10.0.2.0, 10.0.4.0, 10.0.6.0
- AZ eu-west-1b: 10.0.1.0, 10.0.3.0, 10.0.5.0, 10.0.7.0
So my Multus attachment on 10.0.5.0 is in AZ eu-west-1b. It's 2 different pods.
Hello @raghs-aws
2.1. Here is the result of the dig from a pod that is in eu-west-1a:
```
; <<>> DiG 9.11.4-P2-RedHat-9.11.4-26.P2.amzn2.5.2 <<>> ec2.eu-west-1.amazonaws.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 46227
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;ec2.eu-west-1.amazonaws.com.    IN    A

;; ANSWER SECTION:
ec2.eu-west-1.amazonaws.com. 11  IN    CNAME   eu-west-1.ec2.amazonaws.com.
eu-west-1.ec2.amazonaws.com. 11  IN    A       67.220.226.37

;; Query time: 1 msec
;; SERVER: 172.20.0.10#53(172.20.0.10)
;; WHEN: Fri Nov 04 12:46:27 UTC 2022
;; MSG SIZE  rcvd: 167
```
2.2. Here is the result of the dig from a node that is in eu-west-1b:

```
; <<>> DiG 9.11.4-P2-RedHat-9.11.4-26.P2.amzn2.5.2 <<>> ec2.eu-west-1.amazonaws.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 26574
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;ec2.eu-west-1.amazonaws.com.    IN    A

;; ANSWER SECTION:
ec2.eu-west-1.amazonaws.com. 0   IN    CNAME   eu-west-1.ec2.amazonaws.com.
eu-west-1.ec2.amazonaws.com. 0   IN    A       52.95.121.23

;; Query time: 0 msec
;; SERVER: 10.0.0.2#53(10.0.0.2)
;; WHEN: Fri Nov 04 13:21:33 UTC 2022
;; MSG SIZE  rcvd: 100
```
Thanks, I can see DNS is resolving in both AZs. So basically, when you create one nodegroup at a time in 1a or 1b, it works fine (the init container comes up fine); however, if you create two nodegroups together (1a and 1b), the init container works only in one of the AZs (1a) and not the other (1b). I will try to reproduce it.
Another question: did you run the dig from the EC2 worker node where this pod failed (the worker node in 1b)?
Yes, exactly.
Just a reminder that creating my first nodegroup in AZ 1b, for example, works perfectly, so it is not an issue that concerns AZ 1b specifically. It's just that starting my workflow with 1a and then 1b is my usual order.
@Lautreck I could reproduce your problem and noticed this issue; however, it's not a problem with the code or the steps. Below are the two issues I ran into; you can check whether you have the same issues and whether the resolution helps you. I think the root cause is that, since we are adding the 2nd nodegroup in a different AZ, CoreDNS is not yet running in that AZ.
1. Initially I used the same Deployment name for both AZ deployments, so I saw this issue. However, I don't think you had this problem, since your examples all use unique names for the Deployments; still, please check it once.
2. I noticed that CoreDNS was running only on the AZ-1 workers, and my AZ-2 Multus pods hit the same issue you noticed:
```
coredns-56b458df85-ctc69   1/1   Running   0   4m6s   10.10.12.47   ip-10-10-12-40.us-east-2.compute.internal
```
Then I restarted CoreDNS (I think you can scale it as well), which caused one of the CoreDNS replicas to be scheduled on the 2nd-AZ worker node:
```
coredns-56b458df85-cqkjg   1/1   Running   0   98s   10.0.1.96   ip-10-0-1-132.us-east-2.compute.internal
```
After this I restarted the Multus pods in AZ 2 and they came up just fine. I believe you are facing the same issue: at launch, CoreDNS is running in just a single AZ (AZ 1), and when we add AZ 2, CoreDNS has no presence there and is not resolving the DNS queries from those pods. Once I restarted it, everything started working fine.
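A minimal sketch of that restart sequence, assuming the CoreDNS Deployment lives in kube-system with the standard k8s-app=kube-dns label, and that alpine-deployment-3/-4 (from the manifests below) are the workloads pinned to the second AZ:

```
# Restart CoreDNS so the scheduler can place a replica on the new AZ's workers
kubectl -n kube-system rollout restart deployment coredns
kubectl -n kube-system rollout status deployment coredns

# Confirm where the CoreDNS pods landed
kubectl -n kube-system get pods -l k8s-app=kube-dns -o wide

# Then restart the Multus workloads in the second AZ so their init containers retry
kubectl rollout restart deployment alpine-deployment-3
kubectl rollout restart deployment alpine-deployment-4
```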
Please check and let me know if this helps.
Any feedback, @Lautreck?
Hello @raghs-aws, I tried your method and it indeed worked. Thank you very much!
It's not exactly optimal in an automated setting to have to manually restart CoreDNS. Will there be a fix for this issue, or is there a workaround so we don't have to restart CoreDNS?
@Lautreck, you could also scale CoreDNS so that it has a presence on the workers in the newly launched AZ. I'm not sure we would have seen this if both nodegroups had been created together, but let me check and follow up on it. In any case, if you are OK with it, I would close this issue on this thread/solution, as the problem was CoreDNS resolution.
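For the scaling variant, a sketch (the replica count is only an example, not a recommendation):

```
# Scale CoreDNS up so at least one replica can land on the new AZ's workers
kubectl -n kube-system scale deployment coredns --replicas=4
kubectl -n kube-system get pods -l k8s-app=kube-dns -o wide
```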
Hello, thanks for the advice. You can close the ticket; just, where can I follow the results of your follow-up?
Thanks @Lautreck, I appreciate you working with us. I will close the ticket. I would recommend you open a support ticket for your case; I will open one as well, and I will see if I can share the details here afterwards.
Hello, I've managed to make pods with Multus work with the init container, and to do many things with them, in ONE AZ. Now I'm testing pods with Multus in TWO different AZs.
After many tries, I discovered that when using the CloudFormation template given by the blog post to create nodegroups, only the first nodegroup created works with the init container; on the second nodegroup, created in another AZ, the init container fails to connect to the endpoint.
In my case I work in eu-west-1, and the VPC is 10.0.0.0/16, as in the blog post: https://aws.amazon.com/blogs/containers/amazon-eks-now-supports-multus-cni/
When I create my first nodegroup with the CloudFormation template, in either AZ eu-west-1a or AZ eu-west-1b, it works. The AZ has no impact and there is no bug.
After creating a second nodegroup in a different AZ, the init container stops working. It's not my Multus CRDs at fault, because I can create pods in any AZ as long as there is only one nodegroup.
To reproduce the bug: create cluster => create nodegroup 1 => create pods => create nodegroup 2 in another AZ => create pods in AZ 2.
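Roughly, the sequence looks like this (stack names and file names are hypothetical; the actual template and its parameters come from the blog post):

```
# Nodegroup 1 (e.g. eu-west-1a) from the blog's CloudFormation template
aws cloudformation create-stack \
  --stack-name multus-nodegroup-az1 \
  --template-body file://multus-nodegroup.yaml \
  --capabilities CAPABILITY_NAMED_IAM

# Deploy the NetworkAttachmentDefinitions and test pods (see below); this works fine
kubectl apply -f alpine-deployment-1.yaml

# Nodegroup 2 in the other AZ (e.g. eu-west-1b)
aws cloudformation create-stack \
  --stack-name multus-nodegroup-az2 \
  --template-body file://multus-nodegroup.yaml \
  --capabilities CAPABILITY_NAMED_IAM

# Pods on AZ 2: the init container fails here
kubectl apply -f alpine-deployment-3.yaml
```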
Pods CRD
I use a nodeSelector to make sure my pods are created on the nodes with the right network (see the sketch after the deployment manifests below).
```
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: alpine-deployment-1
  labels:
    app: alpine
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
  selector:
    matchLabels:
      app: alpine
  template:
    metadata:
      labels:
        app: alpine
      annotations:
        k8s.v1.cni.cncf.io/networks: ipvlan-multus-1, ipvlan-multus-2
    spec:
      initContainers:
      containers:
```
```
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: alpine-deployment-2
  labels:
    app: alpine
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
  selector:
    matchLabels:
      app: alpine
  template:
    metadata:
      labels:
        app: alpine
      annotations:
        k8s.v1.cni.cncf.io/networks: ipvlan-multus-2
    spec:
      initContainers:
      containers:
```
```
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: alpine-deployment-3
  labels:
    app: alpine
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
  selector:
    matchLabels:
      app: alpine
  template:
    metadata:
      labels:
        app: alpine
      annotations:
        k8s.v1.cni.cncf.io/networks: ipvlan-multus-3
    spec:
      initContainers:
      containers:
```
```
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: alpine-deployment-4
  labels:
    app: alpine
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
  selector:
    matchLabels:
      app: alpine
  template:
    metadata:
      labels:
        app: alpine
      annotations:
        k8s.v1.cni.cncf.io/networks: ipvlan-multus-4
    spec:
      initContainers:
      containers:
```
Multus CRD
```
cat <<EOF | kubectl apply -f -
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: ipvlan-multus-1
spec:
  config: '{
      "cniVersion": "0.3.0",
      "type": "ipvlan",
      "master": "eth1",
      "mode": "l2",
      "ipam": {
        "type": "whereabouts",
        "datastore": "kubernetes",
        "kubernetes": { "kubeconfig": "/etc/cni/net.d/whereabouts.d/whereabouts.kubeconfig" },
        "range": "10.0.4.70-10.0.4.80/24",
        "log_file": "/tmp/whereabouts.log",
        "log_level": "debug",
        "gateway": "10.0.4.1"
      }
    }'
EOF
```
```
cat <<EOF | kubectl apply -f -
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: ipvlan-multus-2
spec:
  config: '{
      "cniVersion": "0.3.0",
      "type": "ipvlan",
      "master": "eth2",
      "mode": "l2",
      "ipam": {
        "type": "whereabouts",
        "datastore": "kubernetes",
        "kubernetes": { "kubeconfig": "/etc/cni/net.d/whereabouts.d/whereabouts.kubeconfig" },
        "range": "10.0.6.70-10.0.6.80/24",
        "log_file": "/tmp/whereabouts.log",
        "log_level": "debug",
        "gateway": "10.0.6.1"
      }
    }'
EOF
```
```
cat <<EOF | kubectl apply -f -
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: ipvlan-multus-3
spec:
  config: '{
      "cniVersion": "0.3.0",
      "type": "ipvlan",
      "master": "eth1",
      "mode": "l2",
      "ipam": {
        "type": "whereabouts",
        "datastore": "kubernetes",
        "kubernetes": { "kubeconfig": "/etc/cni/net.d/whereabouts.d/whereabouts.kubeconfig" },
        "range": "10.0.5.70-10.0.5.80/24",
        "log_file": "/tmp/whereabouts.log",
        "log_level": "debug",
        "gateway": "10.0.5.1"
      }
    }'
EOF
```
```
cat <<EOF | kubectl apply -f -
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: ipvlan-multus-4
spec:
  config: '{
      "cniVersion": "0.3.0",
      "type": "ipvlan",
      "master": "eth2",
      "mode": "l2",
      "ipam": {
        "type": "whereabouts",
        "datastore": "kubernetes",
        "kubernetes": { "kubeconfig": "/etc/cni/net.d/whereabouts.d/whereabouts.kubeconfig" },
        "range": "10.0.7.70-10.0.7.80/24",
        "log_file": "/tmp/whereabouts.log",
        "log_level": "debug",
        "gateway": "10.0.7.1"
      }
    }'
EOF
```
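A quick way to confirm the attachments are applied (a sketch; the pod name is a placeholder, and the last command assumes the container image ships the ip tool):

```
# List the NetworkAttachmentDefinitions in the current namespace
kubectl get network-attachment-definitions

# Check a pod's Multus annotations and the extra interfaces (net1, net2, ...)
kubectl describe pod <alpine-pod-name> | grep k8s.v1.cni.cncf.io
kubectl exec -it <alpine-pod-name> -- ip addr
```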
Init container initial reaction (first nodegroup):
Init container on a pod in the other AZ: