aws / amazon-vpc-cni-k8s

Networking plugin repository for pod networking in Kubernetes using Elastic Network Interfaces on AWS
Apache License 2.0

Some pods unable to reach service backed by public IP of ec2 instance #1910

Closed prateekgogia closed 1 year ago

prateekgogia commented 2 years ago

What happened: When running a Service on EKS that is backed by the public IP address of the instance, it is reachable only by some of the pods running on the same node. Pods assigned IPs from the primary ENI can reach the service, but pods with IPs assigned from a secondary (attached) ENI cannot.

my-service is backed by an nginx pod running in the host network namespace, and 3.4.5.6 (redacted) is the public IP of the instance:

➜  k get endpoints my-service
NAME         ENDPOINTS            AGE
my-service   3.4.5.6:8443   82m

➜  k get service my-service
NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
my-service   ClusterIP   10.100.205.85   <none>        8443/TCP   83m

➜ k get pods nginx -o wide
NAME    READY   STATUS    RESTARTS   AGE   IP             NODE                                         NOMINATED NODE   READINESS GATES
nginx   1/1     Running   0          36m   192.168.1.80   ip-192-168-1-80.us-west-2.compute.internal   <none>           <none>

Created 3 pods, of which two (jv4cm and x82fw) are able to curl the service IP, while one is not; its request times out.

curl -I http://10.100.205.85:8443
HTTP/1.0 200 OK
Server: SimpleHTTP/0.6 Python/2.7.15
Date: Sat, 05 Mar 2022 20:30:49 GMT
Content-type: text/html; charset=UTF-8
Content-Length: 710
kubectl get pods -o wide
NAME                               READY   STATUS    RESTARTS   AGE   IP               NODE                                         NOMINATED NODE   READINESS GATES
dummy-deployment-6f8877777-jv4cm   1/1     Running   0          23m   192.168.28.203   ip-192-168-1-80.us-west-2.compute.internal   <none>           <none>
dummy-deployment-6f8877777-k7jwn   1/1     Running   0          22m   192.168.27.187   ip-192-168-1-80.us-west-2.compute.internal   <none>           <none>
dummy-deployment-6f8877777-x82fw   1/1     Running   0          34m   192.168.7.122    ip-192-168-1-80.us-west-2.compute.internal   <none>           <none>

Two ENIs are attached to the node with the following IPs:

The primary ENI (which has the public IP attached) has the following IPs:
192.168.7.122
192.168.28.203
192.168.30.91

The secondary ENI has the following IPs:
192.168.27.187
192.168.5.47
192.168.1.118

The pod having issues is assigned IP 192.168.27.187, i.e. an IP from the secondary ENI.
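A rough way to confirm, on the node, which route table a given pod IP is steered through (the table number below is illustrative; the CNI creates one route table per secondary ENI):

# Policy rule the CNI adds for a pod whose IP comes from a secondary ENI
ip rule show | grep 192.168.27.187
# Routes in the matching table (table number assumed for illustration)
ip route show table 2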

Attach logs

iptables-save logs- https://gist.github.com/prateekgogia/40778f56c890518fcb2d727d8495c907

What you expected to happen: All pods on the node should be able to reach this service.

How to reproduce it (as minimally and precisely as possible):

Create a Service backed by the instance's public IP (see the sketch below) and several pods on the same node, then check the pod IPs: pods with IPs assigned from the primary ENI's secondary private IP range can curl the service on this port, but pods with IPs assigned from another ENI attached to the node cannot reach the service.
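A minimal sketch of the manifests, assuming a selector-less Service plus a manual Endpoints object pointing at the instance's public IP (names, port, and the redacted 3.4.5.6 address as above):

cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  ports:
  - port: 8443
    protocol: TCP
---
apiVersion: v1
kind: Endpoints
metadata:
  name: my-service
subsets:
- addresses:
  - ip: 3.4.5.6
  ports:
  - port: 8443
    protocol: TCP
EOF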

Anything else we need to know?:

Environment:

github-actions[bot] commented 2 years ago

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days

jayanthvn commented 2 years ago

/not stale

elagree commented 2 years ago

Hello @prateekgogia !

Thanks for reporting this issue.

We've got the same issue on our side with a different use-case.

We are using a Kubernetes Service with a custom Endpoints object to reach some EC2 instances (mainly for DNS reasons).

With the last 1.7 release of aws-node (1.7.10), our application pods are able to reach the EC2 instances through the Kubernetes Service without any issue.

Since we started upgrading the CNI, the behavior changed. At first it appeared to be random: old pods created while CNI 1.7 was running were still able to reach the K8s Service backend, but new pods created while CNI 1.8+ was running were not, even when they were on the same node.

After doing more tests, it appears that pods attached to the primary ENI are able to reach our EC2 instances, whereas pods attached to a secondary ENI are not when CNI 1.8+ is running.

I also tested rolling back the CNI to 1.7.10 while the pods were still running, and the "broken" pods were then able to reach the EC2 instances through the K8s Service.

So the issue does not seem to be related to the IP itself, but rather to the routing, I guess.

I hope this issue will be fixed soon, as it's impacting us too.

Thanks !

DavidHe1127 commented 2 years ago

Our issue looks very similar to yours @elagree. We have a Service without a selector to pick pods as backends; instead it is hooked up to a custom Endpoints object. The endpoint points to the public IP of an EC2 instance where a simple Node server is running.

Curl from a pod inside the cluster shows the connection times out. However, curling the public IP directly, without going through the Service, works.

# curl: (28) Failed to connect to external-svc.io.svc.cluster.local port 80 after 129318 ms: Operation timed out
curl external-svc.io.svc.cluster.local:80

# worked
curl 3.21.112.247

Not sure if it's because some flags are not set correctly, so that trying to connect to a public IP from the endpoint doesn't work?

My environment

DavidHe1127 commented 2 years ago

It turned out to be something to do with our particular ns, where the networking isn't configured to allow traffic to reach external destinations. Sorry for the false alarm.

julianogv commented 2 years ago

It turned out to be something to do with our particular ns, where the networking isn't configured to allow traffic to reach external destinations. Sorry for the false alarm.

Hi @DavidHe1127, I'm having the same issue here: pods on public subnets are not able to reach external (internet) addresses, only addresses inside the VPC. What do you mean by "particular ns"? Maybe your solution could help me as well.

johngmyers commented 1 year ago

We have what appears to be the same issue, except the pods are on private subnets and the service is backed by IPs in a different VPC, on the other end of a Transit Gateway.

I note that the VPC flow logs show the traffic from pods on the primary ENI has the source IP rewritten to the node's IP. Presumably this is from the iptables rules written by kube-proxy. My suspicion is that packets coming from pods on secondary ENIs are incorrectly trying to egress from the secondary ENI after being rewritten and are thus being blocked because the secondary ENI doesn't have the node's IP assigned to it.
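A rough way to check this hypothesis on the node is to compare the SNAT rules with the policy routes the CNI installs for secondary-ENI pods (nothing below is specific to this cluster):

# Masquerade/SNAT rules that rewrite the source IP of service traffic
sudo iptables -t nat -S | grep -iE 'masquerade|snat'
# Per-pod policy rules; pods on secondary ENIs are sent to their own route table
sudo ip rule show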

ps-jay commented 1 year ago

My suspicion is that packets coming from pods on secondary ENIs are incorrectly trying to egress from the secondary ENI after being rewritten and are thus being blocked because the secondary ENI doesn't have the node's IP assigned to it.

I agree with this analysis.

I've run tcpdump on a worker node and initiated traffic from a Pod that has an IP on the secondary ENI:

When traffic is sent to the service IP, it is sent from eth1 (incorrectly).

When traffic is sent to the IP of the endpoint (not translated via the K8s service), it is sent from eth0 (correctly).
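For anyone who wants to reproduce the capture, something along these lines works (interface names and the port are assumptions based on this thread); run each while curling from the pod:

# Watch the primary and secondary ENIs for the outgoing connection
sudo tcpdump -ni eth0 'tcp and port 8443'
sudo tcpdump -ni eth1 'tcp and port 8443'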

dh-harald commented 1 year ago

I have a similar issue... I'm using aws-cni 1.10.1 (we upgraded it from 1.7.10; it was working on that version). I have a service with an external endpoint:

apiVersion: v1
kind: Endpoints
metadata:
  name: my-service
  namespace: my-namespace
subsets:
- addresses:
  - ip: 10.x.x.x # external IP (outside VPC, we're accessing it via site-to-site vpn)
  ports:
  - name: https
    port: 443
    protocol: TCP

My node has 4 IP addresses (eth0-eth3). I'm using tcpdump inside the aws-node pod to determine the problem. My test pod is running on the eth2 interface.

If I try to access the external IP directly, it works perfectly: tcpdump shows the source address is translated to the address of eth0 and the traffic leaves eth0 (I can see the traffic on eth0).

If I try to access the service (my-service.my-namespace.svc.cluster.local:443), I get a timeout because the source address is translated to the address of eth0 but the traffic leaves eth2 (I can see the traffic on eth2).

jdn5126 commented 1 year ago

@dh-harald yeah, this is the same issue. We are still trying to come up with a traffic steering solution that has an acceptable performance overhead.

elagree commented 1 year ago

Hello, I saw that 1.12 is available. Do you know if this issue is finally fixed?

Thanks :)

jdn5126 commented 1 year ago

@elagree a solution for this issue, through the use of an environment variable in which external service CIDRs are specified, is present in #2243. That is currently being reviewed and will ship in a future release.
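A sketch of how it could be configured once released, assuming the standard aws-node DaemonSet in kube-system (the CIDR below is only a placeholder for the external service backends):

# Hypothetical example value; use the CIDR(s) your external endpoints live in
kubectl set env daemonset aws-node -n kube-system AWS_EXTERNAL_SERVICE_CIDRS="10.0.0.0/8"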

jdn5126 commented 1 year ago

Closing as AWS_EXTERNAL_SERVICE_CIDRS will ship in v1.12.6 next week

github-actions[bot] commented 1 year ago

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see. If you need more assistance, please open a new issue that references this one. If you wish to keep having a conversation with other community members under this issue feel free to do so.