aws / aws-network-policy-agent

Apache License 2.0
45 stars 29 forks source link

Handle race with multiple replicas and conntrack VIP entries #179

Closed jayanthvn closed 9 months ago

jayanthvn commented 9 months ago

Issue #, if available: N/A

Description of changes: This handles 2 issues -

  1. Race condition when multiple replicas share the same pinpath. Pinpath was getting deleted during scale down. This PR ensures last replica deletes the pin path.

When 2 replicas are present and one replica is deleted, only we try to remove the ENI filter -

{"level":"info","ts":"2024-01-05T02:18:01.286Z","logger":"controllers.policyEndpoints","caller":"controllers/policyendpoints_controller.go:146","msg":"PodIdentifier pinpath ","shared: ":true}
{"level":"info","ts":"2024-01-05T02:18:01.286Z","logger":"controllers.policyEndpoints","caller":"controllers/policyendpoints_controller.go:193","msg":"Total number of PolicyEndpoint resources for","podIdentifier ":"websocket-client-7f65cc6544-default"," are ":0}{"level":"info","ts":"2024-01-05T02:18:01.286Z","logger":"ebpf-client","caller":"controllers/policyendpoints_controller.go:193","msg":"DetacheBPFProbes for","pod":"websocket-client-7f65cc6544-9j58r"," in namespace":"default"," with hostVethName":"eni91226d5f053"," cleanup pinPath":false}{"level":"info","ts":"2024-01-05T02:18:01.286Z","logger":"ebpf-client","caller":"controllers/policyendpoints_controller.go:318","msg":"Attempting to do an Ingress Detach"}

{"level":"info","ts":"2024-01-05T02:18:01.286Z","logger":"ebpf-client","caller":"controllers/policyendpoints_controller.go:193","msg":"Successfully detached Ingress TC probe for","pod: ":"websocket-client-7f65cc6544-9j58r"," in namespace":"default"}
{"level":"info","ts":"2024-01-05T02:18:01.286Z","logger":"ebpf-client","caller":"controllers/policyendpoints_controller.go:318","msg":"Attempting to do an Egress Detach"}
{"level":"info","ts":"2024-01-05T02:18:01.286Z","logger":"ebpf-client","caller":"controllers/policyendpoints_controller.go:193","msg":"Successfully detached Egress TC probe for","pod: ":"websocket-client-7f65cc6544-9j58r"," in namespace":"default"}

When the last replica is deleted is when pinpath is deleted -

"level":"info","ts":"2024-01-04T22:32:37.982Z","logger":"controllers.policyEndpoints","caller":"controllers/policyendpoints_controller.go:146","msg":"Delete ","pinPath : ":true}

{"level":"info","ts":"2024-01-04T22:32:37.982Z","logger":"ebpf-client","caller":"controllers/policyendpoints_controller.go:318","msg":"Deleting: ","Program: ":"/sys/fs/bpf/globals/aws/programs/websocket-client-7f65cc6544-default_handle_ingress","Map: ":"/sys/fs/bpf/globals/aws/maps/websocket-client-7f65cc6544-default_ingress_map"}
{"level":"info","ts":"2024-01-04T22:32:37.982Z","logger":"ebpf-client","caller":"controllers/policyendpoints_controller.go:318","msg":"Found the Program and Map to delete - ","Program: ":"/sys/fs/bpf/globals/aws/programs/websocket-client-7f65cc6544-default_handle_ingress","Map: ":"/sys/fs/bpf/globals/aws/maps/websocket-client-7f65cc6544-default_ingress_map"}
  1. DIP can be the service IP and if src and dest pods are on the same node and NP is enforced for both the pods. Then kernel contract will have just VIP in the forward direction and pod IP in the reverse direction. Where as NA conntrack will have 2 separate entries. In conntrack cleanup we just use the forward direction IP from the kernel conntrack leading to expiry of NA conntrack entry which has the pod IP.

Eg:

root@ip-192-168-29-159 bin]# ./aws-eks-na-cli ebpf dump-maps 53
Conntrack Key : Source IP - 192.168.17.118 Source port - 57292 Dest IP - 10.100.58.78 Dest port - 80 Protocol - 6 Owner IP - 192.168.17.118
Value : 
Conntrack Val -  1
*******************************
Conntrack Key : Source IP - 192.168.14.74 Source port - 44240 Dest IP - 192.168.7.235 Dest port - 8080 Protocol - 6 Owner IP - 192.168.7.235
Value : 
Conntrack Val -  1
*******************************
Conntrack Key : Source IP - 192.168.17.118 Source port - 57292 Dest IP - 192.168.7.235 Dest port - 8080 Protocol - 6 Owner IP - 192.168.7.235
Value : 
Conntrack Val -  1
*******************************
Conntrack Key : Source IP - 192.168.14.74 Source port - 44240 Dest IP - 10.100.58.78 Dest port - 80 Protocol - 6 Owner IP - 192.168.14.74
Value : 
Conntrack Val -  1
*******************************
Done reading all entries

[root@ip-192-168-29-159 bin]# conntrack -L | grep 192.168.7.235
tcp      6 431999 ESTABLISHED src=192.168.14.74 dst=10.100.58.78 sport=44240 dport=80 src=192.168.7.235 dst=192.168.14.74 sport=8080 dport=44240 [ASSURED] mark=0 use=1
conntrack v1.4.4 (conntrack-tools): 78 flow entries have been shown.
tcp      6 431999 ESTABLISHED src=192.168.17.118 dst=10.100.58.78 sport=57292 dport=80 src=192.168.7.235 dst=192.168.17.118 sport=8080 dport=57292 [ASSURED] mark=0 use=1

Those entries with Pod IP as dest IP was getting deleted even for established connections -

"level":"info","ts":"2024-01-04T21:37:13.479Z","logger":"ebpf-client","caller":"wait/backoff.go:227","msg":"Check for any stale entries in the conntrack map"}
{"level":"info","ts":"2024-01-04T21:37:13.485Z","logger":"ebpf-client","caller":"wait/backoff.go:227","msg":"Conntrack cleanup","Entry - ":"Expired/Delete Conntrack Key : Source IP - 192.168.9.135 Source port - 38610 Dest IP - 192.168.7.235 Dest port - 8080 Protocol - 6 Owner IP - 192.168.7.235"}
{"level":"info","ts":"2024-01-04T21:37:13.485Z","logger":"ebpf-client","caller":"wait/backoff.go:227","msg":"Conntrack cleanup","Entry - ":"Expired/Delete Conntrack Key : Source IP - 192.168.17.128 Source port - 46126 Dest IP - 192.168.7.235 Dest port - 8080 Protocol - 6 Owner IP - 192.168.7.235"}
{"level":"info","ts":"2024-01-04T21:37:13.485Z","logger":"ebpf-client","caller":"wait/backoff.go:227","msg":"Conntrack cleanup","Delete - ":{"Source_ip":2265557184,"Source_port":38610,"Dest_ip":3943147712,"Dest_port":8080,"Protocol":6,"Owner_ip":3943147712}}
{"level":"info","ts":"2024-01-04T21:37:13.485Z","logger":"ebpf-client","caller":"wait/backoff.go:227","msg":"Conntrack cleanup","Delete - ":{"Source_ip":2148640960,"Source_port":46126,"Dest_ip":3943147712,"Dest_port":8080,"Protocol":6,"Owner_ip":3943147712}}
{"level":"info","ts":"2024-01-04T21:37:13.485Z","logger":"ebpf-client","caller":"wait/backoff.go:227","msg":"Done cleanup of conntrack map"}

Post fix, no stale entries found-

{"level":"info","ts":"2024-01-05T03:33:44.075Z","logger":"ebpf-client","caller":"wait/backoff.go:227","msg":"Check for any stale entries in the conntrack map"}
{"level":"info","ts":"2024-01-05T03:33:44.081Z","logger":"ebpf-client","caller":"wait/backoff.go:227","msg":"Done cleanup of conntrack map"}
{"level":"info","ts":"2024-01-05T03:38:44.081Z","logger":"ebpf-client","caller":"wait/backoff.go:227","msg":"Check for any stale entries in the conntrack map"}{"level":"info","ts":"2024-01-05T03:38:44.088Z","logger":"ebpf-client","caller":"wait/backoff.go:227","msg":"Done cleanup of conntrack map"}

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

jayanthvn commented 9 months ago

Pending regression test.