aws / aws-network-policy-agent

Possible race condition in conntrack map cleanup #246

Closed micahnoland closed 5 months ago

micahnoland commented 7 months ago

What happened: We are observing an intermittent issue with network-policy-agent denying return packets on an active flow, seemingly due to the flow being removed from the conntrack map. These connections originate from long-running pods and do not seem related to strict mode.

{"level":"info","ts":"2024-04-10T20:38:06.940Z","logger":"ebpf-client","caller":"wait/backoff.go:227","msg":"Check for any stale entries in the conntrack map"}
{"level":"info","ts":"2024-04-10T20:38:07.026Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"10.3.99.194","Src Port":33794,"Dest IP":"209.54.181.208","Dest Port":443,"Proto":"TCP","Verdict":"ACCEPT"}
{"level":"info","ts":"2024-04-10T20:38:07.095Z","logger":"ebpf-client","caller":"wait/backoff.go:227","msg":"Conntrack cleanup","Entry - ":"Expired/Delete Conntrack Key : Source IP - 10.3.99.194 Source port - 33794 Dest IP - 209.54.181.208 Dest port - 443 Protocol - 6 Owner IP - 10.3.99.194"}
{"level":"info","ts":"2024-04-10T20:38:07.126Z","logger":"ebpf-client","caller":"wait/backoff.go:227","msg":"Conntrack cleanup","Delete - ":"Conntrack Key : Source IP - 10.3.99.194 Source port - 33794 Dest IP - 209.54.181.208 Dest port - 443 Protocol - 6 Owner IP - 10.3.99.194"}
{"level":"info","ts":"2024-04-10T20:38:07.180Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"209.54.181.208","Src Port":443,"Dest IP":"10.3.99.194","Dest Port":33794,"Proto":"TCP","Verdict":"DENY"}
{"level":"info","ts":"2024-04-10T20:38:07.180Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"209.54.181.208","Src Port":443,"Dest IP":"10.3.99.194","Dest Port":33794,"Proto":"TCP","Verdict":"DENY"}
{"level":"info","ts":"2024-04-10T20:38:07.180Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"209.54.181.208","Src Port":443,"Dest IP":"10.3.99.194","Dest Port":33794,"Proto":"TCP","Verdict":"DENY"}
{"level":"info","ts":"2024-04-10T20:38:07.181Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"209.54.181.208","Src Port":443,"Dest IP":"10.3.99.194","Dest Port":33794,"Proto":"TCP","Verdict":"DENY"}
{"level":"info","ts":"2024-04-10T20:38:07.208Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"209.54.181.208","Src Port":443,"Dest IP":"10.3.99.194","Dest Port":33794,"Proto":"TCP","Verdict":"DENY"}
{"level":"info","ts":"2024-04-10T20:38:07.214Z","logger":"ebpf-client","caller":"wait/backoff.go:227","msg":"Done cleanup of conntrack map"}
{"level":"info","ts":"2024-04-10T20:38:07.428Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"209.54.181.208","Src Port":443,"Dest IP":"10.3.99.194","Dest Port":33794,"Proto":"TCP","Verdict":"DENY"}
{"level":"info","ts":"2024-04-10T20:38:07.858Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"209.54.181.208","Src Port":443,"Dest IP":"10.3.99.194","Dest Port":33794,"Proto":"TCP","Verdict":"DENY"}
{"level":"info","ts":"2024-04-10T20:38:08.768Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"209.54.181.208","Src Port":443,"Dest IP":"10.3.99.194","Dest Port":33794,"Proto":"TCP","Verdict":"DENY"}
{"level":"info","ts":"2024-04-10T20:38:10.528Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"209.54.181.208","Src Port":443,"Dest IP":"10.3.99.194","Dest Port":33794,"Proto":"TCP","Verdict":"DENY"}
{"level":"info","ts":"2024-04-10T20:38:13.978Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"209.54.181.208","Src Port":443,"Dest IP":"10.3.99.194","Dest Port":33794,"Proto":"TCP","Verdict":"DENY"}
{"level":"info","ts":"2024-04-10T20:38:20.768Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"209.54.181.208","Src Port":443,"Dest IP":"10.3.99.194","Dest Port":33794,"Proto":"TCP","Verdict":"DENY"}
{"level":"info","ts":"2024-04-10T20:38:34.848Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"209.54.181.208","Src Port":443,"Dest IP":"10.3.99.194","Dest Port":33794,"Proto":"TCP","Verdict":"DENY"}
{"level":"info","ts":"2024-04-10T20:39:07.032Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"10.3.99.194","Src Port":33794,"Dest IP":"209.54.181.208","Dest Port":443,"Proto":"TCP","Verdict":"ACCEPT"}

Perhaps we are misunderstanding the code, but it appears there is a race condition here (a minimal sketch of the interleaving follows this list):

  1. CleanupConntrackMap() starts
  2. The local cache is hydrated
  3. A new connection is established: {"level":"info","ts":"2024-04-10T20:38:07.026Z","logger":"ebpf-client","msg":"Flow Info: ","Src IP":"10.3.99.194","Src Port":33794,"Dest IP":"209.54.181.208","Dest Port":443,"Proto":"TCP","Verdict":"ACCEPT"}
  4. The eBPF conntrack map is iterated over, and each key is checked to see if it exists in the local cache
  5. Since this new connection was not established when the local cache was hydrated, it gets deleted: {"level":"info","ts":"2024-04-10T20:38:07.126Z","logger":"ebpf-client","caller":"wait/backoff.go:227","msg":"Conntrack cleanup","Delete - ":"Conntrack Key : Source IP - 10.3.99.194 Source port - 33794 Dest IP - 209.54.181.208 Dest port - 443 Protocol - 6 Owner IP - 10.3.99.194"}
  6. Now all return flows on this connection are denied: {"level":"info","ts":"2024-04-10T20:38:34.848Z","logger":"ebpf-client","msg":"Flow Info: ","Src IP":"209.54.181.208","Src Port":443,"Dest IP":"10.3.99.194","Dest Port":33794,"Proto":"TCP","Verdict":"DENY"}
  7. Client retries exactly 60s after the first request and the connection is re-established: {"level":"info","ts":"2024-04-10T20:39:07.032Z","logger":"ebpf-client","msg":"Flow Info: ","Src IP":"10.3.99.194","Src Port":33794,"Dest IP":"209.54.181.208","Dest Port":443,"Proto":"TCP","Verdict":"ACCEPT"}
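
To make the suspected interleaving concrete, here is a minimal, hypothetical Go sketch of the cleanup pattern described above. The names (cleanupConntrack, flowKey, etc.) are illustrative and this is not the agent's actual code: the routine hydrates a cache from the kernel conntrack table, then sweeps the eBPF conntrack map and deletes any key missing from that cache, so a flow admitted after the cache was hydrated gets expired.

```go
// Hypothetical illustration of the suspected race; not the agent's real code.
package main

import "fmt"

type flowKey struct {
	srcIP, dstIP     string
	srcPort, dstPort uint16
}

// cleanupConntrack mimics the described cleanup: anything in the eBPF
// conntrack map that is absent from the cache hydrated from the kernel
// conntrack table is deleted.
func cleanupConntrack(kernelCache, ebpfMap map[flowKey]bool) {
	for k := range ebpfMap {
		if !kernelCache[k] {
			delete(ebpfMap, k)
			fmt.Println("expired conntrack entry:", k)
		}
	}
}

func main() {
	// Step 2: the cache is hydrated from the kernel conntrack table before
	// the new connection exists, so it is empty here.
	kernelCache := map[flowKey]bool{}

	// Step 3: the egress probe accepts a new flow and records it in the
	// eBPF conntrack map after the cache was hydrated.
	newFlow := flowKey{"10.3.99.194", "209.54.181.208", 33794, 443}
	ebpfMap := map[flowKey]bool{newFlow: true}

	// Steps 4-5: the sweep cannot find the flow in the stale cache and
	// deletes it, after which return packets on the flow are denied (step 6).
	cleanupConntrack(kernelCache, ebpfMap)
}
```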

Running conntrack -E -o timestamp during this time, we only see 3 lines for port 33794:

[1712781487.026603]     [NEW] tcp      6 120 SYN_SENT src=10.3.99.194 dst=209.54.181.208 sport=33794 dport=443 [UNREPLIED] src=209.54.181.208 dst=10.3.96.218 sport=443 dport=27340 mark=128
[1712781487.027636]  [UPDATE] tcp      6 60 SYN_RECV src=10.3.99.194 dst=209.54.181.208 sport=33794 dport=443 src=209.54.181.208 dst=10.3.96.218 sport=443 dport=27340 mark=128
[1712781487.027672]  [UPDATE] tcp      6 432000 ESTABLISHED src=10.3.99.194 dst=209.54.181.208 sport=33794 dport=443 src=209.54.181.208 dst=10.3.96.218 sport=443 dport=27340 [ASSURED] mark=128

I have the output of sudo bash /opt/cni/bin/aws-cni-support.sh as well and I'll send that along to k8s-awscni-triage@amazon.com.

How to reproduce it (as minimally and precisely as possible): The issue is intermittent, since the outbound connection needs to be established during the short time that cleanup is running. We have observed this more easily in pods which establish many outbound connections.
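
Something like the following crude load generator should make the window easier to hit (illustrative only; the URL and timings are placeholders, not project tooling). A connection whose return packets are dropped surfaces as a client-side timeout:

```go
// Illustrative load generator: issue many outbound HTTPS requests so that
// some connections are established while the conntrack cleanup pass runs.
package main

import (
	"log"
	"net/http"
	"time"
)

func main() {
	// Hypothetical external endpoint reachable from the affected pod.
	const url = "https://example.com/"
	client := &http.Client{Timeout: 10 * time.Second}
	for {
		start := time.Now()
		resp, err := client.Get(url)
		if err != nil {
			// A denied return flow shows up here as a timeout.
			log.Printf("request failed after %s: %v", time.Since(start), err)
		} else {
			resp.Body.Close()
		}
		time.Sleep(100 * time.Millisecond)
	}
}
```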

Environment:

jayanthvn commented 7 months ago

@micahnoland - Nice debugging! We were just discussing this scenario yesterday evening while working on https://github.com/aws/aws-network-policy-agent/issues/245

To summarize -

  1. Source pod starts a connection with Dest pod. The source pod veth has an egress probe. Since the packet goes via the egress probe, a [SIP, SPORT, DIP, DPORT] entry is made in the local conntrack table, but not in the kernel conntrack table yet.
  2. Cleanup routine starts and caches all entries from the kernel conntrack table. This cache is used to check whether entries in the local conntrack table have to be deleted.
  3. Kernel conntrack table entry is made - [SIP, SPORT, DIP, DPORT]
  4. Since the cache (snapshot from kernel conntrack) in the cleanup routine doesn't have the [SIP, SPORT, DIP, DPORT] entry, we end up cleaning the local conntrack entry made in step 1.

One simple way would be to give the entry in the local conntrack table one additional reconciler chance before cleanup; but then again, if the timer is set to a very low value we would end up in a similar situation. Will think about this and get back.
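
One reading of that "one additional reconciler chance" idea is a mark-then-sweep pass; the sketch below is illustrative only (the names are made up and this is not the actual fix): an entry missing from the kernel snapshot is only marked on the first pass and deleted on the next pass if it is still missing, so flows created mid-cleanup survive.

```go
// Illustrative mark-then-sweep sketch of the "one additional reconciler
// chance" idea; names and structure are made up, not the actual fix.
package main

import "fmt"

type flowKey struct {
	srcIP, dstIP     string
	srcPort, dstPort uint16
}

type reconciler struct {
	// pendingDelete remembers entries that were absent from the kernel
	// snapshot on the previous cleanup pass.
	pendingDelete map[flowKey]bool
}

func (r *reconciler) cleanup(kernelSnapshot, localMap map[flowKey]bool) {
	stillMissing := make(map[flowKey]bool)
	for k := range localMap {
		if kernelSnapshot[k] {
			continue // the kernel knows this flow; keep it
		}
		if r.pendingDelete[k] {
			delete(localMap, k) // missing on two consecutive passes: expire
		} else {
			stillMissing[k] = true // grant one more reconciler interval
		}
	}
	r.pendingDelete = stillMissing
}

func main() {
	r := &reconciler{pendingDelete: map[flowKey]bool{}}
	flow := flowKey{"10.3.99.194", "209.54.181.208", 33794, 443}
	local := map[flowKey]bool{flow: true}

	// Pass 1: the flow is not in the kernel snapshot yet, so it is only marked.
	r.cleanup(map[flowKey]bool{}, local)
	fmt.Println("after pass 1, still present:", local[flow])

	// Pass 2: the kernel snapshot now contains the flow, so it survives.
	r.cleanup(map[flowKey]bool{flow: true}, local)
	fmt.Println("after pass 2, still present:", local[flow])
}
```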

achevuru commented 7 months ago

Right now we don't check the connection state, which potentially contributes to this and also appears to be the RCA for #245. We will check the logs and will address it soon.
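
For illustration, a state-aware check could look roughly like the sketch below. This is hypothetical: lookupKernelState is a made-up helper, not an existing API in the agent, and this is not necessarily how the fix is implemented. The idea is to re-check the live kernel table for entries missing from the snapshot instead of deleting them outright.

```go
// Hypothetical sketch of a state-aware cleanup check; lookupKernelState is a
// made-up helper, not an existing API in the agent.
package main

import "fmt"

type flowKey struct {
	srcIP, dstIP     string
	srcPort, dstPort uint16
}

// lookupKernelState stands in for a live query of the kernel conntrack table
// (e.g. via netlink); it would return the flow's TCP state such as
// "ESTABLISHED", or "" if the kernel has no entry for the flow.
func lookupKernelState(k flowKey) string {
	return "ESTABLISHED" // stubbed for illustration
}

// shouldDelete keeps an entry if the snapshot has it, and otherwise re-checks
// the live kernel table instead of trusting the possibly stale snapshot.
func shouldDelete(k flowKey, inSnapshot bool) bool {
	if inSnapshot {
		return false
	}
	switch lookupKernelState(k) {
	case "SYN_SENT", "SYN_RECV", "ESTABLISHED":
		return false // flow is active; it was just created after the snapshot
	default:
		return true
	}
}

func main() {
	flow := flowKey{"10.3.99.194", "209.54.181.208", 33794, 443}
	fmt.Println("delete?", shouldDelete(flow, false)) // false: flow is live
}
```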

jayanthvn commented 5 months ago

The fix is released with network policy agent v1.1.2 - https://github.com/aws/amazon-vpc-cni-k8s/releases/tag/v1.18.2. Please test and let us know if there are any issues.