Closed tamilmani1989 closed 1 year ago
cc: @christarazi (author of PR https://github.com/cilium/cilium/pull/23208).
This may not be related to that PR, sorry. The PR added the log message, so it only exposed the problem. Should we consider skipping this log message when the Cilium agent is running in native routing mode? I can open a PR if that's the case.
Does it happen consistently or was it just a one-off?
Is it possible for you to take a sysdump after the issue occurs and then upload it here?
@christarazi Yes, it's consistent. I've attached the sysdump.
@christarazi did you get a chance to check on this issue?
So after a brief investigation, I believe Cilium should still be operating correctly. When it discovers the conflict, it will keep the strongest "source" of the information in the datapath. However, you're asking about why this is happening.
This happens because the local router IP is the same on both nodes. When Cilium is notified of both nodes (via CiliumNode updates), it plumbs the IP addresses from each event into the ipcache. Since both nodes have the same router IP (169.254.23.0), Cilium discovers a conflict between two different events (2 nodes, 2 events) when it maps 169.254.23.0 to the node IPs, hence the log message.
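To make the conflict concrete, here is a minimal Go sketch of the behavior described above. It is not Cilium's ipcache code: the `Cache`, `Source`, and `Upsert` names, the two source levels, and the node IPs `10.0.0.1` and `10.0.0.2` are all hypothetical and chosen only for illustration. It shows how two node events carrying the same router IP collide on one key, and how an entry from a stronger source is kept.

```go
package main

import "fmt"

// Source ranks where a mapping came from; higher values win.
// Hypothetical levels, not Cilium's actual source list.
type Source int

const (
	SourceWeak Source = iota
	SourceStrong
)

// entry records which node an IP was associated with, and by which source.
type entry struct {
	hostIP string
	source Source
}

// Cache is a toy stand-in for an IP-to-node mapping.
type Cache struct {
	entries map[string]entry
}

func NewCache() *Cache {
	return &Cache{entries: make(map[string]entry)}
}

// Upsert maps ip to hostIP. It reports a conflict when ip is already
// mapped to a different host. On conflict, the existing entry is kept
// if it came from a stronger source; otherwise it is overwritten.
func (c *Cache) Upsert(ip, hostIP string, src Source) (conflict bool) {
	old, ok := c.entries[ip]
	if ok && old.hostIP != hostIP {
		conflict = true
		if src < old.source {
			return conflict // keep the stronger existing entry
		}
	}
	c.entries[ip] = entry{hostIP: hostIP, source: src}
	return conflict
}

// Lookup returns the host currently associated with ip, if any.
func (c *Cache) Lookup(ip string) (string, bool) {
	e, ok := c.entries[ip]
	return e.hostIP, ok
}

func main() {
	c := NewCache()
	routerIP := "169.254.23.0" // same local router IP on every node

	// Two node events (hypothetical node IPs) carry the same router IP.
	for _, nodeIP := range []string{"10.0.0.1", "10.0.0.2"} {
		if c.Upsert(routerIP, nodeIP, SourceStrong) {
			fmt.Printf("conflict: %s already mapped to another node\n", routerIP)
		}
	}
}
```

In this sketch the second event always reports a conflict, which mirrors why the warning appears on every node update when all nodes share one router IP.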
Based on the configuration passed to Cilium, I assume that setting the local router IP, and using the same one on all nodes, is intentional?
Thanks for getting back on this issue.
Based on the configuration passed to Cilium, I assume that setting the local router IP, and using the same one on all nodes, is intentional?
Yes, that's correct.
After thinking about it a bit more, I think it's safe to do this: https://github.com/cilium/cilium/pull/27331. Please let me know if this resolves your issue.
@christarazi Thanks for taking care of this. I will let you know once I have validated with your PR.
@christarazi I validated with PR #27331 and no longer see the warning message.
A follow-up question: is there any adverse impact from using the same local router IP 169.254.23.0 on different nodes in native routing mode?
@tamilmani1989 No, not that I'm aware of.
Is there an existing issue for this?
What happened?
In AKS clusters, when I deployed the latest Cilium in native routing mode with delegated IPAM, the Cilium agent logged this warning message:
We found that this started to happen after this change: https://github.com/cilium/cilium/pull/23208.
Cilium Version
Latest master
Kernel Version
5.15+
Kubernetes Version
1.25
Cilium Config
Sysdump
Relevant log output