Open benesch opened 1 year ago
I have also been interested lately in the correctness of the egress gateway (and thus in this issue, though probably with less need than you: my personal cluster where I use the egress gateway is very small, so policy application is probably nearly instantaneous in my case anyway). From what I can see (a non-Isovalent, non-expert opinion), this is most likely much harder to implement than the solution you describe, because the BPF maps need to be updated on all the involved nodes (all the selected egress gateways plus the client node) for the egress gateway to work correctly.
So it might require that each node involved with a given egress gateway somehow report whenever it updates the egress maps, and that the CNI call wait for those reports. Not exactly sure what the best way to report this would be, though :thinking: (cilium endpoint status maybe?)...
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Not stale.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Not stale.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Still relevant.
Cilium Feature Proposal
Is your feature request related to a problem?
Pods that are subject to a CiliumEgressGatewayPolicy can make a few requests immediately after pod startup that are not subject to the egress policy.
This problem was previously reported in https://github.com/cilium/cilium/issues/22969 as excessive latency between pod creation and policy application. That issue was solved by optimizing the policy application time; however, a small window (1-2s) between pod creation and policy application still exists.
Describe the feature you'd like
We'd like Cilium to block pod startup until all egress gateway policies that were present at the time of the pod's creation have been applied to the pod.
I've previously filed this request via a support ticket (https://support.isovalent.com/hc/en-us/requests/332; internal enhancement request 520), but I wanted to re-request in public so that I could point folks at this issue.
(Optional) Describe your proposed solution
This is analogous to synchronously waiting for the CiliumEndpoint to be created when a pod is created, which Cilium added support for back in 2018.
Ideally a similar code path could be added for egress gateway policies, i.e., waiting for the first regeneration of egress policies after the new endpoint is created.
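To make the idea of "wait for the first regeneration after the endpoint is created" concrete, here is a minimal sketch in Python. It is not Cilium code: the `RegenerationGate` class and its methods are hypothetical names invented for illustration, and it assumes the regeneration loop can bump a counter each time it finishes a pass.

```python
import threading


class RegenerationGate:
    """Hypothetical helper: counts completed egress policy regenerations."""

    def __init__(self):
        self._cond = threading.Condition()
        self._generation = 0

    def current(self):
        """Return the number of regenerations completed so far."""
        with self._cond:
            return self._generation

    def mark_regenerated(self):
        """Called by the (assumed) regeneration loop after each completed pass."""
        with self._cond:
            self._generation += 1
            self._cond.notify_all()

    def wait_for_next(self, seen, timeout):
        """Block until a regeneration newer than `seen` completes.

        Returns True if one was observed, False if `timeout` seconds elapsed first.
        """
        with self._cond:
            return self._cond.wait_for(
                lambda: self._generation > seen, timeout=timeout
            )
```

The pod setup path would record `seen = gate.current()` once the endpoint is registered and then call `gate.wait_for_next(seen, timeout=...)` before letting the pod start. One caveat with a plain counter: a pass that began before the endpoint was registered could still bump it, so a real implementation would need to count only passes that started afterwards.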
Workaround
For anyone else running into this, we've been successfully using the following workaround at Materialize. We plumb the list of known egress gateway IPs into an init container, and then block until curl https://checkip.amazonaws.com reports that the traffic is leaving via one of those egress IPs.
Silly, but it works.
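For readers who want something to start from, here is a rough sketch of that kind of init-container check, written in Python rather than with curl. It is not the script used at Materialize; the function names, retry interval, and attempt limit are assumptions chosen for illustration.

```python
import time
import urllib.request

CHECK_URL = "https://checkip.amazonaws.com"


def fetch_public_ip(url=CHECK_URL, timeout=5.0):
    """Return the source IP that the check service sees for this pod."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("ascii").strip()


def wait_for_egress_ip(allowed_ips, fetch=fetch_public_ip, interval=1.0,
                       max_attempts=60, sleep=time.sleep):
    """Poll until the observed public IP is one of the known egress gateway IPs.

    `fetch` and `sleep` are injectable so the loop can be exercised offline.
    Raises TimeoutError if no attempt sees an allowed IP.
    """
    allowed = set(allowed_ips)
    for _ in range(max_attempts):
        try:
            ip = fetch()
        except OSError:
            # Network not ready yet (URLError and socket timeouts are OSErrors).
            ip = None
        if ip in allowed:
            return ip
        sleep(interval)
    raise TimeoutError("traffic never left via a known egress gateway IP")
```

An init container would call `wait_for_egress_ip(...)` with the plumbed-in IP list and exit non-zero on `TimeoutError`, so the main containers only start once traffic is observed leaving through a gateway.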