envoyproxy / envoy

Cloud-native high-performance edge/middle/service proxy
https://www.envoyproxy.io
Apache License 2.0
24.89k stars 4.79k forks source link

EDS requests sent after receiving SIGTERM #13900

Open howardjohn opened 3 years ago

howardjohn commented 3 years ago

If you are reporting any crash or any potential security issue, do not open an issue in this repo. Please report the issue via emailing envoy-security@googlegroups.com where the issue will be triaged appropriately.

Title: EDS requests sent after receiving SIGTERM

Description: When envoy is sent SIGTERM, it will start tearing down clusters and then send new EDS requests dropping those resources.

I am not sure the impact of this, but it seems a bit suspicious at the very least

Repro steps: Send SIGTERM to envoy, turn on trace:config logs and see the discovery requests. Note in my case it was pretty rare to actually see these show up in the XDS server, likely due to racing to shutdown the process, but it does happen. Seems like usually only a couple requests get through for some reason.

Logs: logs.txt

SIGTERM sent at 07:59:25, you can see the EDS requests at that point

antoniovicente commented 3 years ago

cc @yanavlasov @htuch @lambdai @junr03

It doesn't seem ideal that Envoy is sending spurious eds requests during shutdown.

htuch commented 3 years ago

I think this occurs as EDS is being destructed and is unsubscribing from the resources it was previously subscribed to.

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

howardjohn commented 3 years ago

@htuch I agree that seems like what is happening - is this expected though? I cannot actually think of anything that would break by this, but I would think the optimal behavior is it doesn't send these, perhaps its a very low priority though.

htuch commented 3 years ago

Ideally we have some strong notion of the last place on the shutdown path its safe to be doing this and ensure it happens before then. On graceful shutdowns we should cleanly teardown and release our subscriptions. I think this probably needs more investigation and definitely more comments/documentation.