emissary-ingress / emissary

open source Kubernetes-native API gateway for microservices built on the Envoy Proxy
https://www.getambassador.io
Apache License 2.0

Ambassador memory grows indefinitely, causes config throttling #3684

Open etotten opened 3 years ago

etotten commented 3 years ago

Describe the bug

We have multiple environments running Ambassador with the ConsulResolver, sending traffic to a Consul Connect mesh. In each of these environments, Ambassador's memory usage appears to grow indefinitely until it reaches about 98-99%, where it hovers. While this might be acceptable on its own, it causes Ambassador to start throttling config changes, showing error messages like this:

time="2021-07-30T08:26:04Z" level=warning msg="Memory Usage: throttling reconfig v14739 due to constrained memory with 30 stale reconfigs (30 max)"

This throttling coincides with clients getting lots of 503 responses from Ambassador when they try to access services due to unhealthy upstreams.

We understand that the throttling is working as intended, but the issue here is that the memory leak combined with that throttling behavior causes much instability with requests.
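
To track how often the throttling kicks in, the warning shown above can be pulled out of the pod logs and parsed. The following is a minimal sketch, not part of Ambassador: the regular expression is derived only from the single log line quoted in this issue, so other versions may word the message differently, and the function name `parse_throttle_line` is made up for illustration.

```python
import re

# Pattern derived from the one warning line quoted in this issue (an assumption:
# other Ambassador versions may format the message differently).
THROTTLE_RE = re.compile(
    r"throttling reconfig v(?P<version>\d+) due to constrained memory "
    r"with (?P<stale>\d+) stale reconfigs \((?P<max_stale>\d+) max\)"
)


def parse_throttle_line(line):
    """Return the throttling counters as a dict, or None if the line does not match."""
    match = THROTTLE_RE.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


# The log line quoted in the issue description.
sample = (
    'time="2021-07-30T08:26:04Z" level=warning msg="Memory Usage: throttling '
    'reconfig v14739 due to constrained memory with 30 stale reconfigs (30 max)"'
)
print(parse_throttle_line(sample))
```

Counting matching lines per time window gives a rough signal of when a pod has entered the throttled state, which can then be compared against the 503 rate seen by clients.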

Here's an example of memory usage and memory % in one of these environments:

[images: aes_mem_climbing_tsg-cert, aes_mempct_climbing_tsg-cert]

To Reproduce

Versions (please complete the following information):

Additional Context

Upon inspection, we can see that under these circumstances Ambassador does not keep up with changes to upstream pods. Specifically, if I hop onto an ambassador pod and run:

curl -s http://127.0.0.1:8001/clusters | grep

...we can see the IPs for pods in the upstream. However, when we terminate a pod under throttling conditions, the IP of the terminated pod lingers in the cluster, which increases latency (when using retries) and leads to 503s soon after.
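
The manual check above can be automated by comparing the endpoints Envoy still lists against the pod IPs that are actually alive. This is a rough sketch under a few assumptions: it takes the text returned by the admin `/clusters` endpoint as input, matches only IPv4 host lines of the form `<cluster>::<ip>:<port>::<stat>::<value>`, and uses a hypothetical cluster name and sample data. The helper names `endpoint_ips` and `lingering_ips` are invented for illustration.

```python
import re

# Host lines in the admin /clusters output look like
# "<cluster>::<ip>:<port>::<stat>::<value>". Cluster-level lines carry no
# address and are skipped. Only IPv4 endpoints are handled in this sketch.
HOST_LINE_RE = re.compile(
    r"^(?P<cluster>.+?)::(?P<addr>\d{1,3}(?:\.\d{1,3}){3}):(?P<port>\d+)::"
)


def endpoint_ips(clusters_text, cluster_name):
    """Collect the endpoint IPs Envoy currently lists for one cluster."""
    ips = set()
    for line in clusters_text.splitlines():
        match = HOST_LINE_RE.match(line)
        if match and match.group("cluster") == cluster_name:
            ips.add(match.group("addr"))
    return ips


def lingering_ips(clusters_text, cluster_name, live_pod_ips):
    """Return IPs Envoy still routes to that are not among the live pod IPs."""
    return endpoint_ips(clusters_text, cluster_name) - set(live_pod_ips)


# Hypothetical sample output; the cluster names and addresses are made up.
sample = """\
cluster_my_service::default_priority::max_connections::1024
cluster_my_service::10.0.1.5:8080::cx_active::2
cluster_my_service::10.0.1.5:8080::health_flags::healthy
cluster_my_service::10.0.1.9:8080::cx_active::0
cluster_other::10.0.2.7:9000::cx_active::1
"""
print(lingering_ips(sample, "cluster_my_service", {"10.0.1.5"}))
```

In practice the text would come from the same `curl` call against `127.0.0.1:8001`, and the live IPs from whatever source of truth is in use (Consul or the Kubernetes endpoints). A non-empty result while throttling messages are being logged would match the behavior described here.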

We noticed that there is a new feature in 1.13.10, AMBASSADOR_AMBEX_NO_RATELIMIT, which can suppress the throttling behavior; however we are having a different, eventual cluster-upstreams issue using that...see: https://github.com/emissary-ingress/emissary/issues/3680

We have recently given the pods more memory to lengthen the time before the config throttling begins; however, the memory keeps growing, and it will just be a matter of time before the config throttling happens again.

etotten commented 3 years ago

In order to get some sense of which processes are growing, I did take these two snapshots in a test environment about 28 hours apart. It looks as if the "busyambassador entrypoint" process may be the one that is growing.
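
A comparison like this can also be scripted instead of read off screenshots. The sketch below is one possible approach, not the method used for the snapshots in this comment: it reads `VmRSS` from `/proc/<pid>/status` on Linux and diffs two snapshots. It assumes a Python interpreter is available wherever it runs, and the function names are hypothetical. Note that the `Name` field in `/proc` is the short command name, not the full command line.

```python
import glob


def parse_status(status_text):
    """Extract the command name and resident set size (KiB) from /proc/<pid>/status text."""
    name, rss_kib = None, None
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key == "Name":
            name = value.strip()
        elif key == "VmRSS":
            rss_kib = int(value.split()[0])  # value looks like "  204800 kB"
    return name, rss_kib


def snapshot():
    """Map pid -> (name, rss_kib) for every process that reports VmRSS."""
    result = {}
    for path in glob.glob("/proc/[0-9]*/status"):
        try:
            with open(path) as handle:
                name, rss_kib = parse_status(handle.read())
        except OSError:
            continue  # the process exited while we were scanning
        if rss_kib is not None:  # kernel threads have no VmRSS line
            pid = int(path.split("/")[2])
            result[pid] = (name, rss_kib)
    return result


def growth(before, after):
    """RSS change in KiB for PIDs present in both snapshots, largest increase first."""
    deltas = [
        (after[pid][1] - before[pid][1], pid, after[pid][0])
        for pid in before.keys() & after.keys()
    ]
    return sorted(deltas, reverse=True)
```

Taking one `snapshot()` shortly after startup and another a day or so later, then passing both to `growth()`, ranks the processes by how much resident memory each has gained.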

[images: which-proc-grows-1, which-proc-grows-2]

etotten commented 3 years ago

At least one other user is seeing this problem as well, and it sounds like they are still working around it: https://github.com/emissary-ingress/emissary/issues/3414#issuecomment-896437505

juanjoku commented 3 years ago

Hi, we also observe an increase in memory usage with Ambassador v1.13.10 (did not occur with v1.11.1).

We use Kubernetes Endpoint Resolver, not Consul. The increase occurs on an Ambassador that is receiving minimal traffic, and is not being reconfigured (the mappings do not change).

For example, last two days:

[image]

ppeble commented 2 years ago

Just to bump, I believe this might be happening to us as well. We just noticed the "Memory Usage: throttling reconfig" messages along with high memory usage. We are running 1.14.1.

We are running an istio mesh, not using consul.

dwgillies-bluescape commented 2 years ago

We are seeing the same issue @Bluescape, in our alphatest cluster (50+ users, 1.13.3, running istio). We have 6 ambassador pods, agent, and redis. The agent grows without bound and crashes often.

[image: Screen Shot 2022-01-27 at 11 07 45 AM]

kareem-elsayed commented 1 year ago

We are also facing this issue with Emissary v2.3.1, and noticed that memory consumption is almost double that of the older Ambassador v1.7.4. We tried to follow the recommendations in the documentation, but they had no significant effect on resource consumption.