envoyproxy / envoy

Cloud-native high-performance edge/middle/service proxy
https://www.envoyproxy.io
Apache License 2.0

Potential memory leak after requests? #31673

Closed: quantumsheep closed this issue 8 months ago

quantumsheep commented 8 months ago

Description

We noticed that Envoy accumulates memory when receiving requests and never releases it. It's especially noticeable when performing a lot of requests at once. Is this normal?

The issue was observed on 1.25.11 and 1.28.0. I haven't tried other versions.

The following screenshot shows the effect of flooding Envoy with requests: memory usage never drops after the requests have ended. The test was done over HTTP/2.

I tried producing a flame graph but wasn't able to make sense of the heap dump.

Memory usage

Minimal configuration

```yaml
static_resources:
  listeners:
    - name: hello_envoy
      address:
        socket_address:
          address: 0.0.0.0
          port_value: 80
      filter_chains:
        - filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                http_filters:
                  - name: envoy.filters.http.router
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
                route_config:
                  name: hello_envoy
                  virtual_hosts:
                    - name: hello_envoy_main
                      domains: ["*"]
                      routes:
                        - match:
                            prefix: /
                          direct_response:
                            body:
                              inline_string: '{"message":"hello"}'
                            status: 200
                stat_prefix: hello_envoy
```
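For anyone trying to reproduce this, it may help to add Envoy's admin interface to the config so heap numbers can be read directly instead of via a heap dump. A minimal sketch (the 9901 port below is just an example, pick whatever suits your setup):

```yaml
admin:
  address:
    socket_address:
      address: 127.0.0.1
      port_value: 9901
```

The admin endpoint at /memory then reports tcmalloc statistics such as allocated and heap_size, which helps distinguish live allocations from freed pages the allocator is still holding.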

phlax commented 8 months ago

@quantumsheep I wonder if this is related to using direct_response.

Stress testing with direct_response locally, memory goes up for a while and then seems to cap out and fluctuate.

phlax commented 8 months ago

Hmm, actually, just stress testing your example config (i.e. not my config with its other factors), it's working roughly as I would expect: a slight memory increase when the resource is first requested (the direct_response stays in memory), after which memory stays completely stable regardless of how many requests I make.
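One thing worth noting: even when there is no leak, Envoy's allocator (tcmalloc) does not automatically give freed pages back to the OS, so the resident memory graph can look like a ratchet. If the goal is to see memory actually drop after a burst, the overload manager's shrink_heap action can release free heap. A rough sketch, where the 1 GiB maximum and 0.8 threshold are only example values to tune for your deployment:

```yaml
overload_manager:
  refresh_interval: 0.25s
  resource_monitors:
    # Reports heap usage as a fraction of the configured maximum.
    - name: envoy.resource_monitors.fixed_heap
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.resource_monitors.fixed_heap.v3.FixedHeapConfig
        max_heap_size_bytes: 1073741824  # 1 GiB, example value
  actions:
    # While heap pressure exceeds the threshold, free heap is released back to the OS.
    - name: envoy.overload_actions.shrink_heap
      triggers:
        - name: envoy.resource_monitors.fixed_heap
          threshold:
            value: 0.8
```

With something like this in place, pages tcmalloc is merely holding onto get returned once pressure crosses the threshold, which makes it easier to tell a genuine leak apart from allocator retention.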

quantumsheep commented 8 months ago

Hey 👋 Thanks for getting back to me.

On my side it never stops increasing and, most importantly, it never decreases.

It's even more blatant with a more advanced configuration (rate limiting, routing, header mapping, etc.): (screenshot: CleanShot 2024-01-08 at 19 40 56@2x)

phlax commented 8 months ago

I wasn't able to reproduce with your minimal config - can you confirm that it triggers the issue when run by itself?

How are you testing it?

quantumsheep commented 8 months ago

I made a demo repository so you can test it easily.

Run stress.sh in your fastest terminal; it's limited by the terminal's I/O.

phlax commented 8 months ago

@quantumsheep it's the same config that I already tested, and it didn't reproduce the problem.

I'm wondering what else could be causing what you're seeing.

usovamaria commented 8 months ago

Hi. Our Envoy cluster is affected by the same problem: heap usage grew from 6 GB to 90 GB in 10 days. The Envoy version is 1.24.12. The most interesting detail is that we see the problem on our prestable cluster, which is deployed in LXC containers; the stable cluster is bare metal and the situation isn't as bad there (14 GB to 25 GB in a month). Tuning concurrency to match the number of cores available to the LXC containers helped a bit, but it didn't solve the problem.

quantumsheep commented 8 months ago

@phlax now I can't reproduce the issue :/

There is surely a problem somewhere, as our graph shows, but it may not be as simple as the configuration I gave you.

phlax commented 8 months ago

Yep - I think the issue is not present in the minimal config. If you can pinpoint what is triggering it, we can investigate further.

phlax commented 8 months ago

@usovamaria 1.24 is no longer supported, I'm afraid - and I think, as currently presented, this is a non-issue. Likewise, if you can pinpoint a specific problem, please open a ticket for it.

I'm going to close this for now to prevent any confusion.