Open radwaretaltr opened 10 months ago
Thanks for reporting this! Do you happen to have flame graph showing memory usage after it is run for some time while failing to fetch a configuration?
Thanks for reporting this! Do you happen to have flame graph showing memory usage after it is run for some time while failing to fetch a configuration?
Hi, thanks for responding. No, we don't have one.
I let this run overnight and got this profile. I don't have experience reading these yet, but I've attached the overall shape and a zoom on the red parts. The last call leading into the red part is TimerImpl, so I wonder if some timers are created but never closed for each attempt to fetch the config --I'm not familiar with the code yet.
fwiw, I do something like this. You'll need to run pprof to have a good look at the data.
curl 127.0.0.1:19000/heap_dump -o memory-bug.heap
pprof -http localhost:9999 memory-bug.heap
mem-leak-profile.pb.gz memory-bug.heap.gz
memory-leak-test.yaml.txt --remove the .txt that's a github weirdness
Seems that The TlsCachingDateProviderImpl creates objects in the thread local storage (https://github.com/envoyproxy/envoy/blob/4265b99cfb42516296d95c63da9b7e02990264af/source/common/http/date_provider_impl.cc#L27) that are never deallocated when the xDS server is unavailable. I'm not sure why the objects are not freed, and further investigation is required.
Thanks! I haven't seen one of these with the dotted arrow going up, but if the TimerImpl is properly cleaned up (on xDS fail) that should clean up the TlsCachingDateProviderImpl too, correct?
@cgthayer & @adisuissa - Thanks for investigating this. Are you aware of any plans for fixing this?
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.
Hi, thanks. Any plans on fixing this?
At the moment my bandwidth is limited. Setting it as no-stale.
Description: We use the Basic xDS protocol to deliver dynamic configuration to Envoy from a management server. In addition, we have a static configuration with one listener and one cluster. It seems that if Envoy fails to fetch the configuration from the management server, for example, when the management server is down, Envoy's memory usage rises indefinitely (approximately ~1Mb per minute). It only stops when Envoy is able to fetch the configuration.
Some observations we were able to make: 1) This only happens when Envoy is unable to fetch the configuration since startup. 2) Removing the static listener and cluster from the config seems to fix the issue (memory doesn't rise indefinitely).
Repro steps: 1) Deploy envoy with the static configuration specified in the configuration section. There is no need to setup an actual management server. 2) Monitor the memory usage, for example, via the
/memory
endpoint in the admin interface.Config
Logs
``` ':status', '200' 'content-type', 'application/grpc' 'grpc-status', '14' 'grpc-message', 'no healthy upstream' [2023-12-31 15:54:26.116][1][warning][config] [./source/extensions/config_subscription/grpc/grpc_stream.h:190] StreamAggregatedResources gRPC config stream to cluster1 closed since 6654s ago: 14, no healthy upstream [2023-12-31 15:54:26.116][1][debug][config] [source/extensions/config_subscription/grpc/grpc_subscription_impl.cc:125] gRPC update for type.googleapis.com/envoy.config.cluster.v3.Cluster failed [2023-12-31 15:54:26.116][1][debug][config] [./source/extensions/config_subscription/grpc/grpc_stream.h:60] Unable to establish new stream to configuration server cluster1 [2023-12-31 15:54:26.116][1][debug][config] [source/extensions/config_subscription/grpc/grpc_subscription_impl.cc:125] gRPC update for type.googleapis.com/envoy.config.cluster.v3.Cluster failed [2023-12-31 15:54:26.573][1][debug][main] [source/server/server.cc:263] flushing stats [2023-12-31 15:54:26.573][1][debug][main] [source/server/server.cc:273] Envoy is not fully initialized, skipping histogram merge and flushing stats [2023-12-31 15:54:31.573][1][debug][main] [source/server/server.cc:263] flushing stats [2023-12-31 15:54:31.573][1][debug][main] [source/server/server.cc:273] Envoy is not fully initialized, skipping histogram merge and flushing stats [2023-12-31 15:54:31.786][1][debug][dns] [source/extensions/network/dns_resolver/cares/dns_impl.cc:365] dns resolution for control-plane started [2023-12-31 15:54:31.794][1][debug][dns] [source/extensions/network/dns_resolver/cares/dns_impl.cc:152] dns resolution for control-plane failed with c-ares status 11 [2023-12-31 15:54:31.799][1][debug][dns] [source/extensions/network/dns_resolver/cares/dns_impl.cc:152] dns resolution for control-plane failed with c-ares status 11 [2023-12-31 15:54:31.799][1][debug][dns] [source/extensions/network/dns_resolver/cares/dns_impl.cc:285] dns resolution for control-plane completed with status 1 [2023-12-31 15:54:31.799][1][debug][upstream] [source/extensions/clusters/strict_dns/strict_dns_cluster.cc:184] DNS refresh rate reset for control-plane, (failure) refresh rate 180000 ms [2023-12-31 15:54:33.622][1][debug][config] [./source/extensions/config_subscription/grpc/grpc_stream.h:51] Establishing new gRPC bidi stream to cluster1 for rpc StreamAggregatedResources(stream .envoy.service.discovery.v3.DiscoveryRequest) returns (stream .envoy.service.discovery.v3.DiscoveryResponse); [2023-12-31 15:54:33.623][1][debug][router] [source/common/router/router.cc:520] [Tags: "ConnectionId":"0","StreamId":"16089286589495779457"] cluster 'cluster1' match for URL '/envoy.service.discovery.v3.AggregatedDiscoveryService/StreamAggregatedResources' [2023-12-31 15:54:33.623][1][debug][upstream] [source/common/upstream/cluster_manager_impl.cc:1942] no healthy host for HTTP connection pool [2023-12-31 15:54:33.623][1][debug][http] [source/common/http/async_client_impl.cc:106] async http request response headers (end_stream=true): ':status', '200' 'content-type', 'application/grpc' 'grpc-status', '14' 'grpc-message', 'no healthy upstream' [2023-12-31 15:54:33.623][1][warning][config] [./source/extensions/config_subscription/grpc/grpc_stream.h:190] StreamAggregatedResources gRPC config stream to cluster1 closed since 6662s ago: 14, no healthy upstream [2023-12-31 15:54:33.623][1][debug][config] [source/extensions/config_subscription/grpc/grpc_subscription_impl.cc:125] gRPC update for type.googleapis.com/envoy.config.cluster.v3.Cluster failed [2023-12-31 15:54:33.623][1][debug][config] [./source/extensions/config_subscription/grpc/grpc_stream.h:60] Unable to establish new stream to configuration server cluster1 [2023-12-31 15:54:33.623][1][debug][config] [source/extensions/config_subscription/grpc/grpc_subscription_impl.cc:125] gRPC update for type.googleapis.com/envoy.config.cluster.v3.Cluster failed [2023-12-31 15:54:36.573][1][debug][main] [source/server/server.cc:263] flushing stats [2023-12-31 15:54:36.573][1][debug][main] [source/server/server.cc:273] Envoy is not fully initialized, skipping histogram merge and flushing stats [2023-12-31 15:54:41.576][1][debug][main] [source/server/server.cc:263] flushing stats [2023-12-31 15:54:41.576][1][debug][main] [source/server/server.cc:273] Envoy is not fully initialized, skipping histogram merge and flushing stats [2023-12-31 15:54:44.949][1][debug][config] [./source/extensions/config_subscription/grpc/grpc_stream.h:51] Establishing new gRPC bidi stream to cluster1 for rpc StreamAggregatedResources(stream .envoy.service.discovery.v3.DiscoveryRequest) returns (stream .envoy.service.discovery.v3.DiscoveryResponse); [2023-12-31 15:54:44.950][1][debug][router] [source/common/router/router.cc:520] [Tags: "ConnectionId":"0","StreamId":"2604332730402544397"] cluster 'cluster1' match for URL '/envoy.service.discovery.v3.AggregatedDiscoveryService/StreamAggregatedResources' [2023-12-31 15:54:44.950][1][debug][upstream] [source/common/upstream/cluster_manager_impl.cc:1942] no healthy host for HTTP connection pool [2023-12-31 15:54:44.950][1][debug][http] [source/common/http/async_client_impl.cc:106] async http request response headers (end_stream=true): ':status', '200' 'content-type', 'application/grpc' 'grpc-status', '14' 'grpc-message', 'no healthy upstream' [2023-12-31 15:54:44.950][1][warning][config] [./source/extensions/config_subscription/grpc/grpc_stream.h:190] StreamAggregatedResources gRPC config stream to cluster1 closed since 6673s ago: 14, no healthy upstream [2023-12-31 15:54:44.950][1][debug][config] [source/extensions/config_subscription/grpc/grpc_subscription_impl.cc:125] gRPC update for type.googleapis.com/envoy.config.cluster.v3.Cluster failed [2023-12-31 15:54:44.950][1][debug][config] [./source/extensions/config_subscription/grpc/grpc_stream.h:60] Unable to establish new stream to configuration server cluster1 [2023-12-31 15:54:44.950][1][debug][config] [source/extensions/config_subscription/grpc/grpc_subscription_impl.cc:125] gRPC update for type.googleapis.com/envoy.config.cluster.v3.Cluster failed [2023-12-31 15:54:46.577][1][debug][main] [source/server/server.cc:263] flushing stats [2023-12-31 15:54:46.577][1][debug][main] [source/server/server.cc:273] Envoy is not fully initialized, skipping histogram merge and flushing stats [2023-12-31 15:54:51.574][1][debug][main] [source/server/server.cc:263] flushing stats [2023-12-31 15:54:51.575][1][debug][main] [source/server/server.cc:273] Envoy is not fully initialized, skipping histogram merge and flushing stats [2023-12-31 15:54:56.575][1][debug][main] [source/server/server.cc:263] flushing stats [2023-12-31 15:54:56.575][1][debug][main] [source/server/server.cc:273] Envoy is not fully initialized, skipping histogram merge and flushing stats [2023-12-31 15:55:01.574][1][debug][main] [source/server/server.cc:263] flushing stats [2023-12-31 15:55:01.574][1][debug][main] [source/server/server.cc:273] Envoy is not fully initialized, skipping histogram merge and flushing stats [2023-12-31 15:55:06.577][1][debug][main] [source/server/server.cc:263] flushing stats [2023-12-31 15:55:06.577][1][debug][main] [source/server/server.cc:273] Envoy is not fully initialized, skipping histogram merge and flushing stats [2023-12-31 15:55:09.067][1][debug][config] [./source/extensions/config_subscription/grpc/grpc_stream.h:51] Establishing new gRPC bidi stream to cluster1 for rpc StreamAggregatedResources(stream .envoy.service.discovery.v3.DiscoveryRequest) returns (stream .envoy.service.discovery.v3.DiscoveryResponse); [2023-12-31 15:55:09.067][1][debug][router] [source/common/router/router.cc:520] [Tags: "ConnectionId":"0","StreamId":"15864330249036367919"] cluster 'cluster1' match for URL '/envoy.service.discovery.v3.AggregatedDiscoveryService/StreamAggregatedResources' [2023-12-31 15:55:09.067][1][debug][upstream] [source/common/upstream/cluster_manager_impl.cc:1942] no healthy host for HTTP connection pool [2023-12-31 15:55:09.067][1][debug][http] [source/common/http/async_client_impl.cc:106] async http request response headers (end_stream=true): ':status', '200' 'content-type', 'application/grpc' 'grpc-status', '14' 'grpc-message', 'no healthy upstream' [2023-12-31 15:55:09.067][1][warning][config] [./source/extensions/config_subscription/grpc/grpc_stream.h:190] StreamAggregatedResources gRPC config stream to cluster1 closed since 6697s ago: 14, no healthy upstream [2023-12-31 15:55:09.067][1][debug][config] [source/extensions/config_subscription/grpc/grpc_subscription_impl.cc:125] gRPC update for type.googleapis.com/envoy.config.cluster.v3.Cluster failed [2023-12-31 15:55:09.067][1][debug][config] [./source/extensions/config_subscription/grpc/grpc_stream.h:60] Unable to establish new stream to configuration server cluster1 [2023-12-31 15:55:09.067][1][debug][config] [source/extensions/config_subscription/grpc/grpc_subscription_impl.cc:125] gRPC update for type.googleapis.com/envoy.config.cluster.v3.Cluster failed [2023-12-31 15:55:11.574][1][debug][main] [source/server/server.cc:263] flushing stats [2023-12-31 15:55:11.574][1][debug][main] [source/server/server.cc:273] Envoy is not fully initialized, skipping histogram merge and flushing stats [2023-12-31 15:55:16.575][1][debug][main] [source/server/server.cc:263] flushing stats [2023-12-31 15:55:16.575][1][debug][main] [source/server/server.cc:273] Envoy is not fully initialized, skipping histogram merge and flushing stats [2023-12-31 15:55:21.577][1][debug][main] [source/server/server.cc:263] flushing stats [2023-12-31 15:55:21.577][1][debug][main] [source/server/server.cc:273] Envoy is not fully initialized, skipping histogram merge and flushing stats [2023-12-31 15:55:23.860][1][debug][config] [./source/extensions/config_subscription/grpc/grpc_stream.h:51] Establishing new gRPC bidi stream to cluster1 for rpc StreamAggregatedResources(stream .envoy.service.discovery.v3.DiscoveryRequest) returns (stream .envoy.service.discovery.v3.DiscoveryResponse); [2023-12-31 15:55:23.860][1][debug][router] [source/common/router/router.cc:520] [Tags: "ConnectionId":"0","StreamId":"11845650912692327212"] cluster 'cluster1' match for URL '/envoy.service.discovery.v3.AggregatedDiscoveryService/StreamAggregatedResources' [2023-12-31 15:55:23.860][1][debug][upstream] [source/common/upstream/cluster_manager_impl.cc:1942] no healthy host for HTTP connection pool [2023-12-31 15:55:23.860][1][debug][http] [source/common/http/async_client_impl.cc:106] async http request response headers (end_stream=true): ':status', '200' 'content-type', 'application/grpc' 'grpc-status', '14' 'grpc-message', 'no healthy upstream' [2023-12-31 15:55:23.860][1][warning][config] [./source/extensions/config_subscription/grpc/grpc_stream.h:190] StreamAggregatedResources gRPC config stream to cluster1 closed since 6712s ago: 14, no healthy upstream [2023-12-31 15:55:23.860][1][debug][config] [source/extensions/config_subscription/grpc/grpc_subscription_impl.cc:125] gRPC update for type.googleapis.com/envoy.config.cluster.v3.Cluster failed [2023-12-31 15:55:23.860][1][debug][config] [./source/extensions/config_subscription/grpc/grpc_stream.h:60] Unable to establish new stream to configuration server cluster1 [2023-12-31 15:55:23.860][1][debug][config] [source/extensions/config_subscription/grpc/grpc_subscription_impl.cc:125] gRPC update for type.googleapis.com/envoy.config.cluster.v3.Cluster failed [2023-12-31 15:55:26.574][1][debug][main] [source/server/server.cc:263] flushing stats [2023-12-31 15:55:26.574][1][debug][main] [source/server/server.cc:273] Envoy is not fully initialized, skipping histogram merge and flushing stats [2023-12-31 15:55:31.574][1][debug][main] [source/server/server.cc:263] flushing stats [2023-12-31 15:55:31.574][1][debug][main] [source/server/server.cc:273] Envoy is not fully initialized, skipping histogram merge and flushing stats [2023-12-31 15:55:36.576][1][debug][main] [source/server/server.cc:263] flushing stats [2023-12-31 15:55:36.576][1][debug][main] [source/server/server.cc:273] Envoy is not fully initialized, skipping histogram merge and flushing stats ```