Mosibi opened this issue 3 years ago
It seems related to #3342.
FYI: Recently, fluent-plugin-systemd 1.0.5 was released.
It fixes "Plug a memory leaks on every reload" https://github.com/fluent-plugin-systemd/fluent-plugin-systemd/pull/91
Is it still reproducible with fluent-plugin-systemd 1.0.5?
I tested with the complete config and I still see the memory leak. Today I will run with only the systemd part to see if there is a change.
I've been investigating a similar issue for a few days, and observed that on fluentd 1.11.5 a null output with a file buffer also constantly uses more memory, in contrast to a null output without a buffer section. I can rerun the tests on a newer fluentd.
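Roughly, the two outputs being compared look like this (the buffer path is a placeholder, not the exact test config):

<match **>
  @type null
</match>

versus

<match **>
  @type null
  <buffer>
    @type file
    path /fluentd/buffer/null    # placeholder path
  </buffer>
</match>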
Ah, that is good information. Please share the results with the latest fluentd version, if you can.
Is it possible to share the configuration you are using?
This is my actual config
2021-06-01 12:30:47 +0000 [info]: using configuration file: <ROOT>
<match fluentd.pod.healthcheck>
@type relabel
@label @FLUENT_LOG
</match>
<label @FLUENT_LOG>
<match **>
@type null
</match>
</label>
<system>
log_level info
</system>
<source>
@type forward
port 24321
bind "0.0.0.0"
</source>
<match containers.**>
@type relabel
@label @NORMAL
</match>
<label @NORMAL>
<match containers.**>
@type copy
<store>
@type "null"
<buffer>
@type "file"
path "/fluentd/buffer/logs.containers"
compress gzip
flush_interval 5s
flush_thread_count 8
chunk_limit_size 1m
total_limit_size 128m
queued_chunks_limit_size 128
overflow_action drop_oldest_chunk
retry_max_interval 10m
retry_forever true
</buffer>
</store>
</match>
</label>
</ROOT>
It seems related to #3342.
FYI: Recently, fluent-plugin-systemd 1.0.5 was released.
It fixes "Plug a memory leaks on every reload" fluent-plugin-systemd/fluent-plugin-systemd#91
Is it still reproducible with fluent-plugin-systemd 1.0.5?
Yes, also with fluent-plugin-systemd version 1.0.5 I see the same behaviour. I can attach a new graph, but it shows exactly the same as the first one.
Thanks to @sumo-drosiek, I can confirm that no memory leak occurs when I do not use a buffer section in the output. I am rerunning everything using a new config with a buffer section to be 100% sure.
The config without buffers
<system>
log_level info
log_event_verbose true
root_dir /tmp/fluentd-buffers/
</system>
<label @OUTPUT>
<match **>
@type null
</match>
</label>
###
# systemd/journald
###
<source>
@type systemd
tag systemd
path /var/log/journal
read_from_head true
<entry>
fields_strip_underscores true
fields_lowercase true
</entry>
@label @OUTPUT
</source>
I've confirmed the issue. I think it's not a fluentd issue but a fluent-plugin-systemd issue. Please forward the report to https://github.com/fluent-plugin-systemd/fluent-plugin-systemd
Details:
The issue doesn't reproduce when I replace the <source> with other plugins like in_tail (with tailing a huge file).
The cause seems to be read_from_head of in_systemd. When it's specified, in_systemd tries to read whole entries by running a busy loop: https://github.com/fluent-plugin-systemd/fluent-plugin-systemd/blob/9ffe8e14de75ca94e9bfe7428efd5c1a59421511/lib/fluent/plugin/in_systemd.rb#L144
diff --git a/lib/fluent/plugin/in_systemd.rb b/lib/fluent/plugin/in_systemd.rb
index adb8b3f..66f4373 100644
--- a/lib/fluent/plugin/in_systemd.rb
+++ b/lib/fluent/plugin/in_systemd.rb
@@ -120,6 +120,7 @@ module Fluent
init_journal if @journal.wait(0) == :invalidate
watch do |entry|
emit(entry)
@ashie I just wanted to add the results of my test run with buffering configured in an output: without buffering there is no memory leak, and with buffering a memory leak occurs.
So to be complete:
No memory leak:
<label @OUTPUT>
<match **>
@type null
</match>
</label>
Memory leak:
<label @OUTPUT>
<match **>
@type null
<buffer>
@type "file"
path "/var/log/fluentd-buffers/output.buffer"
compress gzip
flush_interval 5s
flush_thread_count 8
chunk_limit_size 1m
total_limit_size 128m
queued_chunks_limit_size 128
overflow_action drop_oldest_chunk
retry_max_interval 10m
retry_forever true
</buffer>
</match>
</label>
Thanks, further investigation may be needed on the fluentd side. Reopening.
The following graph displays memory usage over almost 7 days. This is my regular config with only the buffer sections in the Elasticsearch outputs disabled. I was already sure that it was the buffer section; this is just more proof :)
This issue has been automatically marked as stale because it has been open 90 days with no activity. Remove stale label or comment or this issue will be closed in 30 days
@ashie any update on this?
I am also facing this issue. Support has suggested setting up a cron job to restart the service until a resolution comes: https://stackoverflow.com/questions/69161295/gcp-monitoring-agent-increasing-memory-usage-continuously.
This issue has been automatically marked as stale because it has been open 90 days with no activity. Remove stale label or comment or this issue will be closed in 30 days
The issue is still here. Please unstale it.
Could it be connected to #3634?
Any update for this?
I faced this issue with a file buffer in the output config. For me, the fix was setting
prefer_oj_serializer true
http_backend typhoeus
buffer_type memory
in the elasticsearch match section.
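Roughly, the match section ends up like this (the tag, host, and port are placeholders, not my exact config):

<match containers.**>
  @type elasticsearch
  host elasticsearch.example.com   # placeholder host
  port 9200
  prefer_oj_serializer true        # use the Oj JSON serializer
  http_backend typhoeus            # switch the HTTP client backend
  <buffer>
    @type memory                   # memory buffer (equivalent to buffer_type memory)
  </buffer>
</match>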
And the aggregator pod keeps using all the RAM you give it, right up to the limit, until the RAM is released. I also added:
extraEnvVars:
# https://brandonhilkert.com/blog/reducing-sidekiq-memory-usage-with-jemalloc/?utm_source=reddit&utm_medium=social&utm_campaign=jemalloc
- name: LD_PRELOAD
value: /usr/lib/x86_64-linux-gnu/libjemalloc.so
# https://www.speedshop.co/2017/12/04/malloc-doubles-ruby-memory.html
- name: MALLOC_ARENA_MAX
value: "2"
This helped to make things usable, but I still have no idea why it is so greedy with RAM.
(By the way, the forwarders are behaving normally.)
Of course, I had to manually set the Elasticsearch version to 7.17.
I wasted tons of time on this, and I'll be happy if any of these hints are useful.
# https://www.speedshop.co/2017/12/04/malloc-doubles-ruby-memory.html
- name: MALLOC_ARENA_MAX
  value: "2"

When you use jemalloc, MALLOC_ARENA_MAX doesn't make sense, so you can remove it.
I face a memory leak as well. My inputs are:
forward
http (for the liveness and readiness probes)
systemd
My outputs are:
null (for fluentd's own logs)
opensearch (with a file buffer)
stdout (for health checks)
The file buffer is always flushed, but memory keeps increasing.
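The buffered opensearch output looks roughly like this (the tag, host, and buffer path are placeholders, not my exact config):

<match containers.**>
  @type opensearch
  host opensearch.example.com         # placeholder host
  port 9200
  <buffer>
    @type file
    path /fluentd/buffer/opensearch   # placeholder path
    flush_interval 5s
  </buffer>
</match>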
Version:
activesupport (7.0.2.3)
addressable (2.8.0)
aws-eventstream (1.2.0)
aws-partitions (1.579.0)
aws-sdk-core (3.130.2)
aws-sdk-kms (1.56.0)
aws-sdk-s3 (1.113.1)
aws-sdk-sqs (1.51.0)
aws-sigv4 (1.5.0)
benchmark (default: 0.1.0)
bigdecimal (default: 2.0.0)
bundler (2.3.12, 2.3.11)
cgi (default: 0.1.0.1)
concurrent-ruby (1.1.10)
cool.io (1.7.1)
csv (default: 3.1.2)
date (default: 3.0.3)
delegate (default: 0.1.0)
did_you_mean (default: 1.4.0)
digest-crc (0.6.4)
domain_name (0.5.20190701)
elastic-transport (8.0.0)
elasticsearch (8.1.2)
elasticsearch-api (8.1.2)
elasticsearch-xpack (7.17.1)
etc (default: 1.1.0)
excon (0.92.2)
faraday (1.10.0)
faraday-em_http (1.0.0)
faraday-em_synchrony (1.0.0)
faraday-excon (1.1.0)
faraday-httpclient (1.0.1)
faraday-multipart (1.0.3)
faraday-net_http (1.0.1)
faraday-net_http_persistent (1.2.0)
faraday-patron (1.0.0)
faraday-rack (1.0.0)
faraday-retry (1.0.3)
faraday_middleware-aws-sigv4 (0.6.1)
fcntl (default: 1.0.0)
ffi (1.15.5)
ffi-compiler (1.0.1)
fiddle (default: 1.0.0)
fileutils (default: 1.4.1)
fluent-config-regexp-type (1.0.0)
fluent-plugin-concat (2.5.0)
fluent-plugin-dedot_filter (1.0.0)
fluent-plugin-detect-exceptions (0.0.14)
fluent-plugin-elasticsearch (5.2.2)
fluent-plugin-grafana-loki (1.2.18)
fluent-plugin-kafka (0.17.5)
fluent-plugin-kubernetes_metadata_filter (2.10.0)
fluent-plugin-multi-format-parser (1.0.0)
fluent-plugin-opensearch (1.0.2)
fluent-plugin-prometheus (2.0.2)
fluent-plugin-record-modifier (2.1.0)
fluent-plugin-rewrite-tag-filter (2.4.0)
fluent-plugin-s3 (1.6.1)
fluent-plugin-systemd (1.0.5)
fluentd (1.14.6)
forwardable (default: 1.3.1)
getoptlong (default: 0.1.0)
http (4.4.1)
http-accept (1.7.0)
http-cookie (1.0.4)
http-form_data (2.3.0)
http-parser (1.2.3)
http_parser.rb (0.8.0)
i18n (1.10.0)
io-console (default: 0.5.6)
ipaddr (default: 1.2.2)
irb (default: 1.2.6)
jmespath (1.6.1)
json (default: 2.3.0, 2.1.0)
jsonpath (1.1.2)
kubeclient (4.9.3)
logger (default: 1.4.2)
lru_redux (1.1.0)
ltsv (0.1.2)
matrix (default: 0.2.0)
mime-types (3.4.1)
mime-types-data (3.2022.0105)
minitest (5.15.0, 5.13.0)
msgpack (1.5.1)
multi_json (1.15.0)
multipart-post (2.1.1)
mutex_m (default: 0.1.0)
net-pop (default: 0.1.0)
net-smtp (default: 0.1.0)
net-telnet (0.2.0)
netrc (0.11.0)
observer (default: 0.1.0)
oj (3.3.10)
open3 (default: 0.1.0)
opensearch-api (1.0.0)
opensearch-ruby (1.0.0)
opensearch-transport (1.0.0)
openssl (default: 2.1.3)
ostruct (default: 0.2.0)
power_assert (1.1.7)
prime (default: 0.1.1)
prometheus-client (4.0.0)
pstore (default: 0.1.0)
psych (default: 3.1.0)
public_suffix (4.0.7)
racc (default: 1.4.16)
rake (13.0.6, 13.0.1)
rdoc (default: 6.2.1.1)
readline (default: 0.0.2)
readline-ext (default: 0.1.0)
recursive-open-struct (1.1.3)
reline (default: 0.1.5)
rest-client (2.1.0)
rexml (default: 3.2.3.1)
rss (default: 0.2.8)
ruby-kafka (1.4.0)
ruby2_keywords (0.0.5)
rubygems-update (3.3.11)
sdbm (default: 1.0.0)
serverengine (2.2.5)
sigdump (0.2.4)
singleton (default: 0.1.0)
stringio (default: 0.1.0)
strptime (0.2.5)
strscan (default: 1.0.3)
systemd-journal (1.4.2)
test-unit (3.3.4)
timeout (default: 0.1.0)
tracer (default: 0.1.0)
tzinfo (2.0.4)
tzinfo-data (1.2022.1)
unf (0.1.4)
unf_ext (0.0.8.1)
uri (default: 0.10.0)
webrick (1.7.0, default: 1.6.1)
xmlrpc (0.3.0)
yajl-ruby (1.4.2)
yaml (default: 0.1.0)
zlib (default: 1.1.0)
I am also seeing a memory leak. Here is our Gemfile.lock and here is a 30 day graph showing the memory leak over several daemonset restarts. Each distinct color is one instance of fluentd servicing a single node:
After upgrading fluentd from v1.7.4 to v1.14.6, memory usage spiked from 400~500Mi to about 4Gi, and it keeps growing slowly but steadily day by day, even though there were NO configuration or workload changes. The following graph covers several hours.
Using plugin list:
output
input
Are there any changes between the two versions that could affect memory usage?
@Mosibi
I have the same issue. I don't know if you have checked this or not, but it seems to be a Kubernetes reporting issue: it is reporting wrong metrics for memory. If you shell into the pod and run free, I think you will see that the free memory is more than what the Kubernetes metrics report.
The Kubernetes metrics report memory usage including cache as well; maybe that's the issue.
@madanrishi do you have a link to the kubernetes issue that you think this might be related to?
Any update on this issue, guys? I am facing a similar issue.
@madanrishi do you have a link to the kubernetes issue that you think this might be related to? @danielhoherd My bad on this; on further investigation, this is not the case. I still have the same issue. A statefulset restart fixes it, but that's not an ideal resolution.
Hi,
We have a memory leak in the following situation:
Changing the log_level so that non-matching-pattern messages are not logged fixes the memory leak (e.g. changing log_level to error).
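For example, a minimal way to raise the log level (a sketch; the actual config has more in it):

<system>
  # error suppresses the warn-level "pattern not matched" messages
  log_level error
</system>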
Environment:
Facing the same issue. Observed with the cloudwatch_logs, log_analytics, logzio-0.0.22, and slack plugins: the memory consumption of each process keeps rising from a hundred megabytes to several gigabytes within a few days.
td-agent 4.4.1 fluentd 1.13.3 (https://github.com/fluent/fluentd/commit/c32842297ed2c306f1b841a8f6e55bdd0f1cb27f)
My colleague Lester and I found an issue related to this memory problem: https://github.com/fluent/fluentd/issues/4174
Describe the bug
I noticed a memory issue, which looks to me like a leak. I had a more complex configuration, but boiled it down to this simple config to rule out the specifics of my setup. Even with this config (see below), I observe the memory leak.
I run fluentd 1.12.3 from a container on a Kubernetes cluster via Helm
To Reproduce
Deploy Fluentd 1.12.3 using Helm on a Kubernetes cluster with the provided config. Let it run for some time and observe the memory metrics.
Expected behavior
No memory leak.
Your Environment
Fluentd 1.12.3 on Kubernetes 1.17.7
Your Configuration
Additional context
Using my more complex configuration I observe the same memory growth, but there seems to be a correlation between the growth in size and the number of logs ingested. In other words, with more input (in my case, container logs), the (assumed) memory leak is larger.