fluent / fluentd

Fluentd: Unified Logging Layer (project under CNCF)
https://www.fluentd.org
Apache License 2.0

Slow memory leak of Fluentd v0.14 compared to v0.12 #1941

Closed qingling128 closed 6 years ago

qingling128 commented 6 years ago

Fluentd version: 0.14.25
Environment: running inside a debian:stretch-20180312 based container.
Dockerfile: here

We noticed a slow memory leak that built up over a month or so. screen shot 2018-04-12 at 5 54 51 pm

The same setup running with Fluentd 0.12.41 had stable memory usage over the same period of time.

Still investigating and trying to narrow down versions, but I wanted to create a ticket to track this.

Config:

    <source>
      @type tail
      path /var/log/containers/*.log
      pos_file /var/log/k8s-gcp-containers.log.pos
      tag reform.*
      read_from_head true
      format multi_format
      <pattern>
        format json
        time_key time
        time_format %Y-%m-%dT%H:%M:%S.%NZ
      </pattern>
      <pattern>
        format /^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$/
        time_format %Y-%m-%dT%H:%M:%S.%N%:z
      </pattern>
    </source>

    <filter reform.**>
      @type parser
      format /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<log>.*)/
      reserve_data true
      suppress_parse_error_log true
      emit_invalid_record_to_error false
      key_name log
    </filter>

    <match reform.**>
      @type record_reformer
      enable_ruby true
      <record>
        # Extract local_resource_id from tag for 'k8s_container' monitored
        # resource. The format is:
        # 'k8s_container.<namespace_name>.<pod_name>.<container_name>'.
        "logging.googleapis.com/local_resource_id" ${"k8s_container.#{tag_suffix[4].rpartition('.')[0].split('_')[1]}.#{tag_suffix[4].rpartition('.')[0].split('_')[0]}.#{tag_suffix[4].rpartition('.')[0].split('_')[2].rpartition('-')[0]}"}
        # Rename the field 'log' to a more generic field 'message'. This way the
        # fluent-plugin-google-cloud knows to flatten the field as textPayload
        # instead of jsonPayload after extracting 'time', 'severity' and
        # 'stream' from the record.
        message ${record['log']}
      </record>
      tag ${if record['stream'] == 'stderr' then 'stderr' else 'stdout' end}
      remove_keys stream,log
    </match>

    <match fluent.**>
      @type null
    </match>

    # This section is exclusive for k8s_container logs. These logs come with
    # 'stderr'/'stdout' tags.
    # We use a separate output stanza for 'k8s_node' logs with a smaller buffer
    # because node logs are less important than user's container logs.
    <match {stderr,stdout}>
      @type google_cloud

      # Try to detect JSON formatted log entries.
      detect_json true
      # Collect metrics in Prometheus registry about plugin activity.
      enable_monitoring true
      monitoring_type prometheus
      # Allow log entries from multiple containers to be sent in the same request.
      split_logs_by_tag false
      # Set the buffer type to file to improve the reliability and reduce the memory consumption
      buffer_type file
      buffer_path /var/log/k8s-fluentd-buffers/kubernetes.containers.buffer
      # Set queue_full action to block because we want to pause gracefully
      # in case of the off-the-limits load instead of throwing an exception
      buffer_queue_full_action block
      # Set the chunk limit conservatively to avoid exceeding the recommended
      # chunk size of 5MB per write request.
      buffer_chunk_limit 1M
      # Cap the combined memory usage of this buffer and the one below to
      # 1MiB/chunk * (6 + 2) chunks = 8 MiB
      buffer_queue_limit 6
      # Never wait more than 5 seconds before flushing logs in the non-error case.
      flush_interval 5s
      # Never wait longer than 30 seconds between retries.
      max_retry_wait 30
      # Disable the limit on the number of retries (retry forever).
      disable_retry_limit
      # Use multiple threads for processing.
      num_threads 2
      use_grpc false
      # Use Metadata Agent to get monitored resource.
      enable_metadata_agent true
    </match>
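
For reference, here is a small standalone Ruby sketch of what the local_resource_id expression in the record_reformer block above computes. The sample tag is hypothetical (a reform.var.log.containers.<pod>_<namespace>_<container>-<docker_id>.log shape is assumed purely for illustration), and record_reformer's tag_suffix placeholder is emulated rather than taken from the plugin:

# Standalone sketch (illustration only, not part of the running config).
# Emulates record_reformer's tag_suffix[4] by dropping the first four
# dot-separated tag parts, then applies the same string manipulation as
# the "logging.googleapis.com/local_resource_id" line above.
tag = 'reform.var.log.containers.mypod-1234_default_mycontainer-0123abcd.log' # hypothetical

tag_suffix_4 = tag.split('.')[4..-1].join('.')      # "mypod-1234_default_mycontainer-0123abcd.log"
base         = tag_suffix_4.rpartition('.')[0]      # strip the trailing ".log"
pod, namespace, container_with_id = base.split('_')
container    = container_with_id.rpartition('-')[0] # strip the trailing docker ID

puts "k8s_container.#{namespace}.#{pod}.#{container}"
# => k8s_container.default.mypod-1234.mycontainer
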
repeatedly commented 6 years ago

Hmm... weird. Does this memory leak depend on traffic? I also want to know whether this happens with the latest fluentd v1.1.3 or not.

qingling128 commented 6 years ago

@repeatedly - We are setting up some soak tests for various Fluentd versions (v0.12, v0.14 and v1.1). Will keep you posted once we have some numbers for the latest version. Hopefully it is fixed in the latest version (there seem to be some fixes in between those versions).

cosmok commented 6 years ago

I am on fluentd-1.0.2 and am experiencing the same issue, i.e. memory usage keeps growing. Restarting reclaims the memory.

I am just tailing many files and sending to an aggregator instance.

I couldn't find anything interesting in the dentry cache or in a perf report, but I will try the GC options as suggested here:

https://docs.fluentd.org/v1.0/articles/performance-tuning-single-process#reduce-memory-usage

cosmok commented 6 years ago

RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR=0.9 had no effect:

image

Above is the memory usage of Ruby/Fluentd after the GC change.

There is a spike just after 3 AM; interestingly, this is when we move older files out of the path tailed by fluentd.

Will setting 'open_on_every_update' in the in_tail plugin help at all?

repeatedly commented 6 years ago

@cosmok That's interesting. How many files do you watch and could you paste your configuration to reproduce the problem?

cosmok commented 6 years ago

@repeatedly At the moment, close to 30,000 files are being watched (distributed across 4 fluentd processes using the multi-worker plugin); each file is around 9 KB.

Config (simplified and scrubbed to remove sensitive details)

Forwarder config (this is where the problem is):

```
@type tail
path /home/u/*/detail-*
pos_file /home/u/w1.log.pos
tag usage.all
multiline_flush_interval 2s
enable_watch_timer false
read_from_head
skip_refresh_on_startup
format multiline
format_firstline /^([a-zA-Z]{3} [a-zA-Z]{3}\s+[\d]{1,2} [\d]{2}:[\d]{2}:[\d]{2} [\d]{4})\n/
format1 /^(?[a-zA-Z]{3} [a-zA-Z]{3}\s+[\d]{1,2} [\d]{2}:[\d]{2}:[\d]{2} [\d]{4})\n/
format2 /(.*)?Session-Id = "(?[^ \n]+)"\n/
... more lines of regex capture
#ignore unmatched lines
@log_level error

@type forward
@log_level error
require_ack_response true
name aggregator
host my.aggregatorinstance.com
weight 100
slow_flush_log_threshold 2.0
expire_dns_cache 60
heartbeat_type none
buffer_type file
buffer_chunk_limit 2m
buffer_queue_limit 1024
buffer_path /var/log/td-agent/buffer-1
retry_wait 1m
retry_limit 17
disable_retry_limit false
flush_interval 5s
num_threads 3
```
rubencabrera commented 6 years ago

I think I'm also hitting this issue. I'm using a container based on the fluentd-kubernetes-daemonset tag v1.2-debian-elasticsearch, with some more plugins on top.

We have one of these containers on each node of a kubernetes cluster. We were hitting some Buffer Overflow issues.

These are the latest values regarding the buffer plugin in the conf (it's 0.x syntax, but I've checked that it gets converted correctly):

...
  request_timeout 10s
  buffer_type file
  buffer_path /var/lib/fluentd/buffer/
  buffer_chunk_limit 64m
  buffer_queue_limit 20
  buffer_queue_full_action drop_oldest_chunk
  flush_interval 5s
  heartbeat_interval 1
  num_threads 2
...

I've played with most of the values; the ones above are the first that gave me stability in some of the pods, combined with setting RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR to 0.9.

The pods leaking memory seem to start doing so after logging messages about buffer overflow:

{"error":"#<Fluent::Plugin::Buffer::BufferOverflowError: buffer space has too many data>","location":"/fluentd/vendor/bundle/ruby/2.3.0/gems/fluentd-1.2.2/lib/fluent/plugin/buffer.rb:269:in `write'","tag":"kubernetes.var.log.containers.weave-scope-agent-fk2kn_kube-monitoring_agent-3a7bad7323da6331ecd1a214fa31ff727bf9071bce22e05054c8e0fc8c325d50.log","message":"emit transaction failed: error_class=Fluent::Plugin::Buffer::BufferOverflowError error=\"buffer space has too many data\" location=\"/fluentd/vendor/bundle/ruby/2.3.0/gems/fluentd-1.2.2/lib/fluent/plugin/buffer.rb:269:in `write'\" tag=\"kubernetes.var.log.containers.weave-scope-agent-fk2kn_kube-monitoring_agent-3a7bad7323da6331ecd1a214fa31ff727bf9071bce22e05054c8e0fc8c325d50.log\"","time":"2018-06-29T11:57:56+00:00"}
{"action":"drop_oldest_chunk","message":"failed to write data into buffer by buffer overflow action=:drop_oldest_chunk","time":"2018-06-29T11:57:56+00:00"}
{"chunk_id":"56fb88ad9bc0721711e7402bb1f53f36","message":"dropping oldest chunk to make space after buffer overflow chunk_id=\"56fb88ad9bc0721711e7402bb1f53f36\"","time":"2018-06-29T11:57:56+00:00"}
2018-06-29 11:57:56 +0000 [warn]: #0 failed to write data into buffer by buffer overflow action=:drop_oldest_chunk
2018-06-29 11:57:56 +0000 [warn]: #0 dropping oldest chunk to make space after buffer overflow chunk_id="56fb88ae1404d5e0ee2c411a5b1d0917"
2018-06-29 11:57:56 +0000 [warn]: #0 emit transaction failed: error_class=Fluent::Plugin::Buffer::BufferOverflowError error="buffer space has too many data" location="/fluentd/vendor/bundle/ruby/2.3.0/gems/fluentd-1.2.2/lib/fluent/plugin/buffer.rb:269:in `write'" tag="kubernetes.var.log.containers.container-log-ingest-4nhwz_kube-extra_ingest-35047c023d00d45626eadd72378c5562f459e22843a96f3bae65449861f26b46.log"

The above happens for 1674 lines, and after all of that is logged, the memory leak shows up. I'm repeating the tests to see if this is consistent, but memory grew for almost 50 minutes and then started to decrease at the same pace it had been increasing during the leak (about 7 MiB per minute).

Please let me know if there's any more information I could provide. It's been a bit difficult for me to reproduce, as it only happens in our most logging-busy cluster, but I thought the timing of the buffer overflow and the memory leak was worth mentioning.

Here's a picture of the monitoring of one of these containers. The drop in CPU happens when the memory starts to grow, and the logs show that this was the moment the Buffer Overflow happened. The memory starts to decrease when the Network I/O suddenly becomes much lower. The prometheus plugin exposes an endpoint that we use for the liveness probe, and it remained live for all of this monitored time.

log-ingest-monitor

repeatedly commented 6 years ago

@rubencabrera Thanks for the detailed report. I'm now working on this issue again and trying to reproduce the problem with your configuration.

rubencabrera commented 6 years ago

Thanks @repeatedly

The report above was done in the middle of an upgrade of fluentd and all the plugins we use. The problems we had with buffering seem solved now, and we don't see the containers getting restarted so often (so I presume the memory issue is under control in this scenario).

I can only get a long-running window in that cluster on weekends, and this one I'll be out, but I could try again with the resource limit removed to see what happens. If the leak doesn't occur, maybe I could try to trigger the buffer error again to see if that's the problem. If you have any other ideas that could help, please let me know.

qingling128 commented 6 years ago

@repeatedly - We just tried the latest v1.2.3 (comparing with 0.14.25 and 0.12.41). It seems the memory growth issue is not any better in the latest version.

Comparison among v1.2.3, v0.14.25 and v0.12.41 over 5 days of growth: Blue: v1.2.3 (not much better either; it just started later). Green: v0.14.25. Red: v0.12.41. comparison

Comparison between v0.14.25 and v0.12.41 over 19 days of growth: Red: v0.14.25. Blue: v0.12.41. comparison2

It almost felt like GC kicked in really late and did not bring the memory allocation down sufficiently. So we also tried setting RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR to 0.9, 1.2, 1.5 and 1.8 for both v0.14.25 and v1.2.3. That did not help slow down the memory growth either, though.
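
One way to tell whether this growth lives inside the Ruby heap (where RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR can make a difference) or outside it (malloc-level fragmentation, which the jemalloc discussion below points at) is to log GC.stat alongside RSS over time. A minimal, illustrative Ruby sketch, assuming Linux and that it runs inside the process being observed; it was not part of this investigation:

# Illustrative monitoring loop (assumption: Linux /proc is available; run it
# inside the Ruby process under observation, e.g. from a throwaway debug
# script). RSS climbing while old_objects stays flat suggests the growth is
# outside the Ruby object heap rather than leaked Ruby objects.
loop do
  stat   = GC.stat
  rss_kb = File.read('/proc/self/status')[/VmRSS:\s+(\d+)/, 1].to_i
  puts format('rss=%d kB live_slots=%d old_objects=%d',
              rss_kb, stat[:heap_live_slots], stat[:old_objects])
  sleep 60
end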

Any luck with the reproduction?

qingling128 commented 6 years ago

BTW, we were handling 100 KB/s of logs in the initial comment; in the last comment we are handling 175 KB/s. We have been stuck on Fluentd v0.12 because of this issue.

tahajahangir commented 6 years ago

This is the memory utilization of fluentd (running as a docker container) over 2 months (there have been ~150 log rows per second in the last month). screenshot from 2018-07-21 14-57-40

repeatedly commented 6 years ago

@qingling128 Thanks. You use the v0.12 API with v1.x, right?

@tahajahangir Do you use the fluentd v1.x series?

tahajahangir commented 6 years ago

@repeatedly

# fluentd --version
fluentd 1.0.2
# ruby -v
ruby 2.3.3p222 (2016-11-21) [x86_64-linux-gnu]

We are going to test 1.2.2 next week.

xenera-zhangtao commented 6 years ago

I am experiencing the same problem: memory usage keeps growing. image

Environment: amazon linux 2
Fluentd version: starting fluentd-1.2.2 pid=1 ruby="2.4.4"

lee2014 commented 6 years ago

When I use jemalloc for Ruby, the memory leak problem disappears.

#!/bin/bash

export LD_PRELOAD=/usr/lib64/libjemalloc.so.1
export RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR=0.9

fluentd -c ./fluent.conf -o ./fluent.log -d ./fluent.pid

Are there some bugs in the C extensions?
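
For anyone double-checking the same thing, here is a quick, Linux-only Ruby sketch (my own assumption of how to verify it, not an official check) to confirm from inside the Ruby process that the LD_PRELOAD actually took effect:

# Illustrative check (assumes Linux): jemalloc can only help if the shared
# object is really mapped into this process, so inspect /proc/self/maps.
maps = File.read('/proc/self/maps')
if maps.include?('libjemalloc')
  puts 'jemalloc is loaded in this process'
else
  puts 'jemalloc is NOT loaded (LD_PRELOAD may not have taken effect)'
end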

qingling128 commented 6 years ago

@repeatedly - We are using https://docs.fluentd.org/v1.0/articles/api-plugin-output. The page says "This page is simply copied from v0.12 documents, and will be updated later." Looking at the content, I guess we should be using the v1.x API under the hood as long as we are using Fluentd v1.x (correct me if I'm wrong).

In fact, I set up a stub gem and was still seeing the memory leak.

The stub gem just sleeps 0.06 seconds for each chunk and does nothing else:

require 'time'

module Fluent
  # fluentd output plugin for the Stackdriver Logging API
  class GoogleCloudOutput < BufferedOutput

    Fluent::Plugin.register_output('google_cloud', self)

    def write(chunk)
      sleep(0.06)
    end

    def format(tag, time, record)
      Fluent::Engine.msgpack_factory.packer.write([tag, time, record]).to_s
    end
  end
end

Memory usage image

The LD_PRELOAD var is interesting. We do set it in the package and install the package inside a container. I'll double check that it's also properly set in the container environment.

lee2014 commented 6 years ago

@qingling128 I don't think that is a memory leak; it looks like a lot of unreleased memory fragments.

And the problem was fixed when I set LD_PRELOAD to libjemalloc or used a Ruby package compiled with the --with-jemalloc option.

You can refer to https://www.speedshop.co/2017/12/04/malloc-doubles-ruby-memory.html.

repeatedly commented 6 years ago

I assume qingling128 uses jemalloc properly. I'm now investigating the problem with qingling128's stub code.

qingling128 commented 6 years ago

@lee2014 @repeatedly - Yeah, we were using jemalloc. I have added a step to explicitly enforce that in the container in case the package did not set it up properly. Still getting similar results.

image

qingling128 commented 6 years ago

BTW, our configuration uses a combination of the tail, parser, systemd, record_reformer and detect_exceptions plugins. The full configuration can be found below. I'm setting up another experiment to trim that down to just a simple tail plugin (so that we can rule out some noise). Will update with collected data.

k8s_stackdriver_logging_agent.conf.txt

repeatedly commented 6 years ago

If you have time, could you use in_dummy instead of in_tail for testing? On my Ubuntu 16.04, in_dummy and your stub out_google_cloud don't increase RSS with jemalloc 4.5.0. I want someone to check whether in_tail with lots of files has a problem or not.

<source>
  @type dummy
  @id test_in
  rate 5000 # Use actual message rate
  dummy {your actual json here}
  tag test
</source>

<match test>
  @type google_cloud
  @id test_out

  buffer_type file
  buffer_path /path/to/buffer/leak
  buffer_queue_full_action block
  buffer_chunk_limit 1M
  buffer_queue_limit 6
  flush_interval 5s
  max_retry_wait 30
  disable_retry_limit
  num_threads 2
</match>
$ LD_PRELOAD=/path/to/jemalloc/4.5.0/lib/libjemalloc.so RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR=0.9 RUBYLIB=~/fluentd/lib  ~/fluentd/bin/fluentd -c leak.conf -p .
qingling128 commented 6 years ago

Sure, I'll give in_dummy a try. BTW, I tried the two configuration settings below. One triggered the memory issue while the other did not.

Good one (with only the in_tail plugin): good.txt

Bad one (the in_tail plugin + the multi_format, parser, record_reformer and record_transformer plugins): bad.txt

RSS usage image

I just set up a few more experiments with combinations of the multi_format, parser, record_reformer and record_transformer plugins. Hopefully that will narrow down the culprit.

qingling128 commented 6 years ago

In fact, scratch https://github.com/fluent/fluentd/issues/1941#issuecomment-409652705. The "good" one did not have the log generator properly set up. I just fixed the log generator (shown as 2trim). Based on the trend, the memory issue seems reproducible with just the in_tail plugin under this configuration.

I also set up a dummy experiment with the in_dummy plugin.

Will let it soak a bit and update the thread.

image

tuasowro commented 6 years ago

I'm also heading an issue, i'm using 2 instances of fluentd in kubernetes and each of them consume 1gb memory

qingling128 commented 6 years ago

@repeatedly Seems like the memory issue went away after I replaced in_tail with in_dummy.

    <source>
      @type dummy
      tag dummy_test
      rate 1000
      dummy {"message":"worldworldworldworldworldworldworldworldworldworldworldworldworldworldworldworldworldworldworldworld"}
    </source>

image

repeatedly commented 6 years ago

@qingling128 Ah, good.

log generator

Could you share this script? I want to debug the in_tail plugin with the same setup.

qingling128 commented 6 years ago

Sure.

The script inside the log generator container we set up is: log_generator.txt

The Dockerfile is: Dockerfile.txt

We run this container and print to stdout. Kubernetes logs stdout to /var/log/containers/*** and Fluentd tails from there. BTW, we were using python log_generator.py --log-size-in-bytes=100 --log_rate=1000.

If you are not using Kubernetes, you could simply change the print statement to write to a log file instead.
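
For anyone who would rather not run the attached Python script, here is a rough Ruby stand-in (an approximation of log_generator.py, not the attached script itself) that appends JSON lines of a given size at a given rate to a file that in_tail can follow:

# Rough stand-in for the attached log_generator.py (approximation only):
# writes LOG_RATE JSON lines per second, each carrying LOG_SIZE bytes of
# random payload, to the given file.
require 'json'
require 'securerandom'

LOG_SIZE = 100            # payload bytes per line
LOG_RATE = 1000           # lines per second
path     = ARGV[0] || 'generated.log'

File.open(path, 'a') do |f|
  loop do
    LOG_RATE.times do
      f.puts({ 'stream' => 'stdout',
               'log' => SecureRandom.hex(LOG_SIZE / 2) }.to_json)
    end
    f.flush
    sleep 1
  end
end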

qingling128 commented 6 years ago

Hi @repeatedly, any luck reproducing this?

repeatedly commented 6 years ago

@qingling128 I am running your log generator with the stub code but still no luck in my Ubuntu environment. How many files do you tail? I'm now testing with 9 files (9000 msg/sec).

qingling128 commented 6 years ago

We are tailing 30 files, each with 30 msg/sec.

qingling128 commented 6 years ago

By the way, I just started a simple setup in order to provide some reproduction steps without getting deeply involved in our infrastructure: https://github.com/qingling128/fluent-plugin-buffer-output-stub/tree/master.

I only just set up the experiment. Will let it run overnight to see if it can reproduce the issue. The configuration and setup are similar, but not exactly the same as what we have in Kubernetes. I will iterate on it to try to reproduce the issue.

repeatedly commented 6 years ago

We are tailing 30 files, each with 30 msg/sec.

Thanks. I will try it with a similar setting.

repeatedly commented 6 years ago

Has anyone reproduced this problem in an environment without Docker/k8s? I have run tests on my Mac and Ubuntu for several days but could not reproduce the problem, so I want to know whether this problem only happens in Docker environments or not.

Here is my environment:

require 'parallel' # 'parallel' gem

nums = (0...30).to_a
Parallel.map(nums, in_threads: nums.size) { |n|
  `python log_generator.py --log-size-in-bytes=100 --log-rate=100 >> log/data#{n}.log`
}

sleep
<source>
  @type tail
  path ./log/*.log
  pos_file ./pos
  format json
  read_from_head true
  tag test
</source>

<match test>
  @type test # This plugin is same with https://github.com/qingling128/fluent-plugin-buffer-output-stub/tree/master
  @id test_out

  buffer_type file
  buffer_path /path/to/buffer/test
  buffer_queue_full_action block
  buffer_chunk_limit 1M
  buffer_queue_limit 6
  flush_interval 5s
  max_retry_wait 30
  disable_retry_limit
  num_threads 2
</match>

This is based on the above. I just changed the self._log_record line to self._log_record = '{{"stream":"stdout","log":"{}"}}'.format(self._random_string(log_size_in_bytes)) so that the lines are JSON formatted for in_tail.

fluentd: 1.2.4
ruby: 2.4.4
command: RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR=0.9 RUBYLIB=~/dev/fluentd/fluentd/lib ~/dev/fluentd/fluentd/bin/fluentd -c f.conf -p plugin

qingling128 commented 6 years ago

I also tried to reproduce it in a non-docker & non-k8s environment with similar configurations. No luck yet.

cosmok commented 6 years ago

Hi @repeatedly @qingling128 ,

I am using neither docker nor k8s. My config is as per my comment here: https://github.com/fluent/fluentd/issues/1941#issuecomment-382974082

qingling128 commented 6 years ago

Just a thought: could log rotation contribute to the issue? As I thought about the difference between the two setups (k8s vs. no k8s), this was the first thing that crossed my mind.

Current GKE log rotation happens when a log file exceeds 10 MB. At a load of 100 KB/s, the log file is rotated roughly every 10 * 1024 / 100 ≈ 102 seconds.

repeatedly commented 6 years ago

I see. I will run a rotation script together with par.rb and observe memory usage.
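
In case a sketch helps as a starting point, here is one guess at such a rotation script (an assumption on my part, mimicking rename-based rotation at roughly the 10 MB threshold mentioned above; it also assumes the writer reopens the file by path after rotation so a fresh file appears, as the container runtime does on GKE):

# Illustrative rotation loop (assumptions: rename-based rotation, and a
# writer that reopens the file by path afterwards). Renames any log under
# log/ once it grows past ROTATE_BYTES so in_tail sees rotations.
ROTATE_BYTES = 10 * 1024 * 1024

loop do
  Dir.glob('log/*.log').each do |path|
    next unless File.size(path) > ROTATE_BYTES
    File.rename(path, "#{path}.#{Time.now.to_i}")
  end
  sleep 5
end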

repeatedly commented 6 years ago

I noticed that timers for rotated files are not released. Will fix this and observe memory usage.

qingling128 commented 6 years ago

Sounds promising. Keep us posted. :D Thanks a lot!

repeatedly commented 6 years ago

The patch is here: https://github.com/fluent/fluentd/pull/2105

qingling128 commented 6 years ago

Great! Let me know if there is anything I could help test. :)

repeatedly commented 6 years ago

I released v1.2.5.rc1 for testing. You can install this version with the --pre option of gem install.

qingling128 commented 6 years ago

Great. I've set up a test for that version. Will keep you posted.

qingling128 commented 6 years ago

Seems like that patch fixed the issue we had. Thank you so much @repeatedly !

image

Gem versions we tested with:

$ kubectl -n stackdriver-agents exec stackdriver-logging-agent-s5sqr ls /opt/google-fluentd/embedded/lib/ruby/gems/2.4.0/gems | grep fluent
fluent-logger-0.7.2
fluent-mixin-config-placeholders-0.4.0
fluent-mixin-plaintextformatter-0.2.6
fluent-plugin-detect-exceptions-0.0.10
fluent-plugin-google-cloud-0.6.23
fluent-plugin-mongo-0.7.13
fluent-plugin-multi-format-parser-0.1.1
fluent-plugin-prometheus-0.3.0
fluent-plugin-record-reformer-0.9.1
fluent-plugin-rewrite-tag-filter-1.5.5
fluent-plugin-s3-0.8.4
fluent-plugin-scribe-0.10.14
fluent-plugin-systemd-0.3.0
fluent-plugin-td-0.10.28
fluent-plugin-td-monitoring-0.2.2
fluent-plugin-webhdfs-0.4.2
fluentd-1.2.5.rc1
qingling128 commented 6 years ago

BTW, when can we expect a formal release of fluentd-1.2.5?

repeatedly commented 6 years ago

Released v1.2.5. Thanks for the testing.

qingling128 commented 6 years ago

Thank you!