joell opened this issue 7 years ago
Did you try this setting?
https://docs.fluentd.org/v0.12/articles/performance-tuning#reduce-memory-usage
Yes, even with constraints on the old-object limit factor, the problem persists. In fact, it persists even with more draconian restrictions on the garbage collector, e.g.:
$ RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR=0.9 \
RUBY_GC_HEAP_GROWTH_FACTOR=1.05 \
RUBY_GC_MALLOC_LIMIT_MAX=16777216 \
RUBY_GC_OLDMALLOC_LIMIT_MAX=16777216 \
td-agent --no-supervisor -c test.conf &
Coupling that output-side td-agent process with an input of ...
$ td-agent-bit -i tcp://127.0.0.1:10130 -t test -o forward://127.0.0.1:10131 &
$ seq 1 3000 | sed 's/.*/{"num": &, "filler": "this is filler text to make the event larger"}/' > /dev/tcp/localhost/10130 &
... sees the td-agent resident-set memory jump to roughly 650MB and hold there, even long after all data has been processed. (Note: I'm using Fluent Bit 0.11.15 here, which has a nasty data duplication problem that is useful for stressing td-agent with input in this test.)
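For reference, the receiving td-agent needs nothing exotic; a minimal test.conf along these lines reproduces the setup (this is a sketch, not the exact file from the report: the forward input on port 10131 is implied by the commands above, and the file output is an arbitrary choice for illustration):

# sketch of a minimal test.conf for the receiving td-agent
$ cat > test.conf <<'EOF'
<source>
  @type forward
  port 10131
  bind 127.0.0.1
</source>

<match test.**>
  @type file
  path /tmp/td-agent-test/output
</match>
EOF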
Honestly, it seems like the real issue here is the Ruby 2.1 garbage collector; as far as I can tell, it never releases memory to the OS that it allocates, thereby damning any process that has an even momentary need for a large quantity of memory to hold on to that memory for the rest of its lifetime. (Please correct me if I'm wrong here.)
If the Ruby gc issue cannot be fixed by some form of additional configuration, then perhaps Fluentd could use some type of backpressure mechanism to avoid ingesting input faster than it can truly process it and avoid accumulating large queues in memory?
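(For example, something along these lines in Fluentd's v1 configuration syntax is a sketch of that idea, with made-up limits and not a tested setup: a file buffer with overflow_action block spools chunks to disk and makes emits wait when the buffer is full, instead of letting in-memory queues grow without bound.)

# sketch: disk-backed buffer with blocking overflow (v1 syntax, illustrative limits)
$ cat > buffered-forward.conf <<'EOF'
<match test.**>
  @type forward
  <server>
    host 127.0.0.1
    port 10131
  </server>
  <buffer>
    @type file
    path /var/log/td-agent/buffer/test
    chunk_limit_size 8MB
    total_limit_size 256MB
    overflow_action block
  </buffer>
</match>
EOF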
+1
This is a stale issue, so I'm closing it. If you still have a problem, updating Fluentd and Ruby might help.
I have re-tested this issue using the latest td-agent 3.5.1 RPM package for RHEL 7. That package includes Fluentd 1.7.4 and embeds Ruby 2.4.0. The problem still remains exactly as it was originally reported.
Nothing has been fixed, despite the intervening 2 years. The issue should be re-opened.
P.S. You as the Fluentd developers and project administrators collectively have the right to run your project however you see fit. However, the claims of Fluentd's production suitability and performance on your website are not consistent with this kind of callous disregard for the validity of an easily reproducible issue that has serious impacts on that suitability. I gave very simple steps to reproduce this issue, and it only took me a few minutes to download the latest TD-Agent RPM, install it, and copy and paste the commands from my original report to see that the outcome remained the same. You could have trivially done the same. The fact that you could not be bothered to do so, but instead chose to try to bury and ignore this problem, speaks volumes, especially as another user indicated as recently as June 27 that it is still affecting people. If you truly feel that is the appropriate response, then you should also remove the false claims of performance and production-suitability from the Fluentd website.
The problem still remains exactly as it was originally reported.
Thank you for re-testing it. I will re-open this issue.
I'm having the same issue using the td-agent 3.8.0 RPM package for Amazon Linux 2. That package includes Fluentd 1.11.1 and embeds Ruby 2.4.10. Any news here? Is there work in progress? Our Fluentd is in production with 4 aggregators ingesting simultaneously into Elasticsearch. So far it is stable, but memory always settles close to 100%, leaving just 250-300 MB free, which I don't think makes any sense... I don't know what else to test... I've been looking into this for weeks, changing many configurations and trying different versions without any clue. I even tried to adjust the GC variables as @joell did in the past, but the behaviour is the same.
You can see here how a new aggregator doesn't release memory until it is close to 100%. The only thing we gain by adding a new aggregator is a slight reduction in CPU usage.
Our buffer size (which I think is tied to this memory issue); as I read in other threads, the behaviour is different with a memory buffer...
Total network I/O (the peaks also correlate with the amount of memory needed)
Please help, thanks in advance.
@cede87: As I noted in my comment on April 9, 2017, the underlying issue here appears to be in the default Ruby garbage collector and memory allocator.
The Fluentd developers could directly avoid this by applying a backpressure mechanism or by spooling incoming data to disk instead of holding it in memory.
Alternatively, the Ruby memory allocation issue can be indirectly avoided by replacing or manipulating the Ruby memory allocator. One method is to replace the allocator with jemalloc (though different versions are reported to be more effective than others); this approach was documented as being done by the Fluentd devs, but as I noted in the original issue text, it doesn't look like jemalloc is actually used in the build that produces the RPM. Another method would be to try to manipulate the allocator's behavior through things like the MALLOC_ARENA_MAX environment variable.
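(As a sketch of that second approach, applicable only when glibc malloc is actually in use rather than a preloaded jemalloc: the variable just needs to be in the process environment before launch. The value 2 below is a commonly suggested starting point, not a recommendation.)

# sketch: cap the number of glibc malloc arenas for a manually launched td-agent
$ MALLOC_ARENA_MAX=2 td-agent --no-supervisor -c test.conf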
A summary of the underlying problem and some of the techniques you might be able to try -- including going so far as patching the Ruby garbage collector yourself -- can be found in this article.
Best of luck.
@joell thanks for the quick reply. I did the same checks you did in the past and I could verify that we are using jemalloc with our RPM installation. So, if I'm not mistaken, we are not able to use the MALLOC_ARENA_MAX environment variable... even so, we are suffering the same problems. Any suggestion?
[root@x ~]# pmap 4057 | grep jemalloc
00007fba1a234000    292K r-x-- libjemalloc.so.2
00007fba1a27d000   2048K ----- libjemalloc.so.2
00007fba1a47d000      8K r---- libjemalloc.so.2
00007fba1a47f000      4K rw--- libjemalloc.so.2
Thanks!
@cede87: The presence of libjemalloc.so.2 indicates you might be running jemalloc 5 instead of jemalloc 3. During a discussion about making jemalloc the default allocator for Ruby, it was noted:
jemalloc 3.6 is slow but space efficient. jemalloc 5.1 is faster but almost as bad with space as untuned glibc.
Glancing at package content listings online, it looks like you would see libjemalloc.so.1 if you were using the jemalloc 3 series, which is the one that exhibits the smaller heap sizes. You might consider rebuilding Fluentd with jemalloc 3 instead.
@joell many, many thanks for your suggestions. I was able to change the jemalloc version from 5.x to 3.6.0 using td-agent 3.8.0 on one server (just to test). Notice the difference.
So I can confirm the following:
I think the Fluentd developers should take a look at this.
Thanks again,
Would appreciate a solution; the memory usage has peaked and isn't coming down at all.
Hi @Adhira-Deogade, please follow my notes to downgrade the jemalloc version.
Note: First delete the existing files in /opt/td-agent/embedded/lib:

cd /opt/td-agent/embedded/lib
rm libjemalloc.a libjemalloc_pic.a libjemalloc.so.2 libjemalloc.so
ln -s /usr/local/lib/libjemalloc.a /opt/td-agent/embedded/lib/libjemalloc.a
ln -s /usr/local/lib/libjemalloc_pic.a /opt/td-agent/embedded/lib/libjemalloc.pic.a
ln -s /usr/local/lib/libjemalloc.so.1 /opt/td-agent/embedded/lib/libjemalloc.so.2
ln -s libjemalloc.so.2 libjemalloc.so
Note: If you do ls you should see these symbolic links:

/opt/td-agent/embedded/lib/libjemalloc.a -> /usr/local/lib/libjemalloc.a
/opt/td-agent/embedded/lib/libjemalloc.pic.a -> /usr/local/lib/libjemalloc_pic.a
/opt/td-agent/embedded/lib/libjemalloc.so -> libjemalloc.so.2
/opt/td-agent/embedded/lib/libjemalloc.so.2 -> /usr/local/lib/libjemalloc.so.1
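A quick sanity check after restarting td-agent (a sketch; the pgrep pattern assumes a worker process whose command line contains --under-supervisor, so substitute the real PID if needed) is to confirm that the mapping now resolves to the jemalloc 3 library you symlinked:

# sketch: confirm which jemalloc the running worker actually has mapped
$ pmap $(pgrep -f 'under-supervisor' | head -1) | grep jemalloc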
I hope it helps you, Daniel
This issue has been automatically marked as stale because it has been open 90 days with no activity. Remove stale label or comment or this issue will be closed in 30 days
This issue should remain open until it is resolved. It has continued to affect people since it was reported in 2017, and the current best workarounds are laborious and invasive.
This issue has been automatically marked as stale because it has been open 90 days with no activity. Remove stale label or comment or this issue will be closed in 30 days
I repeat:
This issue should remain open until it is resolved. It has continued to affect people since it was reported in 2017, and the current best workarounds are laborious and invasive.
This issue is still affecting us. The only solution is changing the jemalloc version, which is what we are currently doing, even in production environments. That is not the best solution.
Thank you for reporting it.
# no jemalloc shared library mentioned in the memory mapping
$ pmap 4343 | grep jemalloc
# ... so it doesn't look like it's dynamically linked
td-agent loads jemalloc via LD_PRELOAD, and this environment variable is set only by the init script or the systemd unit file. Launching the td-agent command manually doesn't set it, which is why you didn't see jemalloc in pmap:
if [ -f "${TD_AGENT_HOME}/embedded/lib/libjemalloc.so" ]; then
export LD_PRELOAD="${TD_AGENT_HOME}/embedded/lib/libjemalloc.so"
fi
Environment=LD_PRELOAD=<%= install_path %>/embedded/lib/libjemalloc.so
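So if you want to point td-agent at a different jemalloc build without touching the packaged files, one option is a systemd drop-in that overrides that variable (a sketch; the /opt/jemalloc-3.6.0 path is an example location, not something the package provides):

# sketch: override LD_PRELOAD for the td-agent unit via a systemd drop-in
$ sudo mkdir -p /etc/systemd/system/td-agent.service.d
$ cat <<'EOF' | sudo tee /etc/systemd/system/td-agent.service.d/jemalloc.conf
[Service]
Environment=LD_PRELOAD=/opt/jemalloc-3.6.0/lib/libjemalloc.so
EOF
$ sudo systemctl daemon-reload
$ sudo systemctl restart td-agent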
- td-agent 3.8.0 is using jemalloc 5.x
td-agent 3 uses jemalloc 4.5.0, not 5.x
override :jemalloc, :version => '4.5.0'
Hmm, I confirmed that jemalloc 3.6.0 consumes less memory than jemalloc 5.2.1 (td-agent 4.1.0's default) in this case (the columns below appear to be PID, VSZ, and RSS in kB):
jemalloc 3.6.0:
27672 450484 231048 /opt/td-agent/bin/ruby -Eascii-8bit:ascii-8bit /usr/sbin/td-agent -c test-in.conf --under-supervisor
27685 364460 139124 /opt/td-agent/bin/ruby -Eascii-8bit:ascii-8bit /usr/sbin/td-agent -c test-out.conf --under-supervisor
jemalloc 5.2.1:
28148 554264 255248 /opt/td-agent/bin/ruby -Eascii-8bit:ascii-8bit /usr/sbin/td-agent -c test-in.conf --under-supervisor
28161 472848 187796 /opt/td-agent/bin/ruby -Eascii-8bit:ascii-8bit /usr/sbin/td-agent -c test-out.conf --under-supervisor
You can switch the jemalloc version easily via LD_PRELOAD:
$ wget https://github.com/jemalloc/jemalloc/releases/download/3.6.0/jemalloc-3.6.0.tar.bz2
$ tar xvf jemalloc-3.6.0.tar.bz2
$ cd jemalloc-3.6.0
$ ./configure --prefix=/opt/jemalloc-3.6.0
$ make
$ sudo make install
$ LD_PRELOAD=/opt/jemalloc-3.6.0/lib/libjemalloc.so td-agent
But I'm not sure it's always efficient and worth replacing.
@ashie: Thank you for looking into this issue and confirming yourself what the community has been reporting.
Regarding the efficiency of jemalloc 3.x vs 5.x, the common trend I've read is that 5.x may be a bit faster. However, as Fluentd is both advertised for production use and frequently used in production environments, I would argue that stability is more important than speed here. We have run into issues using Fluentd in production where its memory consumption has grown to the point where it has actively hampered other, more important production services on a host. Ultimately, we've had to move away from Fluentd for certain applications because of this bug.
For the sake of ensuring system stability, I would argue for making jemalloc 3.x the default allocator. If people need greater performance and are confident their use case will not trigger this memory consumption issue, they could use jemalloc 5.x instead via LD_PRELOAD.
I urge that the default Fluentd configuration prioritize stability over performance.
Thanks for your opinion. I've opened an issue for td-agent: https://github.com/fluent-plugins-nursery/td-agent-builder/issues/305
I can confirm that the issue is still very much present and valid on td-agent 4.2.0 / Ubuntu Bionic. We're using it to report on Artifactory stats as part of the JFrog monitoring platform, so the configuration is pretty much default.
Fluentd's memory usage is creeping up a lot, so as a workaround we've applied cgroup limits to it (50% of the host's 128GB RAM, i.e. a massive 64GB). It took only a few hours after restarting the td-agent service for it to be OOM-killed:
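(For reference, one way such a cap can be applied is via the service's cgroup through systemd; this is a sketch rather than our exact setup, and MemoryMax requires a reasonably recent systemd, with MemoryLimit being the older equivalent.)

# sketch: cap td-agent's memory via its systemd cgroup
$ sudo systemctl set-property td-agent.service MemoryMax=64G
$ systemctl show td-agent.service -p MemoryMax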
Nov 02 18:55:57 host.example.com kernel: ruby invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
Nov 02 18:55:57 host.example.com kernel: CPU: 38 PID: 16039 Comm: ruby Tainted: P O 5.4.0-77-generic #86~18.04.1-Ubuntu
Nov 02 18:55:57 host.example.com kernel: Hardware name: Dell Inc. PowerEdge R530/0CN7X8, BIOS 2.4.2 01/09/2017
Nov 02 18:55:57 host.example.com kernel: Call Trace:
Nov 02 18:55:57 host.example.com kernel: dump_stack+0x6d/0x8b
Nov 02 18:55:57 host.example.com kernel: dump_header+0x4f/0x200
Nov 02 18:55:57 host.example.com kernel: oom_kill_process+0xe6/0x120
Nov 02 18:55:57 host.example.com kernel: out_of_memory+0x109/0x510
Nov 02 18:55:57 host.example.com kernel: mem_cgroup_out_of_memory+0xbb/0xd0
Nov 02 18:55:57 host.example.com kernel: try_charge+0x79a/0x7d0
Nov 02 18:55:57 host.example.com kernel: ? __alloc_pages_nodemask+0x153/0x320
Nov 02 18:55:57 host.example.com kernel: mem_cgroup_try_charge+0x75/0x190
Nov 02 18:55:57 host.example.com kernel: mem_cgroup_try_charge_delay+0x22/0x50
Nov 02 18:55:57 host.example.com kernel: __handle_mm_fault+0x8d5/0x1270
Nov 02 18:55:57 host.example.com kernel: ? __switch_to_asm+0x40/0x70
Nov 02 18:55:57 host.example.com kernel: handle_mm_fault+0xcb/0x210
Nov 02 18:55:57 host.example.com kernel: __do_page_fault+0x2a1/0x4d0
Nov 02 18:55:57 host.example.com kernel: do_page_fault+0x2c/0xe0
Nov 02 18:55:57 host.example.com kernel: page_fault+0x34/0x40
Nov 02 18:55:57 host.example.com kernel: RIP: 0033:0x7f1f16426b23
Nov 02 18:55:57 host.example.com kernel: Code: fe 6f 64 16 e0 c5 fe 6f 6c 16 c0 c5 fe 6f 74 16 a0 c5 fe 6f 7c 16 80 c5 fe 7f 07 c5 fe 7f 4f 20 c5 fe 7f 57 40 c5 fe 7f 5f 60 <c5> fe 7f 64 17 e0 c5
Nov 02 18:55:57 host.example.com kernel: RSP: 002b:00007f0bafa270c8 EFLAGS: 00010202
Nov 02 18:55:57 host.example.com kernel: RAX: 00007f098866df61 RBX: 00007f1ed0df8f90 RCX: 000000000426df61
Nov 02 18:55:57 host.example.com kernel: RDX: 00000000000000fc RSI: 00007f1ee3a3e800 RDI: 00007f098866df61
Nov 02 18:55:57 host.example.com kernel: RBP: 00000000071c2454 R08: 00007f0984400000 R09: ffffffffffffffff
Nov 02 18:55:57 host.example.com kernel: R10: 00007f1f14c5e100 R11: 00007f09885b4192 R12: 00007f1ee3a3e800
Nov 02 18:55:57 host.example.com kernel: R13: 00000000000000fc R14: 0000000000000001 R15: 000000000426e05d
Nov 02 18:55:57 host.example.com kernel: memory: usage 65894512kB, limit 65894512kB, failcnt 29724
Nov 02 18:55:57 host.example.com kernel: memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
Nov 02 18:55:57 host.example.com kernel: kmem: usage 138544kB, limit 9007199254740988kB, failcnt 0
Nov 02 18:55:57 host.example.com kernel: Memory cgroup stats for /system.slice/td-agent.service:
Nov 02 18:55:57 host.example.com kernel: anon 67332968448
file 0
kernel_stack 626688
slab 3518464
sock 0
shmem 0
file_mapped 0
file_dirty 0
file_writeback 811008
anon_thp 0
inactive_anon 3250925568
active_anon 64081309696
inactive_file 0
active_file 0
unevictable 0
slab_reclaimable 557056
slab_unreclaimable 2961408
pgfault 378416214
pgmajfault 0
workingset_refault 0
workingset_activate 0
workingset_nodereclaim 0
pgrefill 946474
pgscan 243287
pgsteal 91570
pgactivate 60357
pgdeactivate 946408
pglazyfree 0
pglazyfreed 0
thp_fault_alloc 0
thp_collapse_alloc 0
Nov 02 18:55:57 host.example.com kernel: Tasks state (memory values in pages):
Nov 02 18:55:57 host.example.com kernel: [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
Nov 02 18:55:57 host.example.com kernel: [ 15976] 0 15976 59526 3287 462848 6900 0 fluentd
Nov 02 18:55:57 host.example.com kernel: [ 15983] 0 15983 22606458 16437845 134709248 84397 0 ruby
Nov 02 18:55:57 host.example.com kernel: oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0-1,oom_memcg=/system.slice/td-agent.service,task_memcg=/system.slice/td-agent.
Nov 02 18:55:57 host.example.com kernel: Memory cgroup out of memory: Killed process 15983 (ruby) total-vm:90425832kB, anon-rss:65744072kB, file-rss:7308kB, shmem-rss:0kB, UID:0 pgtables:131552kB
Nov 02 18:55:59 host.example.com kernel: oom_reaper: reaped process 15983 (ruby), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
@ashie What is the latest status on this issue? This seems like a HUGE flaw in fluentd. The issue you linked https://github.com/fluent/fluent-package-builder/issues/305 was closed without action taken.
We are facing fluentd OOM issues in production. Please advise.
Same issue with td-agent 4.4.2 (fluentd 1.15.3) on a ubi image (rhel8).
I cannot believe this is not fixed - surely not production ready :(
I can confirm this behavior on td-agent 4.4.1 / fluentd 1.13.3 (https://github.com/fluent/fluentd/commit/c32842297ed2c306f1b841a8f6e55bdd0f1cb27f) as well.
The memory does not seem to be dynamically deallocated. At initial startup without traffic, memory consumption is low. After a period of heavy traffic followed by no traffic at all, memory still stays at its highest point.
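(An easy way to see this is to sample the worker's resident set before, during, and after a traffic burst; a sketch, assuming the worker's command line contains under-supervisor:)

# sketch: sample the worker's RSS every 5 seconds
$ watch -n 5 'ps -o pid,vsz,rss,cmd -p $(pgrep -f "under-supervisor" | head -1)'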
This issue may relate to https://github.com/fluent/fluentd/issues/4174
Regarding the case where ignore_same_log_interval is the cause, it will be fixed by the following issue and PR.
Thanks @yangjiel !
When setting up a simple two-part Fluentd configuration, (TCP -> forwarder) -> (forwardee -> disk), and giving it 5 million JSON objects to process all at once, resident-set memory consumption jumps from an initial 30MB to between 200MB and 450MB, and does not come back down after processing is complete. This is observed using version 2.3.5-1.el7 of the TD Agent RPM package running on CentOS 7. (The version of Fluentd in that package is 0.12.36.)
Steps to reproduce:
As you can see from the RSS numbers, each td-agent process started out around 30MB, and they ended at ~290MB and ~460MB, respectively. Neither process will release that memory if you wait a while. (In the real-world staging system we initially discovered this on, memory consumption of the test-out.conf-equivalent configuration reached over 3GB, and the test-in.conf-equivalent was a Fluent Bit instance exhibiting a recently-fixed duplication bug.)

Reviewing a Fluentd-related Kubernetes issue during our own diagnostics, we noticed that the behavior we observed seemed similar to the Fluentd behavior described there when built without jemalloc. This led us to check whether the td-agent binary we were using was in fact linked with jemalloc. According to the FAQ, jemalloc is used when building the Treasure Data RPMs, and though we found jemalloc libraries installed on the system, we couldn't find any trace of jemalloc in the running process's memory. Specifically, we tried the following things:
In short, this leads us to wonder... are the binaries invoked by td-agent actually linked with jemalloc? If they are not, is the old memory fragmentation problem that jemalloc solved what we are observing here? (And if they aren't, am I raising this issue in the wrong place, and if so, where should I raise it?)