JeffLuoo opened this issue 1 year ago
I have the same problem. I updated from version 2.0.6 to 2.1.2, then downgraded back to 2.0.6 and added a request/limit.
v2.1.2
v2.0.6
That's very interesting. I was under the impression that every version after 1.9.8 had this issue, but it seems we need to verify that for the 2.0 series as well. This looks like a really good lead.
Would you be able to share your configuration / some more context @stv2509?
My Kubernetes deploy YAML: fluent-bit-install.zip
@stv2509 In my case this only happened when I enabled `storage.type filesystem`. For your configmap, could you try disabling the multiline parser?
I also encountered the same problem!
Bump to avoid closing.
We are also seeing the continuous memory increase for fluent-bit pods for all versions starting from v2.0.0
My guess is that the memory increase was introduced in v1.9.9 (https://fluentbit.io/announcements/v1.9.9/), as this is the first version where I observed the increase and the release notes mention changes related to chunk files.
I tried attaching valgrind to my fluent-bit pod, but the reported errors look unrelated:
```
==1== HEAP SUMMARY:
==1== in use at exit: 201,172 bytes in 4,355 blocks
==1== total heap usage: 321,263 allocs, 316,908 frees, 1,102,260,365 bytes allocated
==1==
==1== Thread 1:
==1== 56 bytes in 1 blocks are definitely lost in loss record 505 of 584
==1== at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==1== by 0x7766A8: mk_mem_alloc_z (mk_memory.h:70)
==1== by 0x77681F: thread_get_libco_params (mk_http_thread.c:60)
==1== by 0x7769FB: thread_params_set (mk_http_thread.c:168)
==1== by 0x776BDB: mk_http_thread_create (mk_http_thread.c:226)
==1== by 0x772FAA: mk_http_init (mk_http.c:748)
==1== by 0x771E81: mk_http_request_prepare (mk_http.c:232)
==1== by 0x774C61: mk_http_sched_read (mk_http.c:1576)
==1== by 0x770ACC: mk_sched_event_read (mk_scheduler.c:695)
==1== by 0x7792C4: mk_server_worker_loop (mk_server.c:523)
==1== by 0x770480: mk_sched_launch_worker_loop (mk_scheduler.c:417)
==1== by 0x4859608: start_thread (pthread_create.c:477)
==1==
==1== LEAK SUMMARY:
==1== definitely lost: 56 bytes in 1 blocks
==1== indirectly lost: 0 bytes in 0 blocks
==1== possibly lost: 0 bytes in 0 blocks
==1== still reachable: 201,116 bytes in 4,354 blocks
==1== suppressed: 0 bytes in 0 blocks
==1== Reachable blocks (those to which a pointer was found) are not shown.
==1== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==1==
==1== For lists of detected and suppressed errors, rerun with: -s
==1== ERROR SUMMARY: 1493 errors from 178 contexts (suppressed: 0 from 0)
```
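For reference, a rough sketch of how a report like the above can be reproduced inside the container (the leak-kind flags are the ones valgrind's own summary suggests; the binary and config paths are assumptions and depend on the image):

```
# Run fluent-bit under valgrind inside the container; paths are assumptions.
valgrind --leak-check=full --show-leak-kinds=all \
  /fluent-bit/bin/fluent-bit -c /fluent-bit/etc/fluent-bit.conf
```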
We are also affected but not sure how to help here.
We can also see this behavior in versions 2.1.2 and 2.1.6, though we do not use `filesystem` storage. We can verify that reverting to 2.0.9 resolves the memory leak.
With 2.1.2:
With 2.0.9:
It appears that 2.1.8 might be better. I was using 2.1.6 and upgraded to 2.1.8 (and added a container limit, but it hasn't been reached). In the image below you can see the memory usage in the older version of fluent-bit was already starting to climb, but after upgrading to 2.1.8 it seems to be stable.
I ran fluent-bit v2.1.10 in my cluster and can still see the issue. @edsiper Hi Eduardo, I wonder whether the change bumping chunkio to v1.5.0 (https://github.com/fluent/fluent-bit/commit/4d0eca52a41f2d50ff48eb57b13f29d76ab55447) was meant to fix the memory issue mentioned in this ticket?
Hitting this as well in v2.1.10. This is especially noticeable if you configure the filesystem buffer on several plugins (i.e. one plugin shows a slower increase compared to multiple plugins). We're trying to tune our logging pipeline for very high load, and this is fighting our efforts 😞
Settings:
```
[SERVICE]
Daemon Off
Flush 1
Log_Level info
Parsers_File parsers.conf
Parsers_File custom_parsers.conf
HTTP_Server On
HTTP_Listen 0.0.0.0
HTTP_Port 2020
Health_Check On
storage.path /var/log/flb-storage/
[INPUT]
Name tail
Path /var/log/containers/*.log
multiline.parser docker, cri
Tag kube.*
Skip_Long_Lines On
Storage.type filesystem
Mem_Buf_Limit 5MB
DB /var/log/flb-storage/tail.db
DB.locking true
Read_from_Head true
Refresh_Interval 10
Buffer_Chunk_Size 1M
Buffer_Max_Size 1M
[INPUT]
Name systemd
Tag host.*
Systemd_Filter _SYSTEMD_UNIT=kubelet.service
Read_From_Tail On
[INPUT]
Name http
host 0.0.0.0
port 8900
Alias input_http
[INPUT]
Name forward
Buffer_Max_Size 25M
Alias input_syslog_forward
[FILTER]
Name kubernetes
Match kube.*
Merge_Log On
Keep_Log Off
K8S-Logging.Parser On
K8S-Logging.Exclude On
[FILTER]
Name nest
Match kube.*
Operation lift
Nested_under kubernetes
Add_prefix kubernetes_
Alias nest_kubernetes
[FILTER]
Name rewrite_tag
Match kube.*
Rule $kubernetes_namespace_name ^(event-exporter)$ KubeEvents false
Alias set_tag_to_KubeEvents
[FILTER]
Name grep
Match KubeEvents
Regex Reason .*
Alias remove_invalid_KubeEvents
[FILTER]
Name modify
Match KubeEvents
Remove stream
Remove _p
Remove time
Remove_wildcard kubernetes_
[FILTER]
Name modify
Match kube.*
Condition Key_Does_Not_Exist logType
Set logType ContainerLogs
[FILTER]
Name nest
Match kube.*
Alias schematize_our_stuff
Operation lift
Nested_under logData
[FILTER]
Name modify
Match kube.*
Remove kubernetes_pod_id
Remove kubernetes_docker_id
Remove kubernetes_container_image
Remove kubernetes_annotations
Remove kubernetes_labels
Remove stream
Remove _p
Rename kubernetes_pod_name K8sPod
Rename kubernetes_namespace_name K8sNamespace
Rename kubernetes_host K8sHost
Rename kubernetes_container_name K8sContainer
Rename kubernetes_container_hash K8sContainerHash
Alias tidy_schema
[FILTER]
Name rewrite_tag
Match kube.*
Alias set_tag_to_logType_knownTables
Rule $logType ^(Custom Matching Here)$ $0 false
Emitter_Name KnownTables
Emitter_Storage.type filesystem
Emitter_Mem_Buf_Limit 30MB
[FILTER]
Name rewrite_tag
Match kube.*
Alias set_tag_to_logType_unknownTables
Rule $logType ^(.*)$ UnregisteredLogType false
[OUTPUT]
Name forward
Match *
Host <proxy pod on the same cluster>
Port 5001
Alias output_forward_mdsd
Retry_Limit no_limits
```
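Since this config enables the built-in HTTP server on port 2020, one low-effort way to watch buffer behaviour while the leak reproduces is to poll the monitoring API (a sketch; `/api/v1/storage` only reports chunk details when `storage.metrics` is enabled in `[SERVICE]`):

```
# Poll fluent-bit's monitoring endpoints while memory climbs.
# /api/v1/metrics is served whenever HTTP_Server is On;
# /api/v1/storage additionally needs "storage.metrics on" in [SERVICE].
watch -n 30 'curl -s http://127.0.0.1:2020/api/v1/storage'
```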
We are still seeing the continuous memory increase with v2.1.10 also. We use filesystem buffers. We are stuck at v1.9.8.
From my recent experiments, this only happens in my clusters if I enable the filesystem buffer.
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.
Experienced the same behaviour as well; downgrading all the way to 1.9.8 seems to solve it.
Again, this only happens with `storage.type filesystem`.
We continue to test each new version as it is released and still see this behavior. We also use `storage.type filesystem`. Still stuck on v1.9.8.
Any resolution for this problem in the latest version?
I was unable to reproduce the issue running fluent-bit natively (freshly built from tag v1.9.9), but the leak seems to be present when running the official docker image built from the exact same commit on my machine, running kernel 6.1.77. We are also observing the same behavior in a cluster with kernel 6.1.18.
Stripping away Docker, Kubernetes, and containers, and simply running fluent-bit in a cgroup without a memory limit makes the leak appear. Interestingly, if I limit the available memory for the cgroup, the leak seems to disappear.
I ran 3 fluent-bit processes locally on the exact same input file with the exact same configuration:
1. `bin/fluent-bit -c $configfile`
2. `systemd-run --scope -u fluent-bit-test bin/fluent-bit -c $configfile`
3. `systemd-run --scope -u fluent-bit-test-memlimited -p MemoryMax=30M -p MemorySwapMax=0 bin/fluent-bit -c $configfile`
After a while, a very striking difference in apparent memory usage could be seen: in contrast to the ~9MB resident set size (even virtual size is just ~100MB), the cgroup shows a ~30MB peak (when limited to 30MB) or even ~180MB (when not limited):
```
~> ps -o rss,pss,vsz $(pgrep fluent-bit)
RSS PSS VSZ
9076 2581 101840
9152 1938 101772
9168 2280 101840
~> grep . /sys/fs/cgroup/system.slice/fluent-bit-test*.scope/memory.{peak,current} | sort | column -t -s :
/sys/fs/cgroup/system.slice/fluent-bit-test-memlimited.scope/memory.current 6561792
/sys/fs/cgroup/system.slice/fluent-bit-test-memlimited.scope/memory.peak 31457280
/sys/fs/cgroup/system.slice/fluent-bit-test.scope/memory.current 184197120
/sys/fs/cgroup/system.slice/fluent-bit-test.scope/memory.peak 185344000
```
So looking at the process itself, there's nothing suggesting a memory leak; however, the cgroup's memory accounting shows an ever-increasing value if memory is not constrained for the cgroup.
Since most of the reports above are about pods in Kubernetes clusters, which run fluent-bit inside cgroups, it's possible that the issue is not entirely in fluent-bit but in how the cgroup accounts for the way fluent-bit manages memory.
@edsiper, @niedbalski does this help in pinpointing what the issue could be?
Looked into this given the information in the previous comment. Thank you for the investigation.
The `memory.current` stat in cgroups also tracks cached resources that are not necessarily part of the process's own memory but that the kernel keeps around until it has to release them, usually due to some form of memory pressure. I'm guessing this is why, with the memory limit in the 3rd example, the number goes back down to a normal amount: the cgroup is not allowed to exceed a certain amount of memory, which simulates memory pressure and makes the kernel release those cached resources.
Got this info from https://facebookmicrosites.github.io/cgroup2/docs/memory-controller.html#core-interface-files and https://facebookmicrosites.github.io/cgroup2/docs/memory-controller.html#determining-working-set-size
I don't know much about the Kubernetes metrics, so I'm not sure where the `container_memory_working_set_bytes` metric is derived from. I assume it's not derived from the `memory.current` stat; if it is, then this could be a good hint. Otherwise I'm not sure this info necessarily points at the problem. :disappointed:
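For anyone who wants to confirm this on a node, a rough sketch of how that accounting can be broken down under cgroup v2 (reusing the `fluent-bit-test.scope` path from the earlier experiment; substitute the pod's actual cgroup directory):

```
# Total memory charged to the cgroup: includes page cache the kernel
# has not reclaimed yet, not just fluent-bit's own anonymous memory.
cat /sys/fs/cgroup/system.slice/fluent-bit-test.scope/memory.current

# Split it up: "anon" is heap/stack-style memory, while "file" and
# "inactive_file" are page cache attributed to the cgroup.
grep -E '^(anon|file|inactive_file) ' \
    /sys/fs/cgroup/system.slice/fluent-bit-test.scope/memory.stat
```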
@braydonk thanks for the update. We still have some metrics from the cluster where we experienced the issue, and looking back I don't see any high `container_memory_cache` usage for fluent-bit; it was only 8MB. Mapped file was not high either (average values for each metric below).
After the downgrade to 1.9.8, we see this:
I believe k8s gets the metrics from cAdvisor, and looking here: https://github.com/google/cadvisor/blob/master/info/v1/container.go#L365 the comment states:
"The amount of working set memory, this includes recently accessed memory, dirty memory, and kernel memory. Working set is <= "usage".
Since I don't see separate metrics for those components, I can't tell which one could be the culprit. This might be where the actual numbers are calculated: https://github.com/google/cadvisor/blob/master/container/libcontainer/handler.go#L801, which seems to indicate that the working set is simply usage reduced by `inactive_file`. Although there are some differences in handling the different cgroup versions, they don't seem to affect how usage and working set are calculated, and what we see is a high usage value for fluent-bit versions newer than 1.9.8.
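If that reading of the handler code is right, the same number can be approximated directly from the cgroup v2 files (a sketch, again reusing the scope path from the experiment above):

```
# Approximate cAdvisor's working set: usage minus inactive page cache.
cg=/sys/fs/cgroup/system.slice/fluent-bit-test.scope
usage=$(cat "$cg/memory.current")
inactive_file=$(awk '$1 == "inactive_file" {print $2}' "$cg/memory.stat")
echo "working_set_bytes=$((usage - inactive_file))"
```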
So the big question is: what changed in the memory usage pattern of >=1.9.9 that cannot be attributed to either one of the sub-metrics (cache, file, etc.), and simply shows up as "usage"?
Hi, I can confirm we have had the same issue since we enabled filesystem buffering on our ECS clusters. 1.9.8 is the last version without the leak; all later versions (3.0.0 included) leak memory:
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.
This is still an issue when using filesystem buffering and is preventing us from upgrading past v1.9.8
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.
Still seeing the issue when using filesystem buffers.
Removing the Stale label because this problem does still seem to exist.
We are facing the same issue when using filesystem buffers.
I am not sure if there's anything serious here, and I think braydonk hit upon it in his March 19th comment.
To be sure, we see the same thing here. Here's `container_memory_working_set_bytes` from Fluent Bit pods running on one of our K8s clusters over a one-week period following a restart:
But this is just the VFS cache. The cgroup limit on the Fluent Bit pods is set to 400MiB, and you can see that this is where things level out and the kernel starts reclaiming memory by evicting data from the VFS cache.
To prove it, if I jump on a node and run `echo 3 > /proc/sys/vm/drop_caches`, the above metric immediately drops for the Fluent Bit pod running on that node.
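Roughly, that check from the node looks like this (a sketch; `<pod-cgroup>` is a placeholder for the Fluent Bit pod's cgroup directory, whose exact path depends on the container runtime):

```
cat /sys/fs/cgroup/<pod-cgroup>/memory.current   # before
sync && echo 3 > /proc/sys/vm/drop_caches        # node-wide cache flush
cat /sys/fs/cgroup/<pod-cgroup>/memory.current   # after: drops sharply, since
                                                 # most of it was reclaimable cache
```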
In contrast, here is `container_memory_rss` from that same set of pods over the same time range:
If this actually did change between Fluent Bit 1.9.8 and 1.9.9, perhaps 1.9.8 was using O_DIRECT for the filesystem buffer files and 1.9.9 stopped doing that? O_DIRECT would not pollute the VFS cache, so that's at least a hypothesis.
It's perhaps undesirable to pollute the VFS cache with data that is unlikely to be read back except during failure scenarios (at least insofar as I understand the filesystem buffering logic), but I don't see this as a critical issue myself, and it didn't gate our deployment to production.
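One way to test the O_DIRECT hypothesis without digging through the source would be to watch the open flags on the chunk files with strace (a sketch; whether 1.9.8 actually used O_DIRECT is unverified here, and `/var/log/flb-storage/` is simply the storage.path from the config posted earlier in this thread):

```
# Attach to a running fluent-bit and watch file opens under the buffer path;
# the presence or absence of O_DIRECT in the flags would confirm or refute it.
strace -f -e trace=openat -p "$(pgrep -o fluent-bit)" 2>&1 \
    | grep '/var/log/flb-storage/'
```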
Bump. I can confirm this is still happening in the most recent fluent-bit version (3.1.x) when enabling `storage.type filesystem`.
Bug Report
Describe the bug
The memory of Fluent Bit pods increases continuously in a k8s environment after v1.9.9. I tested with v1.9.8 and don't see the issue, but I can still reproduce it with v2.1.2.
To Reproduce
Deploy Fluent Bit with filesystem (offline) buffering enabled on a k8s cluster.
Expected behavior
Stable memory on stable load.
Screenshots
Your Environment
Additional context
Input log load is stable.