glusterd memory leak observed while constantly running gluster volume status data detail

Description of problem:

The glusterd process leaks memory steadily when gluster volume status data detail is run repeatedly.

The exact command to reproduce the issue:

We have a 3-node cluster with one volume. Our monitoring system runs various commands against the cluster, and this leads to a memory leak in the glusterd process. We have isolated the problem to the following command:

while true; do gluster volume status data detail; sleep 1; done
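To put numbers on the growth while the loop above is running, the RSS of glusterd can be sampled at a fixed interval. The following is a minimal sketch, not part of the original report: the helper names rss_kb and log_rss are hypothetical, and it assumes a Linux host where /proc/<pid>/status exposes a VmRSS line.

```shell
#!/bin/sh
# Hypothetical helpers (not part of gluster) for logging a process's RSS.

# rss_kb PID: print the resident set size of PID in kB,
# read from the VmRSS line of /proc/PID/status (Linux only).
rss_kb() {
    awk '/^VmRSS:/ { print $2 }' "/proc/$1/status" 2>/dev/null
}

# log_rss PID INTERVAL SAMPLES: print "<unix-time> <rss-kB>" once per
# INTERVAL seconds, SAMPLES times. Returns 1 if the process goes away.
log_rss() {
    pid=$1
    interval=$2
    samples=$3
    i=0
    while [ "$i" -lt "$samples" ]; do
        rss=$(rss_kb "$pid")
        [ -n "$rss" ] || return 1
        printf '%s %s\n' "$(date +%s)" "$rss"
        i=$((i + 1))
        if [ "$i" -lt "$samples" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, running log_rss "$(pidof glusterd)" 60 1440 > glusterd-rss.log in a second terminal would record one sample per minute for a day; a steadily rising second column would match the behaviour described here.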

Actual results:

glusterd RSS keeps increasing until the system runs out of memory (OOM).

Expected results:

The glusterd process should release unused memory.

Mandatory info:

- The output of the gluster volume info command:

[root@aux5 gluster]# gluster volume info

Volume Name: data
Type: Replicate
Volume ID: f058e21f-cd7c-438d-9e9d-1ab3370b4839
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: qdevice.bazadev.private:/var/preserve/.brick/data
Brick2: aux5.bazadev.private:/var/preserve/.brick/data
Brick3: aux6.bazadev.private:/var/preserve/.brick/data
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
nfs.disable: on
cluster.quorum-count: 1
cluster.quorum-type: fixed
performance.client-io-threads: off
transport.address-family: inet
features.bitrot: on
features.scrub: Active
features.scrub-freq: monthly
features.scrub-throttle: normal
cluster.self-heal-daemon: on
performance.open-behind: off

- The output of the gluster volume status command:

[root@aux5 gluster]# gluster volume status
Status of volume: data
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick qdevice.bazadev.private:/var/preserve
/.brick/data                                49152     0          Y       19385
Brick aux5.bazadev.private:/var/preserve/.b
rick/data                                   49152     0          Y       3485
Brick aux6.bazadev.private:/var/preserve/.b
rick/data                                   49152     0          Y       583
Self-heal Daemon on localhost               N/A       N/A        Y       3875
Bitrot Daemon on localhost                  N/A       N/A        Y       3644
Scrubber Daemon on localhost                N/A       N/A        Y       3862
Self-heal Daemon on aux6.bazadev.private    N/A       N/A        Y       650
Bitrot Daemon on aux6.bazadev.private       N/A       N/A        Y       606
Scrubber Daemon on aux6.bazadev.private     N/A       N/A        Y       636
Self-heal Daemon on qdevice.bazadev.private N/A       N/A        Y       19458
Bitrot Daemon on qdevice.bazadev.private    N/A       N/A        Y       19425
Scrubber Daemon on qdevice.bazadev.private  N/A       N/A        Y       19446

Task Status of Volume data
------------------------------------------------------------------------------
There are no active volume tasks

- The output of the gluster volume heal command:

[root@aux5 gluster]# gluster volume heal data
Launching heal operation to perform index self heal on volume data has been successful
Use heal info commands to check status.

Additional info: I captured a statedump before running this loop as a baseline, and another after 3 days of running the loop: https://gist.github.com/sorcky/7fa3980d8cc3f7a9f81e0a61b3e6dcf2
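To narrow down which allocation type is growing, the two statedumps can be compared section by section. The sketch below is only an illustration and was not used to produce the dumps in the gist. It assumes the usual statedump layout in which each memory-accounting section starts with a header line ending in "memusage]" followed by a "size=<bytes>" line; the function name growth and the file names in the usage note are hypothetical. Both files are assumed to be non-empty.

```shell
#!/bin/sh
# growth BASELINE LATER: for every "[... memusage]" section, print
# "<bytes-grown><TAB><section header>" where size= is larger in LATER
# than in BASELINE, largest growth first. Sections absent from
# BASELINE are treated as having size 0 there.
growth() {
    awk '
        FNR == 1 { file++; section = "" }          # start of each input file
        /^\[/ {                                    # any section header
            section = ($0 ~ /memusage\]$/) ? $0 : ""
            next
        }
        section != "" && /^size=/ {
            size = substr($0, 6) + 0               # text after "size="
            if (file == 1) {
                base[section] = size
            } else {
                d = size - base[section]
                if (d > 0) printf "%.0f\t%s\n", d, section
            }
        }
    ' "$1" "$2" | sort -rn
}
```

With the two dumps saved locally under hypothetical names, growth glusterd-baseline.dump glusterd-3days.dump | head would list the usage types that grew the most over the three days.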

- The operating system / glusterfs version:

[root@aux5 tech]# cat /etc/oracle-release
Oracle Linux Server release 7.6
[root@aux5 tech]# uname -r
4.14.35-1844.2.5.el7uek.x86_64
[root@aux5 tech]# gluster --version
glusterfs 9.6
Repository revision: git://git.gluster.org/glusterfs.git
Copyright (c) 2006-2016 Red Hat, Inc. <https://www.gluster.org/>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser
General Public License, version 3 or any later version (LGPLv3
or later), or the GNU General Public License, version 2 (GPLv2),
in all cases as published by the Free Software Foundation.

gluster/glusterfs, issue #3913