gluster / glusterd2

[DEPRECATED] Glusterd2 is the distributed management framework to be used for GlusterFS.

glustershd memory keeps increasing while creating PVCs #1467

Open PrasadDesala opened 5 years ago

PrasadDesala commented 5 years ago

glusterfs memory increased from 74 MB to 6.8 GB while creating 200 PVCs.

Before:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1150 root 20 0 3637200 74560 3320 S 0.0 0.2 0:01.52 glusterfs

After 200 PVCs are created:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1150 root 20 0 101.0g 6.8g 3388 S 94.1 21.6 17:43.07 glusterfs

Below are a few other observations:

1) For a few of the volumes the brick port shows -1:

Volume: pvc-9480160e-1279-11e9-a7a2-5254001ae311
| BRICK ID | HOST | PATH | ONLINE | PORT | PID |
| b7a95b9b-17da-4220-a38d-2d23eb75c83a | gluster-kube3-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-9480160e-1279-11e9-a7a2-5254001ae311/subvol1/brick1/brick | true | 40635 | 3612 |
| 133011b8-1825-4b6e-87e1-d7bed7332f55 | gluster-kube1-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-9480160e-1279-11e9-a7a2-5254001ae311/subvol1/brick2/brick | true | -1 | 3041 |
| ebfb7837-8657-46c9-aad9-449b6a1ba6bf | gluster-kube2-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-9480160e-1279-11e9-a7a2-5254001ae311/subvol1/brick3/brick | true | 45864 | 3146 |

2) The following messages are logged continuously in the glustershd logs (the reserved-ports check is discussed after this list):

[2019-01-07 13:14:14.157784] W [MSGID: 101012] [common-utils.c:3186:gf_get_reserved_ports] 36-glusterfs: could not open the file /proc/sys/net/ipv4/ip_local_reserved_ports for getting reserved ports info [No such file or directory]
[2019-01-07 13:14:14.157840] W [MSGID: 101081] [common-utils.c:3226:gf_process_reserved_ports] 36-glusterfs: Not able to get reserved ports, hence there is a possibility that glusterfs may consume reserved port
[2019-01-07 13:14:14.160159] W [MSGID: 101012] [common-utils.c:3186:gf_get_reserved_ports] 36-glusterfs: could not open the file /proc/sys/net/ipv4/ip_local_reserved_ports for getting reserved ports info [No such file or directory]
[2019-01-07 13:14:14.160213] W [MSGID: 101081] [common-utils.c:3226:gf_process_reserved_ports] 36-glusterfs: Not able to get reserved ports, hence there is a possibility that glusterfs may consume reserved port
[2019-01-07 13:14:14.183845] I [socket.c:811:__socket_shutdown] 36-pvc-93515db8-1279-11e9-a7a2-5254001ae311-replicate-0-client-1: intentional socket shutdown(7073)
[2019-01-07 13:14:14.183946] E [MSGID: 101191] [event-epoll.c:759:event_dispatch_epoll_worker] 36-epoll: Failed to dispatch handler

3) The following logs are logged continuously in the glusterd2 logs:

time="2019-01-07 13:15:28.484617" level=info msg="client connected" address="10.233.64.8:47178" server=sunrpc source="[server.go:148:sunrpc.(SunRPC).acceptLoop]" transport=tcp
time="2019-01-07 13:15:28.485340" level=error msg="registry.SearchByBrickPath() failed for brick" brick=/var/run/glusterd2/bricks/pvc-9480160e-1279-11e9-a7a2-5254001ae311/subvol1/brick2/brick error="SearchByBrickPath: port for brick /var/run/glusterd2/bricks/pvc-9480160e-1279-11e9-a7a2-5254001ae311/subvol1/brick2/brick not found" source="[rpc_prog.go:104:pmap.(GfPortmap).PortByBrick]"
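As a side note on the reserved-ports warning in observation 2: it only means the kernel tunable is not visible to the glusterfs process. A minimal way to check it, run on the host and assuming the gluster pods use the host network namespace, might be:

```sh
# Check whether the tunable referenced by the warning exists and what it holds.
cat /proc/sys/net/ipv4/ip_local_reserved_ports
sysctl net.ipv4.ip_local_reserved_ports
# Optionally reserve a range so glusterfs stays off it (the range is only an example):
# sysctl -w net.ipv4.ip_local_reserved_ports=49152-49251
```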

Observed behavior

glusterfs memory increased from 74 MB to 6.8 GB after 200 PVCs were created. The continuous messages shown above also keep getting logged.

Expected/desired behavior

glusterfs should not consume that much memory.

Details on how to reproduce (minimal and precise)

1) Create a 3 node GCS setup using valgrind.
2) Create 200 PVCs and keep monitoring glusterfs resource consumption (a rough monitoring sketch follows this list).
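For step 2, a minimal sketch of the create-and-monitor loop. This is not from the original report: the storage class name (glusterfs-csi), the PVC names, and the assumption that kubectl and the glustershd process are reachable from the same shell are all mine.

```sh
#!/bin/bash
# Create 200 PVCs and sample the self-heal daemon's resident memory after each one.
# Run the RSS sampling on the node that hosts glustershd.
SHD_PID=$(pgrep -f glustershd | head -1)

for i in $(seq 1 200); do
  kubectl create -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc-$i
spec:
  accessModes: ["ReadWriteMany"]
  resources:
    requests:
      storage: 1Gi
  storageClassName: glusterfs-csi
EOF
  # RSS is reported in kB by ps.
  ps -o rss= -p "$SHD_PID" | awk -v i="$i" '{printf "pvc %d: glustershd rss %d kB\n", i, $1}'
done
```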

Information about the environment:

PrasadDesala commented 5 years ago

Attaching glusterd2 dump, glusterd2 logs and glusterfs process state dump.

kube3-glusterd2.log.gz kube2-glusterd2.log.gz kube1-glusterd2.log.gz glusterdump.1150.dump.1546865584.gz statedump_kube-1.txt
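For reference, a glusterfs statedump like the one attached above can be regenerated by sending SIGUSR1 to the process; the dump lands in the statedump directory, /var/run/gluster by default (the GD2 containers may place it elsewhere, so treat the path as an assumption):

```sh
SHD_PID=1150                      # pid of the glustershd (glusterfs) process
kill -USR1 "$SHD_PID"             # asks the process to write a statedump
ls -lt /var/run/gluster/ | head   # newest glusterdump.<pid>.dump.<timestamp>
```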

atinmu commented 5 years ago

@PrasadDesala I am assuming you meant glustershd is consuming high memory? Also did you enable brick multiplexing in the setup?

PrasadDesala commented 5 years ago

@PrasadDesala I am assuming you meant glustershd is consuming high memory? Also did you enable brick multiplexing in the setup?

I think it is glustershd but I am not sure why glustershd is consuming memory as I am just creating PVCs so no healing should take place. I see the process name as glusterfs.

Brick-mux is not enabled on the setup.

aravindavk commented 5 years ago

I think it is glustershd but I am not sure why glustershd is consuming memory as I am just creating PVCs so no healing should take place. I see the process name as glusterfs.

Yes, this is the self-heal process. It can be confirmed by checking cat /proc/<pid>/cmdline.
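Since the fields in /proc/<pid>/cmdline are NUL-separated, they run together when printed directly; a small sketch to make the output readable:

```sh
# Replace the NUL separators with spaces before printing.
tr '\0' ' ' < /proc/<pid>/cmdline; echo
```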

atinmu commented 5 years ago

@itisravi @karthik-us ^^ might be worth checking the same with a GD1-based deployment. This isn't a GD2-specific problem as such.

amarts commented 5 years ago

I suspect this may also be due to https://review.gluster.org/#/c/glusterfs/+/21990/. Let's run a round of tests tomorrow, as it was merged today.

atinmu commented 5 years ago

On the latest master, across multiple iterations, the memory consumption of the glustershd process is nowhere near what has been reported, so I'm closing this for now. If we happen to hit this again, please feel free to reopen.

PrasadDesala commented 5 years ago

This issue is still seen on the latest nightly build. The glustershd process memory increased from 8616 kB (RES) to 6.2 GB.

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
Before: 395 root 20 0 514608 8616 3188 S 0.0 0.0 0:00.05 glusterfs
After: 395 root 20 0 95.3g 6.2g 3324 S 88.2 19.9 14:49.35 glusterfs

[root@gluster-kube1-0 ~]# cat /proc/395/cmdline
/usr/sbin/glusterfs -s gluster-kube1-0.glusterd2.gcs --volfile-server-port 24007 --volfile-id gluster/glustershd -p /var/run/glusterd2/glustershd.pid -l /var/log/glusterd2/glusterfs/glustershd.log -S /var/run/glusterd2/shd-492ab606e75778b6.socket --xlator-option replicate.node-uuid=9842221d-97d1-4041-9d4c-51f6fc6ef191
[root@gluster-kube1-0 ~]# ps -ef | grep -i glustershd

glusterd version: v6.0-dev.109.gitdfb2462

amarts commented 5 years ago

Can we disable shd for now in this setup, and re-enable when things settle down?
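If it helps, a sketch of how shd is usually disabled per volume with the classic GD1 CLI; whether glustercli in this GD2 setup accepts the same option name is an assumption on my part:

```sh
# Classic GD1 CLI form; <volname> is a placeholder.
gluster volume set <volname> cluster.self-heal-daemon off
# Assumed GD2 equivalent (option name not verified here):
# glustercli volume set <volname> cluster.self-heal-daemon off
```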

atinmu commented 5 years ago

@PrasadDesala At the moment we don't restart glustershd with every new PVC (which is a bug in GD2), so the process's overall memory consumption stays static regardless of how many PVCs we create, and that is what my test setup reflects too. So I'd definitely like to take a look at the setup where you are able to reproduce this.

PrasadDesala commented 5 years ago

@atinmu This issue is closed and I don't have the permissions to reopen it. If you have access, can you please reopen this issue?