glusterfs not rebalancing properly

zieba88 commented 2 years ago

Hi, I am seeing an issue with glusterfs. I started with a replica 3 arbiter 1 setup and then added 4 additional bricks. Data is being spread across all the bricks, but the larger files seem to land on only 4 of the 6 data bricks. I have tried rebalancing, and adding and removing bricks, to no avail. The first pair of bricks is 800 GB each and the other 2 pairs are 400 GB each. My understanding is that the files should be balanced proportionally, with the 2 larger nodes receiving twice as much data as each smaller pair. Here is the space utilization after rebalancing. Any help with this is greatly appreciated.

server904:
Filesystem  Size  Used  Avail  Use%  Mounted on
/dev/sdb1   800G  333G  467G   42%   /data/data

server905:
Filesystem  Size  Used  Avail  Use%  Mounted on
/dev/sdb1   800G  333G  467G   42%   /data/data

server766:
Filesystem  Size  Used  Avail  Use%  Mounted on
/dev/sdb1   400G  157G  244G   40%   /data/data

server767:
Filesystem  Size  Used  Avail  Use%  Mounted on
/dev/sdb1   400G  157G  244G   40%   /data/data

server768:
Filesystem  Size  Used  Avail  Use%  Mounted on
/dev/sdb1   400G  2.9G  397G   1%    /data/data

server769:
Filesystem  Size  Used  Avail  Use%  Mounted on
/dev/sdb1   400G  2.9G  397G   1%    /data/data

server906 (arbiter):
Filesystem  Size  Used  Avail  Use%  Mounted on
/dev/sdb1   100G  865M  100G   1%    /data/data
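As a rough check of the expectation stated above, the following sketch computes how the used space would be split if data were distributed strictly in proportion to brick size. It is only arithmetic on the `df` figures in this post. The assumptions are that each replica pair is counted once (both bricks in a pair hold the same data), that the arbiter is excluded, and that a strictly proportional split is the target, which is the poster's expectation rather than confirmed Gluster behaviour. The dictionary keys are just labels chosen for this example.

```python
# Capacity and observed usage per replica pair, in GB, taken from the df output.
# Each pair is counted once because both bricks store the same data.
capacity_gb = {"server904/905": 800, "server766/767": 400, "server768/769": 400}
used_gb = {"server904/905": 333, "server766/767": 157, "server768/769": 2.9}

total_capacity = sum(capacity_gb.values())
total_used = sum(used_gb.values())

# Assumption: a perfectly proportional split, i.e. each pair holds the same
# fraction of the data as its fraction of the total capacity.
expected_gb = {
    pair: total_used * cap / total_capacity for pair, cap in capacity_gb.items()
}

for pair in capacity_gb:
    print(
        f"{pair}: observed {used_gb[pair]:.1f} GB, "
        f"proportional share {expected_gb[pair]:.1f} GB"
    )
```

Under these assumptions the third pair would be expected to hold roughly a quarter of the data, which is far more than the 2.9G it reports.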

Mandatory info:

[root@server766 ~]# gluster vol info

Volume Name: test_vol
Type: Distributed-Replicate
Volume ID: d4a5cfdf-0f0d-4562-84d2-1ef4c5e06c33
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x (2 + 1) = 9
Transport-type: tcp
Bricks:
Brick1: server904:/data/data/brick1
Brick2: server905:/data/data/brick1
Brick3: server906:/data/data/arb1 (arbiter)
Brick4: server766:/data/data/brick1
Brick5: server767:/data/data/brick1
Brick6: server906:/data/data/arb2 (arbiter)
Brick7: server768:/data/data/brick1
Brick8: server769:/data/data/brick1
Brick9: server906:/data/data/arb3 (arbiter)
Options Reconfigured:
cluster.self-heal-daemon: on
cluster.entry-self-heal: on
cluster.metadata-self-heal: on
cluster.data-self-heal: on
cluster.granular-entry-heal: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off

[root@server766 ~]# gluster vol status
Status of volume: test_vol
Gluster process                     TCP Port  RDMA Port  Online  Pid
Brick server904:/data/data/brick1   56683     0          Y       1832
Brick server905:/data/data/brick1   51234     0          Y       1888
Brick server906:/data/data/arb1     50234     0          Y       25053
Brick server766:/data/data/brick1   51075     0          Y       438636
Brick server767:/data/data/brick1   52384     0          Y       130570
Brick server906:/data/data/arb2     56281     0          Y       25068
Brick server768:/data/data/brick1   51973     0          Y       20166
Brick server769:/data/data/brick1   53790     0          Y       25489
Brick server906:/data/data/arb3     56614     0          Y       25083
Self-heal Daemon on localhost       N/A       N/A        Y       438652
Self-heal Daemon on server905       N/A       N/A        Y       2046
Self-heal Daemon on server767       N/A       N/A        Y       130589
Self-heal Daemon on server906       N/A       N/A        Y       25099
Self-heal Daemon on server768       N/A       N/A        Y       20182
Self-heal Daemon on server904       N/A       N/A        Y       2017
Self-heal Daemon on server769       N/A       N/A        Y       25505

Task Status of Volume test_vol
Task   : Rebalance
ID     : 0bee8e2f-0b82-4f78-be54-92727822a9ca
Status : completed

[root@server766 ~]# gluster volume heal test_vol info
Brick server904:/data/data/brick1
Status: Connected
Number of entries: 0

Brick server905:/data/data/brick1
Status: Connected
Number of entries: 0

Brick server906:/data/data/arb1
Status: Connected
Number of entries: 0

Brick server766:/data/data/brick1
Status: Connected
Number of entries: 0

Brick server767:/data/data/brick1
Status: Connected
Number of entries: 0

Brick server906:/data/data/arb2
Status: Connected
Number of entries: 0

Brick server768:/data/data/brick1
Status: Connected
Number of entries: 0

Brick server769:/data/data/brick1
Status: Connected
Number of entries: 0

Brick server906:/data/data/arb3
Status: Connected
Number of entries: 0

[root@server766 ~]# rpm -qa | grep gluster
glusterfs-fuse-10.2-1.el8s.x86_64
glusterfs-10.2-1.el8s.x86_64
glusterfs-server-10.2-1.el8s.x86_64
libglusterd0-10.2-1.el8s.x86_64
libglusterfs0-10.2-1.el8s.x86_64
glusterfs-client-xlators-10.2-1.el8s.x86_64
glusterfs-selinux-2.0.1-1.el8s.noarch
glusterfs-cli-10.2-1.el8s.x86_64

[root@server766 ~]# cat /etc/redhat-release
CentOS Linux release 8.5.2111

zieba88 commented 2 years ago

anyone?

olegkrutov commented 2 years ago

Did you try fix-layout? https://docs.gluster.org/en/v3/Administrator%20Guide/Managing%20Volumes/#expanding-volumes

zieba88 commented 1 year ago

I've tried both fix-layout and rebalance, to no avail.

olegkrutov commented 1 year ago

AFAIK fix-layout must be issued right after the new bricks are added, since only files written to the volume AFTER fix-layout will be distributed according to the new volume topology.

zieba88 commented 1 year ago

@olegkrutov After adding the 4 additional bricks, fix-layout was run, and rebalance as well. To better understand this, I set up a lab with a smaller subset of drives and data (the same data, just a smaller sample) and was able to reproduce the issue. My understanding from the documentation is that the data should rebalance across all the nodes, yet we observe that the 2 larger nodes and 2 of the 4 smaller nodes store most of the data. My question to the gluster community: am I wrong in assuming that the data should be evenly distributed across all the nodes in the cluster after running fix-layout/rebalance, or is this some sort of bug?

zieba88 commented 1 year ago

To add to this, it seems that the issue occurs when the files gluster is hosting have the same name but live in different directories. After renaming the files to append the date from the directory to the filename and running a rebalance, the cluster seems to have balanced itself out proportionally. The question remains whether this is a bug in gluster or expected behavior. I read that gluster determines the hash from a combination of file path + file name, but it certainly does not seem to be doing that as described here: https://staged-gluster-docs.readthedocs.io/en/release3.7.0beta1/Features/dht/

"To minimize collisions either between files in the same directory with different names or between files in different directories with the same name, this hash is generated using both the (containing) directory's unique GFID and the file's name"
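To make the difference described in that quote concrete, here is a toy sketch comparing placement when only the file name is hashed with placement when a per-directory ID is mixed into the hash. It is not Gluster's actual hash function or layout logic: SHA-256 with a modulo stands in for the real hash and its per-directory hash ranges, and the subvolume labels, the `bucket` helper, the generated directory IDs, and the file name `backup.tar` are all hypothetical.

```python
import hashlib
import uuid
from collections import Counter

# Hypothetical labels for the three replica sets in a 3 x (2 + 1) volume.
SUBVOLUMES = ["replica-set-0", "replica-set-1", "replica-set-2"]


def bucket(key: str, n: int = len(SUBVOLUMES)) -> int:
    """Toy placement: hash the key and reduce it to a subvolume index."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % n


def place_by_name(name: str) -> int:
    """Placement that depends only on the file name."""
    return bucket(name)


def place_by_gfid_and_name(dir_gfid: str, name: str) -> int:
    """Placement that mixes the containing directory's ID into the hash."""
    return bucket(f"{dir_gfid}/{name}")


# 300 stand-in directory IDs, each containing a file with the same name.
dir_gfids = [str(uuid.uuid5(uuid.NAMESPACE_URL, f"dir-{i}")) for i in range(300)]
file_name = "backup.tar"

name_only = Counter(SUBVOLUMES[place_by_name(file_name)] for _ in dir_gfids)
mixed = Counter(
    SUBVOLUMES[place_by_gfid_and_name(g, file_name)] for g in dir_gfids
)

print("name only:       ", dict(name_only))
print("directory + name:", dict(mixed))
```

In this toy model, name-only hashing sends every copy of the identically named file to the same subvolume, while mixing in the directory ID spreads the copies out. That mirrors the symptom reported here, but it does not establish what Gluster 10.2 actually does internally.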

stale[bot] commented 1 year ago

Thank you for your contributions. Noticed that this issue is not having any activity in last ~6 months! We are marking this issue as stale because it has not had recent activity. It will be closed in 2 weeks if no one responds with a comment here.

gluster / glusterfs

glusterfs not rebalancing properly #3897
