gluster / glusterfs

Gluster Filesystem : Build your distributed storage in minutes
https://www.gluster.org
GNU General Public License v2.0

Persistent File Staleness Errors in GlusterFS Volume #4332

Open meetsoni15 opened 1 month ago

meetsoni15 commented 1 month ago

Description of problem:

We have configured a GlusterFS cluster with three peers, backed by six SSD drives for improved performance. The GlusterFS volume is mounted on Server 5, which acts as both server and client, and MinIO is deployed on top of the mount to provide object storage (see the command sketch after the configuration details below).

Configuration Details:

Servers:
    Server 5: 172.16.16.10 (acts as both server and client)
    Server 6: 172.16.16.7
    Server 7: 172.16.16.6
Drives:
    Each server hosts two 1 TB SSD drives, formatted with ext4.
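
For reference, a minimal sketch of how a volume with this layout would typically be created and mounted; the brick paths and transport match the gluster volume info output below, while the mount point /mnt/minio and the choice of 172.16.16.10 as the mount server are assumptions for illustration:

# create the 6-brick distributed volume from the bricks listed in volume info
gluster volume create minio-vol transport tcp \
    172.16.16.6:/opt/disk1/minio 172.16.16.7:/opt/disk1/minio \
    172.16.16.6:/opt/disk2/minio 172.16.16.7:/opt/disk2/minio \
    172.16.16.10:/opt/disk2/minio 172.16.16.10:/opt/disk1/minio
gluster volume start minio-vol

# mount the volume on Server 5 with the native FUSE client (mount point is hypothetical)
mount -t glusterfs 172.16.16.10:/minio-vol /mnt/minio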

Issue Encountered:

After a manual file cleanup inside a specific folder on the mounted GlusterFS volume, file staleness errors started appearing. To address this, we created a new folder and refrained from further cleanup activity. However, the same file staleness error resurfaced in the new folder after two days.

(screenshot attached)

Expected results: No stale file errors on the GlusterFS volume, so that data access is stable and the system reliable.

Mandatory info:
- The output of the gluster volume info command:

Volume Name: minio-vol
Type: Distribute
Volume ID: a3bea87d-748e-4a15-80af-3343aa7608b3
Status: Started
Snapshot Count: 0
Number of Bricks: 6
Transport-type: tcp
Bricks:
Brick1: 172.16.16.6:/opt/disk1/minio
Brick2: 172.16.16.7:/opt/disk1/minio
Brick3: 172.16.16.6:/opt/disk2/minio
Brick4: 172.16.16.7:/opt/disk2/minio
Brick5: 172.16.16.10:/opt/disk2/minio
Brick6: 172.16.16.10:/opt/disk1/minio
Options Reconfigured:
performance.client-io-threads: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
cluster.eager-lock: off

- The output of the gluster volume status command:

Status of volume: minio-vol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 172.16.16.6:/opt/disk1/minio          54797     0          Y       665614
Brick 172.16.16.7:/opt/disk1/minio          58252     0          Y       2540584
Brick 172.16.16.6:/opt/disk2/minio          50657     0          Y       665623
Brick 172.16.16.7:/opt/disk2/minio          60607     0          Y       2540600
Brick 172.16.16.10:/opt/disk2/minio         60216     0          Y       2036048
Brick 172.16.16.10:/opt/disk1/minio         49344     0          Y       2036055

Task Status of Volume minio-vol
------------------------------------------------------------------------------
There are no active volume tasks

- The output of the gluster volume heal command:

Launching heal operation to perform index self heal on volume minio-vol has been unsuccessful:
Self-heal-daemon is disabled. Heal will not be triggered on volume minio-vol

- Provide logs present at the following locations of client and server nodes: /var/log/glusterfs/

The server and the client run on the same machine, i.e., Server 5.

(screenshot attached)

- Is there any crash? Provide the backtrace and coredump: No

Additional info: (screenshot attached)

- The operating system / glusterfs version:

No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.3 LTS
Release:        22.04
Codename:       jammy
aravindavk commented 1 month ago

Remount the volume and check if the issue persists. How were the files cleaned up? Were they deleted from the mount or from the backend brick? Please share the steps used to clean up the files.

meetsoni15 commented 1 month ago

@aravindavk

We have remounted it multiple times; the issue still persists.

Files were cleaned up from the mounted folder, not directly from the brick.

Steps Followed for File Cleanup:

  1. We created a Golang utility function that finds files older than a certain date and deletes them (a sketch follows this list).
  2. After deleting the files, we hit the file-staleness issue and found that we also had to delete the associated folders.
  3. We deleted the folders that had contained the deleted files.
  4. We still encountered the stale file handle error.
  5. We created a new folder in the same GlusterFS mounted directory, and after a few days we encountered the same issue, even though we had not deleted any files in it at all.
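
A minimal sketch of what such a cleanup utility might look like is shown below; the mount path /mnt/minio/data, the 30-day cutoff, and all identifiers are assumptions for illustration, not the reporter's actual code. All deletions go through the GlusterFS FUSE mount, never the backend bricks:

package main

import (
    "fmt"
    "io/fs"
    "os"
    "path/filepath"
    "time"
)

func main() {
    // Hypothetical folder on the GlusterFS FUSE mount and age cutoff.
    root := "/mnt/minio/data"
    cutoff := time.Now().AddDate(0, 0, -30) // delete files older than 30 days

    err := filepath.WalkDir(root, func(path string, d fs.DirEntry, walkErr error) error {
        if walkErr != nil {
            return walkErr
        }
        if d.IsDir() {
            return nil // directories are handled separately (step 3 above)
        }
        info, err := d.Info()
        if err != nil {
            return err
        }
        if info.ModTime().Before(cutoff) {
            // Unlink through the mount, not on the backend bricks.
            if rmErr := os.Remove(path); rmErr != nil {
                fmt.Fprintf(os.Stderr, "failed to remove %s: %v\n", path, rmErr)
            }
        }
        return nil
    })
    if err != nil {
        fmt.Fprintln(os.Stderr, "walk error:", err)
    }
}

Note that os.Remove only unlinks regular files here; removing the emptied parent directories, as in step 3, would be a separate pass of os.Remove calls on the directories themselves.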

(screenshot attached)