gluster / glusterfs

Gluster Filesystem : Build your distributed storage in minutes
https://www.gluster.org
GNU General Public License v2.0

Lock file on non-glusterfs volume is taken forever by glusterfs fuse after reboot #4008

Open kripper opened 1 year ago

kripper commented 1 year ago

Description of problem:

Lock file on a non-glusterfs volume is held forever by the glusterfs fuse process after a forced reboot.

The exact command to reproduce the issue:

Used flock() from PHP on /tmp/test.lock, which is not on a glusterfs volume. After a forced reboot (power failure), the lock file is held forever by this process:

/usr/sbin/glusterfs --process-name fuse --volfile-server=myvol --volfile-id=myvol /mnt/myvol
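For reference, the original PHP flock() call can be sketched with this minimal Python stand-in (an illustration, not the reporter's actual script), which takes an exclusive advisory lock on a file on local /tmp and confirms that any other open file description is then denied the lock:

```python
import fcntl
import os

# Stand-in for the original PHP flock() call: take an exclusive advisory
# lock on a file that lives on local /tmp, i.e. NOT on a glusterfs mount.
LOCK_PATH = "/tmp/test.lock"

fd = os.open(LOCK_PATH, os.O_CREAT | os.O_WRONLY, 0o644)
fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)  # raises BlockingIOError if already held

# A second open file description on the same path cannot take the lock
# while the first one holds it (Linux treats each open() independently).
probe = os.open(LOCK_PATH, os.O_WRONLY)
try:
    fcntl.flock(probe, fcntl.LOCK_EX | fcntl.LOCK_NB)
    held = False
except BlockingIOError:
    held = True
print("lock held:", held)

os.close(probe)
os.close(fd)
```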

This problem has been there for over 8 years.

It's very easy to reproduce.

Expected results:

The lock should be released after a reboot. And arguably glusterfs should never take it at all, since the file is not on a glusterfs volume.

Mandatory info:

- The output of the gluster volume info command:

Volume Name: backups
Type: Distribute
Volume ID: 782b8005-6db7-4b91-9854-a9a7ae326fef
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: xxx:/home/datacenter/gluster-bricks/backups
Options Reconfigured:
performance.readdir-ahead: on

Volume Name: iso
Type: Distribute
Volume ID: c47ef09e-c383-4952-9b94-f243c40b019b
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: xxx:/home/datacenter/gluster-bricks/iso
Options Reconfigured:
performance.readdir-ahead: on

Volume Name: templates
Type: Distribute
Volume ID: 287346b6-679d-4ff5-a73f-1dabd6a9147e
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: xxx:/home/datacenter/gluster-bricks/templates
Options Reconfigured:
performance.readdir-ahead: on

- The output of the gluster volume status command:

Status of volume: backups-h9
Gluster process                                     TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------------
Brick xxx:/home/datacenter/gluster-bricks/backups   49152     0          Y       1557
NFS Server on localhost                             N/A       N/A        N       N/A

Task Status of Volume backups

There are no active volume tasks

Status of volume: iso
Gluster process                                     TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------------
Brick xxx:/home/datacenter/gluster-bricks/iso       49153     0          Y       2391
NFS Server on localhost                             N/A       N/A        N       N/A

Task Status of Volume iso

There are no active volume tasks

Status of volume: templates
Gluster process                                     TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------------
Brick xxx:/home/datacenter/gluster-bricks/templates 49154     0          Y       2764
NFS Server on localhost                             N/A       N/A        N       N/A

Task Status of Volume templates

There are no active volume tasks

- The output of the gluster volume heal command:

Not issued.

Additional info:

- The operating system / glusterfs version:

centos-release-gluster6-1.0-1.el7.centos.noarch
glusterfs-fuse-6.10-1.el7.x86_64
glusterfs-libs-6.10-1.el7.x86_64
glusterfs-client-xlators-6.10-1.el7.x86_64
glusterfs-api-6.10-1.el7.x86_64
glusterfs-server-6.10-1.el7.x86_64
libvirt-daemon-driver-storage-gluster-4.5.0-36.el7_9.5.x86_64
glusterfs-6.10-1.el7.x86_64
glusterfs-cli-6.10-1.el7.x86_64


xhernandez commented 1 year ago

Gluster shouldn't take locks on arbitrary files, even less after a reboot. Could you provide the exact steps to reproduce it, and the output of the command you use to "see" that Gluster has a lock on /tmp/test.lock?

kripper commented 1 year ago

I use lsof after the reboot; the lock showing up there means it is somehow retaken when glusterfs is started.

It's very strange that the locked file isn't located on a glusterfs volume but on /tmp. Maybe it's related to the fact that we originally mounted an oVirt environment on this server for HA. I remember oVirt used a special locking architecture.

Please take a look at this:

[root@host ~]# lsof | grep /tmp
tuned      1253          root  DEL       REG              253,0                   684090 /tmp/ffiJ6aVZS
tuned      1253          root    6u      REG              253,0         4096      684090 /tmp/ffiJ6aVZS (deleted)
gmain      1253  1317    root  DEL       REG              253,0                   684090 /tmp/ffiJ6aVZS
gmain      1253  1317    root    6u      REG              253,0         4096      684090 /tmp/ffiJ6aVZS (deleted)
tuned      1253  1318    root  DEL       REG              253,0                   684090 /tmp/ffiJ6aVZS
tuned      1253  1318    root    6u      REG              253,0         4096      684090 /tmp/ffiJ6aVZS (deleted)
tuned      1253  1319    root  DEL       REG              253,0                   684090 /tmp/ffiJ6aVZS
tuned      1253  1319    root    6u      REG              253,0         4096      684090 /tmp/ffiJ6aVZS (deleted)
tuned      1253  1320    root  DEL       REG              253,0                   684090 /tmp/ffiJ6aVZS
tuned      1253  1320    root    6u      REG              253,0         4096      684090 /tmp/ffiJ6aVZS (deleted)
glusterfs  5151          root    4w      REG              253,0            0      684079 /tmp/vm-backup-lock.sasm-2.2018-06-07-16-47-42.lock (deleted)
glusterfs  5151          root    5w      REG              253,0            0      684069 /tmp/vm-backup.any.lock (deleted)
glfs_time  5151  5152    root    4w      REG              253,0            0      684079 /tmp/vm-backup-lock.sasm-2.2018-06-07-16-47-42.lock (deleted)
glfs_time  5151  5152    root    5w      REG              253,0            0      684069 /tmp/vm-backup.any.lock (deleted)
glfs_sigw  5151  5153    root    4w      REG              253,0            0      684079 /tmp/vm-backup-lock.sasm-2.2018-06-07-16-47-42.lock (deleted)
glfs_sigw  5151  5153    root    5w      REG              253,0            0      684069 /tmp/vm-backup.any.lock (deleted)
glfs_mems  5151  5154    root    4w      REG              253,0            0      684079 /tmp/vm-backup-lock.sasm-2.2018-06-07-16-47-42.lock (deleted)
glfs_mems  5151  5154    root    5w      REG              253,0            0      684069 /tmp/vm-backup.any.lock (deleted)
glfs_spro  5151  5155    root    4w      REG              253,0            0      684079 /tmp/vm-backup-lock.sasm-2.2018-06-07-16-47-42.lock (deleted)
glfs_spro  5151  5155    root    5w      REG              253,0            0      684069 /tmp/vm-backup.any.lock (deleted)
glfs_spro  5151  5156    root    4w      REG              253,0            0      684079 /tmp/vm-backup-lock.sasm-2.2018-06-07-16-47-42.lock (deleted)
glfs_spro  5151  5156    root    5w      REG              253,0            0      684069 /tmp/vm-backup.any.lock (deleted)
glusterfs  5151  5157    root    4w      REG              253,0            0      684079 /tmp/vm-backup-lock.sasm-2.2018-06-07-16-47-42.lock (deleted)
glusterfs  5151  5157    root    5w      REG              253,0            0      684069 /tmp/vm-backup.any.lock (deleted)
glfs_epol  5151  5159    root    4w      REG              253,0            0      684079 /tmp/vm-backup-lock.sasm-2.2018-06-07-16-47-42.lock (deleted)
glfs_epol  5151  5159    root    5w      REG              253,0            0      684069 /tmp/vm-backup.any.lock (deleted)
glfs_epol  5151  5160    root    4w      REG              253,0            0      684079 /tmp/vm-backup-lock.sasm-2.2018-06-07-16-47-42.lock (deleted)
glfs_epol  5151  5160    root    5w      REG              253,0            0      684069 /tmp/vm-backup.any.lock (deleted)
glfs_iotw  5151  5161    root    4w      REG              253,0            0      684079 /tmp/vm-backup-lock.sasm-2.2018-06-07-16-47-42.lock (deleted)
glfs_iotw  5151  5161    root    5w      REG              253,0            0      684069 /tmp/vm-backup.any.lock (deleted)
glfs_fuse  5151  5167    root    4w      REG              253,0            0      684079 /tmp/vm-backup-lock.sasm-2.2018-06-07-16-47-42.lock (deleted)
glfs_fuse  5151  5167    root    5w      REG              253,0            0      684069 /tmp/vm-backup.any.lock (deleted)
glfs_fuse  5151  5168    root    4w      REG              253,0            0      684079 /tmp/vm-backup-lock.sasm-2.2018-06-07-16-47-42.lock (deleted)
glfs_fuse  5151  5168    root    5w      REG              253,0            0      684069 /tmp/vm-backup.any.lock (deleted)
glfs_fuse  5151  5169    root    4w      REG              253,0            0      684079 /tmp/vm-backup-lock.sasm-2.2018-06-07-16-47-42.lock (deleted)
glfs_fuse  5151  5169    root    5w      REG              253,0            0      684069 /tmp/vm-backup.any.lock (deleted)
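The lsof output above can be cross-checked against /proc/locks, which on Linux records the PID that acquired each flock(). A minimal sketch, assuming the standard /proc/locks line format; the helper name and demo path are hypothetical:

```python
import fcntl
import os

def find_lock_holders(path):
    """Return PIDs that /proc/locks lists for this file's inode (Linux only).

    Each line looks like:
    "1: FLOCK  ADVISORY  WRITE 5151 fd:00:684079 0 EOF"
    where field 4 is the PID and field 5 ends with the inode number.
    """
    inode = os.stat(path).st_ino
    pids = []
    with open("/proc/locks") as f:
        for line in f:
            fields = line.split()
            if fields[1] == "->":  # a blocked waiter, not a holder
                continue
            # Note: matching on inode alone; a full check would also
            # compare the major:minor device numbers.
            if int(fields[5].rsplit(":", 1)[1]) == inode:
                pids.append(int(fields[4]))
    return pids

# Demo: take a lock ourselves and confirm our own PID shows up.
demo_path = "/tmp/lock-holder-demo.lock"  # hypothetical demo path
fd = os.open(demo_path, os.O_CREAT | os.O_WRONLY, 0o644)
fcntl.flock(fd, fcntl.LOCK_EX)
found = os.getpid() in find_lock_holders(demo_path)
print("our pid holds the lock:", found)
os.close(fd)
```

One caveat: for a descriptor inherited across fork/exec, /proc/locks may still report the PID of the process that originally acquired the lock, so the PID shown there and the PID lsof reports can differ.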
xhernandez commented 1 year ago

These files seem to come from the virtualization environment, not Gluster

kripper commented 1 year ago

No. These lock files are generated by our software. The point is that they are held by gluster processes, which makes no sense because they are not on a gluster volume.

Maybe oVirt installed something special that causes this strange behavior.

kripper commented 1 year ago

I have noticed that the lock file is taken by a glusterfs process belonging to a remote fuse.glusterfs-mounted volume. After unmounting and remounting the remote volume, the lock file on local /tmp is freed (crazy!).

Please let me know which other tests you want me to perform the next time this bug occurs. I'm suspicious of oVirt, which I installed many years ago. Maybe Red Hat added some addon that introduces the bug in recent versions of gluster.

What is strange is that it was the local host that was rebooted, which means the file gets locked when the local host remounts the remote volume, but only the first time. After unmounting and remounting a second time, the lock file is freed.

To reproduce this bug (works on my production environment):

You will notice the file is still locked (lsof will report it as held by glusterfs).

To fix it, unmount and remount a second time.
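One mechanism that would produce exactly this picture (an assumption about glusterfs, not a confirmed code path): a descriptor that is open and flock()ed in the process that triggers the mount gets inherited by the spawned daemon unless it is marked close-on-exec, so the daemon keeps the lock alive even after the original holder closes its copy, and the lock only goes away when the daemon exits (e.g. on unmount). A self-contained sketch with a stand-in child process:

```python
import fcntl
import os
import subprocess
import sys

# Suspected mechanism: an flock()ed descriptor inherited by a long-lived
# child keeps the lock alive after the original holder closes its copy.
LOCK_PATH = "/tmp/inherit-demo.lock"  # hypothetical demo path

fd = os.open(LOCK_PATH, os.O_CREAT | os.O_WRONLY, 0o644)
fcntl.flock(fd, fcntl.LOCK_EX)
os.set_inheritable(fd, True)  # Python marks fds close-on-exec by default

# Stand-in for the spawned daemon: a child that simply keeps the
# inherited descriptor open for a while.
child = subprocess.Popen(
    [sys.executable, "-c", "import time; time.sleep(2)"],
    close_fds=False,  # let the child inherit our descriptors
)

os.close(fd)  # the original holder gives up its own descriptor...

# ...but the flock persists through the child's inherited copy, because
# both descriptors refer to the same open file description.
probe = os.open(LOCK_PATH, os.O_WRONLY)
try:
    fcntl.flock(probe, fcntl.LOCK_EX | fcntl.LOCK_NB)
    still_locked = False
except BlockingIOError:
    still_locked = True
print("still locked while child lives:", still_locked)

child.wait()  # once the child exits (cf. unmounting the volume)...
fcntl.flock(probe, fcntl.LOCK_EX | fcntl.LOCK_NB)  # ...the lock can be taken
released = True
print("released after child exit:", released)
os.close(probe)
```

This would also match the reported fix: unmounting kills the glusterfs process, which closes the inherited descriptor and releases the lock, so a subsequent remount no longer holds it.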

kripper commented 1 year ago

Completely reproducible. Maybe you are doing some cleanup and releasing the lock when unmounting the volume? That would explain why I need to unmount and remount to fix the issue. Any hint?