kripper opened this issue 1 year ago
Gluster shouldn't take locks on arbitrary files, even less after a reboot. Could you provide the exact steps to reproduce it and the output of the command you use to "see" that Gluster has a lock on /tmp/test.lock ?
I use lsof.
The lock is still held after the reboot, which means that somehow the lock is retaken when glusterfs is started.
Very strange that the locked file isn't located on a Glusterfs volume but on /tmp.
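For reference, the check is roughly this (assuming the /tmp/test.lock path mentioned above; the full lsof dump is pasted further down):

lsof /tmp/test.lock      # list processes that still have the lock file open
lsof | grep /tmp         # or, broader: everything open under /tmp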
Maybe it's related to the fact that we originally set up an oVirt environment on this server for HA.
I remember oVirt used a special locking architecture.
Please take a look at this:
[root@host ~]# lsof | grep /tmp
tuned 1253 root DEL REG 253,0 684090 /tmp/ffiJ6aVZS
tuned 1253 root 6u REG 253,0 4096 684090 /tmp/ffiJ6aVZS (deleted)
gmain 1253 1317 root DEL REG 253,0 684090 /tmp/ffiJ6aVZS
gmain 1253 1317 root 6u REG 253,0 4096 684090 /tmp/ffiJ6aVZS (deleted)
tuned 1253 1318 root DEL REG 253,0 684090 /tmp/ffiJ6aVZS
tuned 1253 1318 root 6u REG 253,0 4096 684090 /tmp/ffiJ6aVZS (deleted)
tuned 1253 1319 root DEL REG 253,0 684090 /tmp/ffiJ6aVZS
tuned 1253 1319 root 6u REG 253,0 4096 684090 /tmp/ffiJ6aVZS (deleted)
tuned 1253 1320 root DEL REG 253,0 684090 /tmp/ffiJ6aVZS
tuned 1253 1320 root 6u REG 253,0 4096 684090 /tmp/ffiJ6aVZS (deleted)
glusterfs 5151 root 4w REG 253,0 0 684079 /tmp/vm-backup-lock.sasm-2.2018-06-07-16-47-42.lock (deleted)
glusterfs 5151 root 5w REG 253,0 0 684069 /tmp/vm-backup.any.lock (deleted)
glfs_time 5151 5152 root 4w REG 253,0 0 684079 /tmp/vm-backup-lock.sasm-2.2018-06-07-16-47-42.lock (deleted)
glfs_time 5151 5152 root 5w REG 253,0 0 684069 /tmp/vm-backup.any.lock (deleted)
glfs_sigw 5151 5153 root 4w REG 253,0 0 684079 /tmp/vm-backup-lock.sasm-2.2018-06-07-16-47-42.lock (deleted)
glfs_sigw 5151 5153 root 5w REG 253,0 0 684069 /tmp/vm-backup.any.lock (deleted)
glfs_mems 5151 5154 root 4w REG 253,0 0 684079 /tmp/vm-backup-lock.sasm-2.2018-06-07-16-47-42.lock (deleted)
glfs_mems 5151 5154 root 5w REG 253,0 0 684069 /tmp/vm-backup.any.lock (deleted)
glfs_spro 5151 5155 root 4w REG 253,0 0 684079 /tmp/vm-backup-lock.sasm-2.2018-06-07-16-47-42.lock (deleted)
glfs_spro 5151 5155 root 5w REG 253,0 0 684069 /tmp/vm-backup.any.lock (deleted)
glfs_spro 5151 5156 root 4w REG 253,0 0 684079 /tmp/vm-backup-lock.sasm-2.2018-06-07-16-47-42.lock (deleted)
glfs_spro 5151 5156 root 5w REG 253,0 0 684069 /tmp/vm-backup.any.lock (deleted)
glusterfs 5151 5157 root 4w REG 253,0 0 684079 /tmp/vm-backup-lock.sasm-2.2018-06-07-16-47-42.lock (deleted)
glusterfs 5151 5157 root 5w REG 253,0 0 684069 /tmp/vm-backup.any.lock (deleted)
glfs_epol 5151 5159 root 4w REG 253,0 0 684079 /tmp/vm-backup-lock.sasm-2.2018-06-07-16-47-42.lock (deleted)
glfs_epol 5151 5159 root 5w REG 253,0 0 684069 /tmp/vm-backup.any.lock (deleted)
glfs_epol 5151 5160 root 4w REG 253,0 0 684079 /tmp/vm-backup-lock.sasm-2.2018-06-07-16-47-42.lock (deleted)
glfs_epol 5151 5160 root 5w REG 253,0 0 684069 /tmp/vm-backup.any.lock (deleted)
glfs_iotw 5151 5161 root 4w REG 253,0 0 684079 /tmp/vm-backup-lock.sasm-2.2018-06-07-16-47-42.lock (deleted)
glfs_iotw 5151 5161 root 5w REG 253,0 0 684069 /tmp/vm-backup.any.lock (deleted)
glfs_fuse 5151 5167 root 4w REG 253,0 0 684079 /tmp/vm-backup-lock.sasm-2.2018-06-07-16-47-42.lock (deleted)
glfs_fuse 5151 5167 root 5w REG 253,0 0 684069 /tmp/vm-backup.any.lock (deleted)
glfs_fuse 5151 5168 root 4w REG 253,0 0 684079 /tmp/vm-backup-lock.sasm-2.2018-06-07-16-47-42.lock (deleted)
glfs_fuse 5151 5168 root 5w REG 253,0 0 684069 /tmp/vm-backup.any.lock (deleted)
glfs_fuse 5151 5169 root 4w REG 253,0 0 684079 /tmp/vm-backup-lock.sasm-2.2018-06-07-16-47-42.lock (deleted)
glfs_fuse 5151 5169 root 5w REG 253,0 0 684069 /tmp/vm-backup.any.lock (deleted)
These files seem to come from the virtualization environment, not Gluster.
No. These lock files are generated by our software. The point is that they are held by gluster processes, which makes no sense because they are not on a gluster volume.
Maybe oVirt installed something special that causes this strange behavior.
I have noticed that the lock file is taken by a glusterfs process related to a remote fuse.glusterfs mounted volume.
After umounting and remounting the remote volume, the lock file is freed on local /tmp (crazy!).
Please let me know which other tests you want me to perform next time this bug occurs. I'm suspicious of oVirt, which I installed many years ago. Maybe Red Hat added some add-on that introduces the bug in recent versions of gluster.
What is strange is that it was the local host that was rebooted, which means that the file is locked when the local host remounts the remote volume, but only the first time. After unmounting and remounting a second time, the lock file is freed.
To reproduce this bug (works on my production environment), take a flock() on a file under local /tmp and force-reboot the host; a sketch of the full sequence is at the end of this comment.
You will notice the file is still locked after the reboot (lsof will report it is held by glusterfs).
To fix, unmount and remount the volume a second time.
Completely reproducible. Maybe you are doing some cleanup and releasing the lock when unmounting the volume? This would explain why I need to umount and remount to fix the issue. Any hint?
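A minimal sketch of the sequence described in this thread, assuming the /tmp/test.lock path and the myvol mount point taken from the glusterfs command line in the description below; the report takes the lock from PHP's flock(), and flock(1) is used here only as a stand-in for that call:

# take an exclusive advisory lock on a file that is NOT on a gluster volume
# (stand-in for the PHP flock() call from the report)
flock -x /tmp/test.lock -c 'sleep 3600' &

# force-reboot the host (e.g. power failure); after boot, the report says lsof
# shows the glusterfs fuse process holding the /tmp lock file
lsof /tmp/test.lock

# unmount and remount the remote volume once; per the report, the /tmp lock is then released
umount /mnt/myvol
mount -t glusterfs myvol:/myvol /mnt/myvol
lsof /tmp/test.lock      # no longer held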
Description of problem:
A lock file on a non-glusterfs volume is held forever by the glusterfs fuse process after a forced reboot.
The exact command to reproduce the issue:
Used flock() from PHP on /tmp/test.lock, which is not on a glusterfs volume. After a forced reboot (power failure) the lock file is held forever by this process:
/usr/sbin/glusterfs --process-name fuse --volfile-server=myvol --volfile-id=myvol /mnt/myvol
This problem has been there for over 8 years.
It's very easy to reproduce.
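For concreteness, a stand-in for the lock-taking step; the actual PHP script is not shown in this thread, so this one-liner just takes the same kind of exclusive flock() on the same path and holds it:

php -r '$fp = fopen("/tmp/test.lock", "w"); flock($fp, LOCK_EX); sleep(3600);'   # hypothetical stand-in, not the original script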
Expected results:
The lock should be released after a reboot. And arguably it should not be taken by glusterfs at all, since the file is not on a glusterfs volume.
Mandatory info:

- The output of the gluster volume info command:

Volume Name: backups
Type: Distribute
Volume ID: 782b8005-6db7-4b91-9854-a9a7ae326fef
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: xxx:/home/datacenter/gluster-bricks/backups
Options Reconfigured:
performance.readdir-ahead: on

Volume Name: iso
Type: Distribute
Volume ID: c47ef09e-c383-4952-9b94-f243c40b019b
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: xxx:/home/datacenter/gluster-bricks/iso
Options Reconfigured:
performance.readdir-ahead: on

Volume Name: templates
Type: Distribute
Volume ID: 287346b6-679d-4ff5-a73f-1dabd6a9147e
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: xxx:/home/datacenter/gluster-bricks/templates
Options Reconfigured:
performance.readdir-ahead: on
- The output of the gluster volume status command:

Status of volume: backups-h9
Gluster process                                       TCP Port  RDMA Port  Online  Pid
Brick xxx:/home/datacenter/gluster-bricks/backups     49152     0          Y       1557
NFS Server on localhost                               N/A       N/A        N       N/A

Task Status of Volume backups
There are no active volume tasks

Status of volume: iso
Gluster process                                       TCP Port  RDMA Port  Online  Pid
Brick xxx:/home/datacenter/gluster-bricks/iso         49153     0          Y       2391
NFS Server on localhost                               N/A       N/A        N       N/A

Task Status of Volume iso
There are no active volume tasks

Status of volume: templates
Gluster process                                       TCP Port  RDMA Port  Online  Pid
Brick xxx:/home/datacenter/gluster-bricks/templates   49154     0          Y       2764
NFS Server on localhost                               N/A       N/A        N       N/A

Task Status of Volume templates
There are no active volume tasks
- The output of the gluster volume heal command: Not issued.
Additional info:
- The operating system / glusterfs version:
centos-release-gluster6-1.0-1.el7.centos.noarch
glusterfs-fuse-6.10-1.el7.x86_64
glusterfs-libs-6.10-1.el7.x86_64
glusterfs-client-xlators-6.10-1.el7.x86_64
glusterfs-api-6.10-1.el7.x86_64
glusterfs-server-6.10-1.el7.x86_64
libvirt-daemon-driver-storage-gluster-4.5.0-36.el7_9.5.x86_64
glusterfs-6.10-1.el7.x86_64
glusterfs-cli-6.10-1.el7.x86_64