chen1195585098 opened 1 week ago
Hi @pranithk, can we simply set `lock->release = _gf_true` to avoid this issue? When this issue is triggered, inodelk requests from other mount points are blocked, which greatly affects the availability of a replicated volume.
Would such a fix have worse side effects? I noticed you suggested this method in #3065, but no further modification was made.
Description of problem: Same as #3065.
There are many

`Assertion failed: !(*take_lock)`

messages appearing successively in our test and production environments. After preliminary investigation, it seems that there is a race between `__afr_eager_lock_handle` and `afr_wakeup_same_fd_delayed_op`.

In the lifecycle of an eager lock, when transactions complete the fop phase they are moved to the `inode->post_op` list, where they are expected to be woken up when the timer expires or when the last transaction on this inode finishes. However, such a wakeup is also triggered every time a FLUSH is issued: when the first fop finishes, `afr_flush` is called and wakes up the delayed post-op on the same inode via `afr_wakeup_same_fd_delayed_op`. Once this operation succeeds, `lock->delay_timer = NULL` is set, which means all subsequent transactions should wait and be put into the next lifecycle.

However, when a new write transaction on the same inode arrives at the same time, it is incorrectly added to the owners list, because `lock->release == false`, `lock->delay_timer == NULL`, and `list_empty(&lock->owners) == true` all hold, so it issues an extra lock request in the *current* lifecycle, which produces a stale inodelk.
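To make the race concrete, here is a small self-contained model of the condition quoted above (a sketch for illustration only: the struct and helper are stand-ins for the relevant `afr_lock_t` fields, not the actual GlusterFS code):

```c
#include <stdbool.h>
#include <stdio.h>

/* Stand-ins for the three afr_lock_t fields involved in the race
 * (a model for illustration, not the real structure). */
struct lock_state {
    bool release;      /* lifecycle already marked for release */
    bool delay_timer;  /* delayed post-op timer still armed */
    bool owners_empty; /* list_empty(&lock->owners) */
};

/* The condition quoted above under which a new transaction joins the
 * owners list and issues a fresh inodelk in the current lifecycle. */
static bool takes_fresh_lock(const struct lock_state *lock)
{
    return !lock->release && !lock->delay_timer && lock->owners_empty;
}

int main(void)
{
    /* State right after afr_wakeup_same_fd_delayed_op has fired for a
     * FLUSH: the wakeup cleared delay_timer but left release false, and
     * the woken post-op is no longer on the owners list. */
    struct lock_state after_wakeup = {
        .release = false, .delay_timer = false, .owners_empty = true};

    /* A racing write now wrongly starts an extra inodelk in the
     * current lifecycle: the stale inodelk described above. */
    printf("racing write takes a fresh lock: %d\n",
           takes_fresh_lock(&after_wakeup));
    return 0;
}
```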
From the code, I guess both `lock->release` and `lock->delay_timer` are marks indicating whether a new transaction can be added in the current lifecycle, because the two conditions seem to hold at the same time (e.g. `lock->release = _gf_true` and `lock->delay_timer = NULL` are both set in `afr_delayed_changelog_wake_up_cbk` when the current transaction is the last owner of the lock).

After verification, it was found that setting `lock->release = _gf_true` in `afr_wakeup_same_fd_delayed_op` handles this problem. It is the simplest way to avoid stale inodelks, although the root cause of the race remains unsolved.
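Continuing the model above, the proposed one-liner would make the FLUSH-driven wakeup close the lifecycle as well. A paraphrased sketch of the idea (not a verbatim patch against afr-transaction.c):

```c
/* Model of the wakeup path with the proposed change applied
 * (reuses struct lock_state from the sketch above). */
static void wakeup_same_fd_delayed_op(struct lock_state *lock)
{
    if (!lock->delay_timer)
        return;
    lock->delay_timer = false; /* timer cancelled, post-op woken now */
    lock->release = true;      /* proposed fix: mark the lifecycle as
                                * ending, so a racing transaction is
                                * frozen into the next lifecycle instead
                                * of joining lock->owners */
}
```

With `release` set, `takes_fresh_lock()` above returns false for the racing write, so it no longer issues the extra inodelk.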
In fact, the root cause of this issue is that `afr_wakeup_same_fd_delayed_op` is triggered too often. Ideally, it should be triggered only once, by the last transaction on the inode in an eager-lock lifecycle, as described in #418. If so, this race is gone. However, it is hard to determine whether a transaction is the last one without introducing more complexity.

The exact command to reproduce the issue:
1. Modify the glusterfs code to intensify the race.
2. Create and start a normal replicated volume `test`, then mount it (e.g. `mount -t glusterfs localhost:test /mnt`).
3. Write to the same file simultaneously with 2 threads (a minimal reproducer sketch follows the list).
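A minimal sketch of step 3, assuming the volume is mounted at `/mnt` (the file name `repro.dat` is arbitrary). Each iteration closes the fd, so FLUSH fops keep driving `afr_wakeup_same_fd_delayed_op` while the other thread's writes race into the same eager-lock lifecycle:

```c
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Two threads repeatedly open, write, and close the same file on the
 * mounted volume; every close issues a FLUSH. */
static void *writer(void *arg)
{
    char buf[4096];

    (void)arg;
    memset(buf, 'x', sizeof(buf));
    for (int i = 0; i < 100000; i++) {
        int fd = open("/mnt/repro.dat", O_CREAT | O_WRONLY, 0644);
        if (fd < 0) {
            perror("open");
            break;
        }
        if (pwrite(fd, buf, sizeof(buf), 0) < 0)
            perror("pwrite");
        close(fd);
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;

    pthread_create(&t1, NULL, writer, NULL);
    pthread_create(&t2, NULL, writer, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```

Build with `gcc -pthread` and run it against the mounted volume.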
After

`Assertion failed: !(*take_lock)`

is printed in the mount log, a stale inodelk occurs; it can be confirmed with a statedump (e.g. `gluster volume statedump test`).

The full output of the command that failed: