Closed: ldzhong closed this pull request 2 years ago
Can one of the admins verify this patch?
ok to test
There are a few issues here:

1. The lvmlockctl kill and drop features are specifically for sanlock-based shared VGs (when the lease storage is disconnected); they have no valid use with dlm-based shared VGs.
2. You cannot simply force-remove locks while they are still in use. Persistent locks represent active LVs, and you haven't said anything about the LVs holding these locks, which may still be active.
3. It's not clear what realistic scenario corresponds to a user killing lvmlockd, so it's hard to know what other solution to recommend for the problem you're having.
4. You might be interested in the lvmlockd --adopt option, which allows lvmlockd to be restarted while persistent LV locks are in place (although as far as I know this feature has never been widely tested or used).
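For reference, a minimal sketch of how the adopt option could be used when restarting the daemon by hand; the VG name `vg0` is a placeholder, and a cluster-managed setup would normally drive these steps through its resource agents rather than directly:

```sh
# Hypothetical recovery flow (not part of this patch): restart lvmlockd
# with lock adoption so existing persistent LV locks are picked up
# instead of being left orphaned in the lockspace.

# Start lvmlockd, telling it to adopt locks from the previous instance.
lvmlockd --adopt 1

# Rejoin the shared VG's lockspace; locks for still-active LVs should be
# adopted rather than re-acquired.
vgchange --lockstart vg0

# Inspect the lock state known to lvmlockd.
lvmlockctl --info
```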
Thanks for the review. The ideal resolution would be the lvmlockd --adopt option, but as you said it requires the lvmlockd daemon to be restarted while the persistent LV locks are still in place, and that doesn't apply in a cluster environment because the cluster manager normally brings down the VG resource before restarting lvmlockd. I agree with your second point; forcibly dropping locks that are still in use would obviously cause regressions. It seems the agent is also not the right place to fix this problem. I think this pull request can be closed for now, and I'll see whether there is another resolution. Thanks for your time.
If a VG is activated exclusively in the cluster and the lvmlockd daemon is killed, the EX locks for its LVs become orphaned in the lockspace, and the LVs then fail to activate when the cluster tries to bring them up again. With this new parameter enabled, the orphaned locks left over from the previous lockspace are dropped before activation.
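A rough illustration of the flow this description implies inside the resource agent, assuming a hypothetical `drop_orphan_locks` parameter and using `lvmlockctl --drop` to clear the stale locks; the actual patch may implement the drop step differently, and as noted in the review this approach is problematic for dlm-based VGs:

```sh
# Hypothetical sketch of the described behaviour, not the actual patch.
VG="vg0"                                                   # placeholder VG name
DROP_ORPHAN_LOCKS=${OCF_RESKEY_drop_orphan_locks:-false}   # assumed agent parameter

if [ "$DROP_ORPHAN_LOCKS" = "true" ]; then
    # Clear locks left behind by the killed lvmlockd instance so the
    # orphaned EX lock no longer blocks re-activation.
    lvmlockctl --drop "$VG" || true
fi

# Restart the VG lockspace and activate the LV exclusively again.
vgchange --lockstart "$VG"
lvchange -aey "$VG/lv0"
```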