ClusterLabs / resource-agents

Combined repository of OCF agents from the RHCS and Linux-HA projects

LVM-activate: adding new parameter auto_drop_lock #1808

Closed: ldzhong closed this pull request 2 years ago

ldzhong commented 2 years ago

If the VG is activated exclusively in the cluster and the lvmlockd daemon is killed, the EX locks for the LVs become orphaned in the lockspace, and the LVs then fail to activate when the cluster tries to bring them up again. With this new parameter enabled, the agent first drops the orphaned locks left in the previous lockspace before activation.
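
For context, a minimal sketch of what the proposed behaviour could look like inside the LVM-activate agent. The parameter name auto_drop_lock comes from this PR; the helper function, the exact check, and its placement in the activation path are assumptions for illustration only:

```sh
# Sketch only: the helper name and its placement are illustrative, not the
# actual patch. auto_drop_lock is the parameter proposed in this PR.
maybe_drop_orphaned_locks()
{
	if ocf_is_true "$OCF_RESKEY_auto_drop_lock"; then
		# Ask lvmlockd to drop the stale lockspace for the VG so that
		# a fresh lock-start and exclusive activation can succeed.
		lvmlockctl --drop "$VG" \
			|| ocf_log warn "Failed to drop lockspace for VG $VG"
	fi
}

# The normal activation path would then follow, e.g.:
#   vgchange --lockstart "$VG"
#   lvchange -aey "$VG/$LV"
```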

knet-ci-bot commented 2 years ago

Can one of the admins verify this patch?

oalbrigt commented 2 years ago

ok to test

teigland commented 2 years ago

There are a few issues here:

1. The lvmlockctl kill and drop features are specifically for sanlock-based shared VGs (when the lease storage is disconnected); they have no valid use with dlm-based shared VGs.
2. You cannot just forcibly remove locks while they are still being used. Persistent locks represent active LVs, and you have not mentioned anything about the LVs that require these locks, which may still be active.
3. It's not clear what realistic scenario corresponds to the user killing lvmlockd, so it's hard to know what other solutions to recommend for resolving the problem you're having.
4. You might be interested in the lvmlockd --adopt option, which allows lvmlockd to be restarted while there are persistent LV locks in place (although, as far as I know, this feature has never been widely tested or used).
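
For reference, a rough sketch of the adopt flow mentioned in the last point; the --adopt flag is documented for lvmlockd, but the recovery sequence shown here is an assumption, not a tested or recommended procedure:

```sh
# Rough sketch, not a tested procedure: after the previous lvmlockd
# instance has died, start a new one that adopts the existing locks
# instead of requiring them to be dropped.
lvmlockd --adopt 1

# If a VG lockspace was not adopted, it can be started again:
#   vgchange --lockstart "$VG"
# Active LVs keep their persistent EX locks, which the new lvmlockd
# instance should have adopted.
```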

ldzhong commented 2 years ago

Thanks for the review. The ideal resolution to this problem would be the lvmlockd --adopt option, but as you said, lvmlockd has to be restarted while the persistent LV locks are still in place, and that doesn't apply in a cluster environment, since the cluster manager will normally bring down the VG resource before restarting lvmlockd. I also agree with your second point: forcibly removing locks that are still in use would obviously cause regressions. The agent doesn't seem to be the right place to fix this problem either, so I think this pull request can be closed for now. I'll look for another resolution. Thanks for your time.