ClusterLabs / fence-agents

Fence agents
104 stars 160 forks source link

fence_gce: Adds existing operation checks and multiple plug support #400

Closed kj1724 closed 3 years ago

kj1724 commented 3 years ago

In certain cases, Pacemaker can send multiple reset operations at the same time. This added check verifies we do not already have an existing reset for the resource and if so waits for that to complete. This change makes the fence agent more resistant to reset race conditions.

The fence agent has been updated to accept multiple plug arguments at the same time, allowing for simpler pacemaker configurations.

Also, in this change request are some updates to make the spacing consistent throughout the entire file

knet-ci-bot commented 3 years ago

Can one of the admins verify this patch?

oalbrigt commented 3 years ago

You need to follow the indentation used in the agent, and commits should be prefixed with agent name, e.g. "fence_gce: " in this case.

kj1724 commented 3 years ago

I updated the indentation to tabs with a width of 8. If this is not correct let me know and I can change it. I also updated the overall commit message to include fence_gce:

oalbrigt commented 3 years ago

Can you keep the changes to what's needed for the update, and not preferrence in indentation/quotes?

The non-working #! (it's not a comment) is a good example of the kind of new bugs that are easy to miss when too much of the agent is changed, and it also makes the patch and future patches harder to backport.

kj1724 commented 3 years ago

I updated the commit to change only what is needed for the additional functionality.

oalbrigt commented 3 years ago

It seems like you should use the built-in multi plug feature: https://github.com/ClusterLabs/fence-agents/blob/master/lib/fencing.py.py#L910

kj1724 commented 3 years ago

When I try to use the built-in multi plug feature (sudo fence_gce -o reboot -n "VM_1,VM_2") I get the following error: Failed: Cannot use --method cycle for more than 1 plug

Am I doing something wrong or is there another way to use this feature?

wenningerk commented 3 years ago

Guess the general idea behind multiple plugs is that they are possibly connected to a redundant power-supply and thus you have to assure that there is a time when they are all off. And that is probably the reason why cycle (aka reboot) isn't supported with more than one plug as you can't assure that they will reboot at exactly the same time. Maybe not especially useful in your case but I guess that is what the requirement is coming from. Did you try on/off instead?

kj1724 commented 3 years ago

We need to be able to cycle for multiple plugs. on/off performs a graceful shutdown instead the hard termination that would be required if a VM gets into a bad state. Also it is much slower than a cycle operation on a GCE instance.

I think we should be fine to use the built-in multi plug support, if we can somehow get around this cycle restriction.

oalbrigt commented 3 years ago

I'll have to look into why it doesnt support cycle method currently.

oalbrigt commented 3 years ago

Seems like the issue is that it calls the reboot function directly instead of handling it via built-in functions that does the for plug ... for onoff actions: https://github.com/ClusterLabs/fence-agents/blob/master/lib/fencing.py.py#L975

oalbrigt commented 3 years ago

I'll change the implementation so we can use the built-in feature and just merge this after it has been implemented.

kj1724 commented 3 years ago

Thanks for looking into it.

Let me know when you update the default implementation, I will likely want to run some more tests on fence_gce and remove any redundant code.

oalbrigt commented 3 years ago

Ack. I'll link the PR so you can test it.

oalbrigt commented 3 years ago

The patch is ready for you to test: https://github.com/ClusterLabs/fence-agents/pull/408

oalbrigt commented 3 years ago

Thanks.