ClusterLabs / fence-agents

Fence agents

fence_nutanix_ahv: Add fence agent support for Nutanix AHV Cluster #600

Closed nxgovind closed 2 weeks ago

nxgovind commented 3 weeks ago

This patch adds fence agent support for Nutanix AHV clusters. More specifically, the initial support targets AHV clusters that expose the Nutanix v4 APIs; v3 APIs are not supported.
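The core of any fence agent is the same loop: resolve the plug name to a VM, read its power state, and drive on/off/reboot transitions. A minimal sketch of that flow, with a stubbed client standing in for the real Nutanix v4 API calls (the `ApiClient` interface here is hypothetical, not the agent's actual code):

```python
# Sketch of a fence-agent style control flow: map a plug name to a VM,
# query its power state, and perform on/off/reboot operations.
# ApiClient is a stand-in; fence_nutanix_ahv talks to the Nutanix v4
# REST API instead of an in-memory dict.

class ApiClient:
    """Stub for the Nutanix v4 VM API (hypothetical interface)."""
    def __init__(self, vms):
        # vms: name -> (uuid, state), state is "ON" or "OFF"
        self.vms = vms

    def list_vms(self):
        return [(name, uuid, state) for name, (uuid, state) in self.vms.items()]

    def set_power(self, uuid, state):
        for name, (vm_uuid, _) in self.vms.items():
            if vm_uuid == uuid:
                self.vms[name] = (vm_uuid, state)
                return True
        return False


def get_power_status(client, plug):
    """Return (uuid, state) for the VM whose name matches the plug."""
    for name, uuid, state in client.list_vms():
        if name == plug:
            return uuid, state
    raise ValueError(f"unknown plug {plug!r}")


def set_power_status(client, plug, target):
    """Power the matched VM ON or OFF."""
    uuid, _ = get_power_status(client, plug)
    return client.set_power(uuid, target)


def reboot(client, plug):
    """Reboot as off-then-on, the standard fencing semantics."""
    set_power_status(client, plug, "OFF")
    return set_power_status(client, plug, "ON")
```

This mirrors what the `-o list-status`, `-o on`, `-o off`, and `-o reboot` invocations in the test output below exercise against a live cluster.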

Signed-off-by: amir.eibagi@nutanix.com

knet-jenkins[bot] commented 3 weeks ago

Can one of the admins check and authorise this run please: https://ci.kronosnet.org/job/fence-agents/job/fence-agents-pipeline/job/PR-600/1/input

knet-jenkins[bot] commented 2 weeks ago

Can one of the admins check and authorise this run please: https://ci.kronosnet.org/job/fence-agents/job/fence-agents-pipeline/job/PR-600/2/input

knet-jenkins[bot] commented 2 weeks ago

Can one of the admins check and authorise this run please: https://ci.kronosnet.org/job/fence-agents/job/fence-agents-pipeline/job/PR-600/3/input

knet-jenkins[bot] commented 2 weeks ago

Can one of the admins check and authorise this run please: https://ci.kronosnet.org/job/fence-agents/job/fence-agents-pipeline/job/PR-600/4/input

nxgovind commented 2 weeks ago

@oalbrigt Thank you for your review comments. I have addressed all of them. Please let me know if I have missed anything else.

knet-jenkins[bot] commented 2 weeks ago

Can one of the admins check and authorise this run please: https://ci.kronosnet.org/job/fence-agents/job/fence-agents-pipeline/job/PR-600/5/input

knet-jenkins[bot] commented 2 weeks ago

Can one of the admins check and authorise this run please: https://ci.kronosnet.org/job/fence-agents/job/fence-agents-pipeline/job/PR-600/6/input

oalbrigt commented 2 weeks ago

retest this please

nxgovind commented 2 weeks ago

Thank you for your review. I ran a few tests with the latest changes on a 3-node CentOS Stream 9 cluster setup. All basic power operations via fence_nutanix_ahv work fine. I also tested the stonith feature by failing a node to confirm that Pacemaker successfully resets the failed node. The test output is documented here.

[root@vm2-auto ~]# fence_nutanix_ahv -a 10.101.63.173 -l admin -p Nutanix.123 -o list-status --ssl-insecure
TestVM1,8e1353ff-59a8-4683-af08-293036f08d4f,OFF
TestVM2,bda19034-c121-430f-a70c-a872f9dbabf7,OFF
Node 1,ae94f8c2-96f1-4c85-bb4a-b1cbd48aeee8,ON
Node 2,bdd08b08-d11d-41b8-b59f-8f8ba77d9ae6,ON
Node 3,c2c4f047-9a56-460f-9bbb-d7f6d81a2e0c,ON

[root@vm2-auto ~]# fence_nutanix_ahv -a 10.101.63.173 -l admin -p Nutanix.123 -o list-status --filter="name eq 'TestVM1'" --ssl-insecure
TestVM1,8e1353ff-59a8-4683-af08-293036f08d4f,OFF

[root@vm2-auto ~]# fence_nutanix_ahv -a 10.101.63.173 -l admin -p Nutanix.123 -o on --plug='TestVM1' --ssl-insecure
Success: Powered ON

[root@vm2-auto ~]# fence_nutanix_ahv -a 10.101.63.173 -l admin -p Nutanix.123 -o reboot --plug='TestVM1' --ssl-insecure
Success: Rebooted

[root@vm2-auto ~]# fence_nutanix_ahv -a 10.101.63.173 -l admin -p Nutanix.123 -o list-status --filter="startswith(name, 'TestVM')" --ssl-insecure
TestVM1,8e1353ff-59a8-4683-af08-293036f08d4f,ON
TestVM2,bda19034-c121-430f-a70c-a872f9dbabf7,ON

[root@vm2-auto ~]# fence_nutanix_ahv -a 10.101.63.173 -l admin -p Nutanix.123 -o off --plug='TestVM1' --ssl-insecure
Success: Powered OFF

[root@vm2-auto ~]# fence_nutanix_ahv -a 10.101.63.173 -l admin -p Nutanix.123 -o list-status --filter="startswith(name, 'TestVM')" --ssl-insecure
TestVM1,8e1353ff-59a8-4683-af08-293036f08d4f,OFF
TestVM2,bda19034-c121-430f-a70c-a872f9dbabf7,OFF

tail -f /var/log/pacemaker/pacemaker.log
Nov 07 11:07:40.933 node1 pacemaker-fenced [1134] (log_async_result) notice: Operation 'reboot' [1493] targeting node2 using nutanix_fence returned 0 | call 13 from pacemaker-controld.1382
Nov 07 11:07:40.964 node1 pacemaker-fenced [1134] (finalize_op) notice: Operation 'reboot' targeting node2 by node1 for pacemaker-controld.1382@node3: OK (complete) | id=2e28a260
Nov 07 11:07:40.965 node1 pacemaker-controld [1138] (handle_fence_notification) notice: Peer node2 was terminated (reboot) by node1 on behalf of pacemaker-controld.1382@node3: OK | event=2e28a260-bdd1-4154-b98b-1fd14227dc63

[root@node1 ~]# pcs status
Cluster name: ha_cluster
Cluster Summary:

Node List:

Full List of Resources:
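For reference, a stonith device like the `nutanix_fence` seen in the pacemaker log above would typically be created with `pcs` along these lines. This is a sketch, not taken from the PR: the stonith parameter names (`ip`, `username`, `password`, `ssl_insecure`) are assumptions based on the agent's command-line options, and the exact names come from the agent's `-o metadata` output. `pcmk_host_map` maps cluster node names to AHV VM names (plugs).

```shell
# Hypothetical example: register fence_nutanix_ahv as a stonith device.
# Parameter names are assumptions; verify against `fence_nutanix_ahv -o metadata`.
pcs stonith create nutanix_fence fence_nutanix_ahv \
    ip=10.101.63.173 username=admin password=Nutanix.123 \
    ssl_insecure=1 \
    pcmk_host_map="node1:Node 1;node2:Node 2;node3:Node 3"
```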

nxgovind commented 2 weeks ago

@oalbrigt I have run some basic tests, including a cluster node failure test. Please merge the pull request if you are comfortable with the test results.

oalbrigt commented 2 weeks ago

Thanks.