Closed: centiqchris closed this issue 1 year ago.
@fmherschel @pirat013 @lpinne sorry, this question (bug?) slipped my attention ... Could you please have a look and answer @centiqchris?
Thank you very much!
Hi @fmherschel @lpinne @pirat013 could you please have a look at this issue and decide if this needs to be changed in the document? Thank you very much!
@centiqchris Thanks, Chris, for your question and exact reading of our best practice. Please give me some time to reproduce that. In theory I understand what you mean: if the cluster says it does not have quorum, it must not process anything like a takeover. So from this side you are perfectly right. And it is also true that the corosync layer "heals" the lost node (50% of nodes left out of 2). I just want to review what exactly the output of crm_mon is in that situation, as I remember it was like we documented it in the paper. Maybe the quorum in the 2-node cluster comes back after the node changes to OFFLINE (which means it has been fenced correctly after the loss). See step 3.
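For reference, a minimal sketch of how that crm_mon state can be captured on the surviving node (standard pacemaker tooling, nothing guide-specific assumed):

    # Run on the surviving node after the peer has been lost/fenced.
    crm_mon -1r    # one-shot status including inactive resources; the summary line
                   # reports either "partition with quorum" or "partition WITHOUT quorum",
                   # and the lost peer should appear as OFFLINE once fencing has completed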
Hi @fmherschel - I look forward to hearing your test results. I've had the occasion to test this multiple times over the last few months and I've yet to see it say partition without quorum. My testing was not with SBD-based but with IPMI-based fencing, which is the only difference I can think of between my tests and yours, but I can't see it having an impact. If the surviving node (2) had lost quorum, it would not then place the poison pill for node 1 (preventing pacemaker from starting when node 1 returns), nor would it take over.
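As an illustration of the checks behind that reasoning (a sketch, not taken from the guide; "suse01" is a placeholder node name, and SAPHanaSR-showAttr assumes the SAPHanaSR package from the best practice is installed):

    # On the surviving node, after the peer was lost:
    stonith_admin --history suse01   # fencing actions recorded against the lost node
    SAPHanaSR-showAttr               # system replication attributes; shows whether the takeover was executed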
Thanks for looking into it.
Could you share which SLES for SAP version you have used for testing? If I am testing it on the SLE15 code base, I also get the "with quorum".
I was using SLES 12 SP5.
Any updates on this?
@fmherschel Fabian, could you look into this issue again? If yes, could you please inform Chris? Thank you very much!
@fmherschel @centiqchris just checking in: is this solved? Thanks very much!
@centiqchris and @chabowski I am sorry, I lost sight of this issue... We will fix that in our next version of the guide. But I need to check which other guides might also be affected. Currently I could not tell when we could publish the fix, let's see...
@centiqchris I just asked a pacemaker core developer and he told me that this difference was introduced with the new method of defining a two-node cluster. The new method is setting two_node: 1 in corosync.conf. So we will fix that in all affected guides. Thanks for reporting and for coming back here.
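For illustration, a minimal excerpt of the newer two-node quorum definition being referred to; values other than two_node are the usual settings for a two-node cluster and are shown only as an example:

    # /etc/corosync/corosync.conf (excerpt)
    quorum {
        # enable the votequorum provider
        provider: corosync_votequorum
        expected_votes: 2
        # two-node mode: the remaining node keeps quorum when its peer is lost
        two_node: 1
    }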
guide has been updated
Hi, we have fixed the setup guide. Thanks a lot! Regards, Lars
Hi,
In the expectation section, after losing the secondary node the first expectation is:
I believe this to be incorrect. In the 2-node cluster as described in the doc, two_node is set to 1 in corosync.conf, which reduces the number of votes required for quorum to 1. As the sole remaining node in the cluster, it will then retain quorum in this case. I see this in my tests as well: the status remains "partition with quorum".
https://documentation.suse.com/sbp/all/html/SLES4SAP-hana-sr-guide-PerfOpt-12/index.html#id-1.11.6.12.2.5
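A minimal way to verify this on the surviving node (a sketch; the exact output layout may differ between corosync versions):

    # On the surviving node, after the secondary has been lost:
    corosync-quorumtool -s    # with two_node: 1 the votequorum flags include "2Node" and
                              # the node stays "Quorate" with a single vote
    crm_mon -1 | head         # the cluster summary should read "partition with quorum"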