fmherschel / SAPHanaSR-old-private


SAPHanaTopology: hanaRemoteHost not set in environments where cluster nodename is not identical to hostname #20

Closed fdanapfel closed 9 years ago

fdanapfel commented 9 years ago

In the debug log of the SAPHanaTopology resource agent you can see that it is unable to determine the correct value for the hanaRemoteHost parameter in environments where the nodename is not identical to the hostname:

Jun 10 17:34:06 node2 SAPHanaTopology(rsc_SAPHanaTopology_HDB_HDB00)[11188]: INFO: DEC: site=DC2, mode=primary, MAPPING=node1, hanaRemoteHost=

fdanapfel commented 9 years ago

Looks like the following part of SAPHanaTopology is responsible for this:

#
# figure-out all needed values from system replication status with ONE call
# we need: mode=primary|sync|syncmem|...; site name=<site>; mapping/<me>=<site>/<node> (multiple lines)
case $(crm_attribute --type crm_config --name cluster-infrastructure -q) in
   *corosync* ) nodelist=$(crm_node -l | awk '{ print $2 }');;
   *openais* ) nodelist=$(crm_node -l | awk '/member/ {print $2}');;
   *cman*    ) nodelist=$(crm_node -l);;
esac
hdbANSWER=$(su - ${sidadm} -c "hdbnsutil -sr_state --sapcontrol=1" 2>/dev/null)
super_ocf_log debug "DBG2: hdbANSWER=\$\(su - ${sidadm} -c \"hdbnsutil -sr_state --sapcontrol=1\"\)"
site=$(echo "$hdbANSWER" | awk -F= '/site name/ {print $2}')
srmode=$(echo "$hdbANSWER" | awk -F= '/mode/ {print $2}')
MAPPING=$(echo "$hdbANSWER" | awk -F[=/] '$1 ~ "mapping" && $3 !~ site { print $4 }' site=$site)
super_ocf_log debug "DBG: site=$site, mode=$srmode, MAPPING=$MAPPING"
#
# filter all non-cluster mappings
#
hanaRemoteHost=$(for n1 in $nodelist; do for n2 in $MAPPING; do if [ "$n1" == "$n2" ]; then echo $n1; fi; done; done )
super_ocf_log info "DEC: site=$site, mode=$srmode, MAPPING=$MAPPING, hanaRemoteHost=$hanaRemoteHost"

Looks like it is trying to compare the HANA hostnames against the cluster nodenames, and since they differ, hanaRemoteHost never gets set.
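The parsing step quoted above can be tried in isolation. The following is a minimal sketch, assuming a made-up `sample_answer` shaped after the `mapping/<me>=<site>/<node>` format named in the agent's own comment; real `hdbnsutil -sr_state --sapcontrol=1` output contains more lines and may differ in detail. The field separator is quoted here only to keep the shell from treating it as a glob pattern.

```shell
# Hypothetical sample input (an assumption, not captured from a real system).
sample_answer='mode=primary
site name=DC2
mapping/node2=DC1/node1
mapping/node2=DC2/node2'

# Same awk expressions as in the quoted SAPHanaTopology lines.
site=$(echo "$sample_answer" | awk -F= '/site name/ {print $2}')
srmode=$(echo "$sample_answer" | awk -F= '/mode/ {print $2}')
# Keep only the host part of mappings whose site differs from the local site.
MAPPING=$(echo "$sample_answer" | awk -F'[=/]' '$1 ~ "mapping" && $3 !~ site { print $4 }' site="$site")

echo "site=$site, mode=$srmode, MAPPING=$MAPPING"
```

With this sample, MAPPING holds the HANA host name of the other site (`node1`), not a cluster node name, which is what the subsequent filter then compares against the cluster node list.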

However, in my test environment the remoteHost attribute is still set to the correct value, so it looks like it gets set somewhere else as well. I haven't figured out how, though.

fmherschel commented 9 years ago

Ah ok, I just reviewed the lines you picked from SAPHanaTopology. I guess that this message (with the empty remote host) only applies when HANA is DOWN. In this case we cannot determine the remoteHANAHost, because hdbnsutil does not give us this info. In this situation we just use "the" other node in a two-node setup. This is one of the reasons why we are limited to 2 nodes in a scale-up scenario :/
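To illustrate the "other node" fallback described above, here is a rough sketch. The helper `other_node` and the node names are hypothetical and not taken from the resource agents; the actual agent code may do this differently.

```shell
# other_node ME NODE...: print every node name that is not ME.
# Hypothetical helper for illustration only.
other_node() {
    me=$1
    shift
    for n in "$@"; do
        if [ "$n" != "$me" ]; then echo "$n"; fi
    done
}

# Two-node cluster: exactly one candidate remains.
remote=$(other_node node2 node1 node2)
echo "remote=$remote"
```

With three or more nodes the helper prints several candidates, so the remote host is no longer unambiguous. That matches the stated reason for the two-node limit.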

fdanapfel commented 9 years ago

No, on my setup this actually also happens on a running cluster where HANA is UP, and there the message is printed every minute when the "monitor_clone" for the SAPHanaTopology resource is running.

As far as I can see, the reason is the following line: hanaRemoteHost=$(for n1 in $nodelist; do for n2 in $MAPPING; do if [ "$n1" == "$n2" ]; then echo $n1; fi; done; done )

The problem here is not that "hdbnsutil" can't provide the information, but that the comparison uses "nodelist", which is the list of cluster nodenames, and tries to compare that to the SAP HANA hostname it got by parsing the hdbnsutil output, which obviously fails in an environment where the cluster nodenames are not identical to the hostnames.
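The effect can be reproduced outside the cluster with a small sketch. The function `filter_cluster_hosts` wraps the same nested loop as the quoted line; the function name and the differing cluster node names (`clusternode1`, `clusternode2`) are assumptions chosen for illustration.

```shell
# filter_cluster_hosts NODELIST MAPPING: print every name present in both lists,
# mirroring the nested loop quoted from SAPHanaTopology.
filter_cluster_hosts() {
    for n1 in $1; do
        for n2 in $2; do
            if [ "$n1" = "$n2" ]; then echo "$n1"; fi
        done
    done
}

MAPPING="node1"   # HANA host name, as in the log line at the top of this issue

# Cluster node names identical to the host names: a match is found.
same=$(filter_cluster_hosts "node1 node2" "$MAPPING")
# Hypothetical cluster node names that differ from the host names: no match.
differing=$(filter_cluster_hosts "clusternode1 clusternode2" "$MAPPING")

echo "identical names: hanaRemoteHost=$same"
echo "differing names: hanaRemoteHost=$differing"
```

In the second case the result is empty, which corresponds to the empty `hanaRemoteHost=` in the reported log message.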

fdanapfel commented 9 years ago

I've now tested what happens if you delete the 'hana__remotehost' attribute and then restart the cluster. I did not notice any errors when the resources were started again. Since the SAPHana resource agent checks whether the attribute is set and, if not, uses another method to determine the remoteHost, it is probably safe to assume that we could actually get rid of the attribute.

fmherschel commented 9 years ago

"... the remoteHost it is probably safe to assume that we could actually get rid of the attribute."

Unfortunately, we need the remote (HANA) host name for the REGISTRATION of a former primary if AUTOMATED_REGISTER is set to true. In that case we need to know the exact HANA virtual host name. Just using another name of the remote host is not sufficient. In my tests the registration failed then :(

fmherschel commented 9 years ago

I have just created (and answered) a pull request against master. Could you please check whether the error reported here is fixed now?

fdanapfel commented 9 years ago

Thanks, with the latest version of SAPHanaTopology the error does not appear any more and the remoteHost attribute gets set correctly.

Regarding the previous comment about getting rid of the attribute: I'm aware that it is needed for the registration of a former primary, and as far as I can see the SAPHana resource agent has various checks built in to determine the correct remote HANA hostname even if the attribute isn't set or contains an incorrect value. So what I meant was that we could stop having the SAPHanaTopology agent try to set this attribute and let the SAPHana agent determine the correct value when it needs to, as it already does.

fdanapfel commented 9 years ago

Did not see any more issues after applying the patch, therefore closing this issue.