Closed jh23453 closed 6 years ago
Why not use SAPHanaSR-showAttr?
SAPHanaSR-showAttr shows exactly the attributes that crm_mon has (but in less screen space). Still, the sync state for the secondary is either SOK or SFAIL. But SFAIL has two possible reasons:
Right now the admin must look at HANA studio or landscape.py to distinguish the two.
When we first implemented our cluster, the Linux admins saw SFAIL and always thought that HANA S/R had failed, even when it was still syncing. So the HANA admins looked the state up and provided updates to the Linux admins. I think it would be a nice idea to see what the reason for SFAIL is.
Another possibility could be to split the SFAIL into two states - SFAIL for 1. and STILL-SYNCING for 2. But that would require some more extensive changes to the SAPHana logic - instead of simply adding another attribute to display.
Does that clarify what I think?
I do not think that we can show 'all' possible status situations/information around HANA just using crm_mon. Maybe if you write an enhanced wrapper for systemReplicationStatus.py and tell your admins to call that tool, if the attribute is set to "SFAIL" that could be a solution.
Changing from "SOK/SFAIL" to "SOK/SFAIL/STILL-SYNCING" is not an option, because this would break backward compatibility and it would also not be compatible with Scale-Out where we need to use a different interface to be informed (HA/DR provider). In this latter case we do not have a return code but get called, if the SR is in sync again.
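The wrapper idea above could look roughly like the following sketch. It assumes the caller already has the cluster's sync attribute value and the exit code of systemReplicationStatus.py; the function name `sync_hint` is hypothetical, and the exit-code meanings (11 error, 13 initializing, 14 syncing) are taken from the script posted later in this thread rather than from SAP documentation.

```shell
#!/bin/bash
# sync_hint STATE RC   (hypothetical helper, not part of SAPHanaSR)
#   STATE: value of the cluster sync attribute (SOK, SFAIL, ...)
#   RC:    exit code of systemReplicationStatus.py
# Prints a one-line hint; it never modifies any cluster attribute.
sync_hint() {
    local state=$1 rc=$2
    # Only SFAIL is ambiguous; every other state is passed through unchanged.
    if [ "$state" != "SFAIL" ]; then
        echo "$state"
        return 0
    fi
    case $rc in
        13|14) echo "SFAIL (still syncing)" ;;
        11)    echo "SFAIL (replication error)" ;;
        *)     echo "SFAIL (reason unknown, rc=$rc)" ;;
    esac
}
```

Because the cluster attribute itself stays SOK/SFAIL, a helper like this would not affect backward compatibility; it only adds a hint for the admin who calls it.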
fmherschel notifications@github.com writes:
> I do not think that we can show 'all' possible status situations/information around HANA just using crm_mon. Maybe if you write an enhanced wrapper for systemReplicationStatus.py and tell your admins to call that tool, if the attribute is set to "SFAIL" that could be a solution.
I guess so - I'll append the script I did for us next week to the issue and close it.
> Changing from "SOK/SFAIL" to "SOK/SFAIL/STILL-SYNCING" is not an option, because this would break backward compatibility and it would also not be compatible with Scale-Out where we need to use a different interface to be informed (HA/DR provider). In this latter case we do not have a return code but get called, if the SR is in sync again.
Yes, that's also what I saw - adding another state would be too complex even in our (simple) scenario.
Thanks for your feedback and your work - we're quite happy with our current installation.
> I guess so - I'll append the script I did for us next week to the issue and close it.
That would be great. We would review it and add it to the 'tools' of the SAPHanaSR package. Alternatively you could create a pull request to this project. Add your tool to subdirectory "tools".
Here's the script. I have it running on the (new) master to monitor sync progress/state as
If you have questions or I should add comments, feel free to ask.
#!/bin/bash
# Summarize the HANA system replication status in one line.
# The exit code of systemReplicationStatus.py selects the overall state.
FULL_SR_STATUS=$(python /hana/shared/$SAPSYSTEMNAME/exe/linuxppc64/hdb/python_support/systemReplicationStatus.py 2>/dev/null); srRc=$?
case $srRc in
    10) sr_state="No HANA System Replication"; show_detail=0;;
    11) sr_state="Error"; show_detail=0;;
    12) sr_state="Unknown"; show_detail=0;;
    13) sr_state="Initializing"; show_detail=1;;
    14) sr_state="Syncing"; show_detail=1;;
    15) sr_state="Active (all services in sync)"; show_detail=1;;
    # Set sr_state (instead of echoing) so the summary line below stays complete.
    *)  sr_state="Unknown Status (rc=$srRc)"; show_detail=1;;
esac
if [ "$show_detail" = "1" ]; then
    # Parse the table rows (lines starting with "|") after the header (NR>3).
    # As used here: $4 = service, $14 = replication status, $15 = status details.
    sr_state_detail=$(gawk -F '|' \
        'function ltrim(s) { sub(/^[ \t\r\n]+/, "", s); return s }
         function rtrim(s) { sub(/[ \t\r\n]+$/, "", s); return s }
         function trim(s) { return rtrim(ltrim(s)); }
         /^\|/ && NR>3 {
             state=trim($14)
             if ( state != "ACTIVE" ) {
                 # Add the separator only between reported services.
                 if ( out != "" ) { out=out "," }
                 out=out trim($4) ":"
                 out=out state "(" trim($15) ")" }}
         END { if ( out == "" ) {
                 print "all services in sync"
             } else {
                 print out }}' <<< "$FULL_SR_STATUS")
    echo "$sr_state: $sr_state_detail"
else
    echo "$sr_state"
fi
Thank you for providing the code, closing the issue.
We often use "crm_mon -rRA" (--inactive --show-detail --show-node-attributes) to view the cluster status (master/slave on which host, sync status, HANA status). Until now we have used HANA Studio to view the detailed SR status, because SFAIL can have two meanings:
We might use systemReplicationStatus.py on the command line as well, but will need both crm_mon and systemReplicationStatus.py for a complete status.
I've tried to get a more detailed status in one line and am thinking about adding it to "crm_mon -A". Right now we have the attribute `hana_${sid}_sync_state` (SOK, SFAIL, UNKNOWN) for the secondary and PRIM for the primary HANA DB. What would be a useful, more detailed status (attribute like `hana_${sid}_sync_detail`)?
"all services in sync": When systemReplicationStatus.py reports all is well and sync_state will be "SOK"
[<service>:<Replication Status>(<Replication Status Details>),?]+
For each service where the Replication Status is not "ACTIVE" we'll display it and the Status Details. Examples: nameserver:ERROR(communication channel closed) and indexserver:SYNCING(missing log...)
The states of the services will be concatenated with ','.
Now we have a short summary of all "interesting" services which are not in sync (yet).
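To make the proposed format concrete, here is a minimal sketch that builds the summary string. It reads a simplified, hypothetical `service|status|details` input (one service per line) rather than the real systemReplicationStatus.py table, and the function name `sync_detail` is made up for illustration.

```shell
#!/bin/bash
# sync_detail: reads "service|status|details" lines on stdin
# (simplified, hypothetical input; not the real HANA table layout)
# and prints the one-line summary described above.
sync_detail() {
    awk -F '|' '
        NF >= 2 && $2 != "ACTIVE" {
            if (out != "") out = out ","     # separator between entries only
            out = out $1 ":" $2 "(" $3 ")"
        }
        END { print (out == "" ? "all services in sync" : out) }
    '
}
```

For example, feeding it a nameserver line in state ERROR, an ACTIVE xsengine line, and an indexserver line in state SYNCING yields a single string with two entries joined by ','; if every service is ACTIVE, it prints "all services in sync".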
Right now I check the detailed state with a standalone awk-script (around 10 lines...), but if you think that would be a useful addition I'll try to add it to the cluster attributes and provide a patch for SAPHana.