Add attribute 'ontimeout_script' to both resource agents

OndrejHome commented 6 years ago

I have noticed that both resource agents include condition that should allow execution of "onTimeOut" command when the main command in HANA_CALL times out. However from code it looks that this is never executed and there is no HANA_CALL that would pass the needed parameter to enable this functionality. I see the below code that populates the $onTimeOut variable when the HANA_CALL passes the the script after argument '--on-timeout'. However I don't see any occasion when '--on-timeout' is used in both resource agents.

local onTimeOut=""

    while [ $# -gt 0 ]; do
        case "$1" in
            --timeout ) timeOut=$2; shift;;
            --use-su  ) use_su=1;;
            --on-timeout ) onTimeOut="$2"; shift;;
            --cmd ) shift; cmd="$*"; break;;
        esac
        shift
    done

if [ $rc -eq 124 -a -n "$onTimeOut" ]; then

It may be useful to specify this command somehow to gather some data from HANA_CALL timeout happens. Therefore I have slightly edited code to allow specifying the script via resource agent attribute named 'ontimeout_script'. This attribute is optional and by default it has same (empty) value as the variable $onTimeOut had before. Once the script is specified, the resource agent will execute it when HANA_CALL $cmd has timed out. There are 2 additional commands to log 'start' of the command and 'end of command with outputs'. I have included both so it is possible to determine when this script was started - in case that operation in cluster times out, we would be able to tell thanks to that if the script was attempted to run or not.

This request contains the changes for both resource agents and changes in man pages to reflect the same texts as long description in resource agents.

Please let me know if this makes sense or if there is a better way how to achieve running a command when the timeout of HANA_CALL cmd happens.

fmherschel commented 6 years ago

While I like your other pull request to improve the logging of timed-out HANA commands I (for first) do not integrate the patch for a new option about on-timeout-script. Initially it was planned (but currently not used) that the --on-timeout of HANA_CALL would be used by the caller itself to have a fallback action available one the initial HANA_CALL would take to long. So the plan is not to introduce a global on-timeout action but a specific time-out action per HANA_CALL. So far we did not had the need to add specific actions. This is why currently the parameter is not used. If we would introduce a global on-timeout script this would break our flexible condtruct, if we want to use it in the future.

OndrejHome commented 6 years ago

Thank you for explanation about '--on-timeout' parameter. Now it makes sense to me how this was planned. I believe that this pull request should not affect this as the value set via attribute can be overridden by HANA_CALL caller by using the '--on-timeout'. In such case the attribute would look like ignored in current form. So this would be ineffective and would need additional changes that might get a bit complex. I like more the idea that the HANA_CALL callers would have a already good --on-timeout command in future so there would be no need to introduce a customizable command.

I will close this pull request now. In case that the more detailed logging from other pull request is not enough and some sort of on-timeout-script would be needed I can revisit this and think of other way how to implement this. Thank you for review and quick response!

fmherschel commented 6 years ago

You are very wellcome with your ideas. Maybe we could also think about just to use an other name than ontime_script. However if we public add a parameter it is not easy to decline such parameters afterwards, because if we would remove the parameter from the agent it will possible crash at customers site, if they left such a parameter in their configuration. If the functionality beside the better logging would only be needed for better debugging we could also add something like an empty variable which is tracked and optionally the script started. For debugging purposes it could be ok, if the admin needs to fill a variable at the beginnignof the RA. I know that's less comfortable as adding a parameter but less harmful then adding a parameter for the public audience. Thanks again for your contribution!!

OndrejHome commented 6 years ago

I'm usually against the direct modification of RA by admins as such approach feels to dangerous to me. Original idea was to have really something that admin can use and that RA can expect to come to be there. However it may in such case require ideally additional checks if the script is for example runable or present to prevent issues when it should start. Also this would be a commitment to keep such option as you mention for future. For that reason I would rather keep it for later once the good use case for this appears. Thank you for additional explanations and feel free to reach me out if there is any help needed with further changes.

fmherschel / SAPHanaSR-old-private

Add attribute 'ontimeout_script' to both resource agents #43