fact-project / shifthelper

So we can sleep at night.
4 stars 0 forks source link

`IsUserAwakeBeforeShutdown` called shifter and fallback, right away. #250

Open dneise opened 7 years ago

dneise commented 7 years ago

This morning, SH called 20min before shutdown, because nobody was awake. so far so good. But it called the fallback immediately and not after 15 minutes. I do not know why.

dneise commented 7 years ago

from the logs (removed private information:

2017-10-13 05:50:09,844 - shifthelper.conditions - DEBUG - is_nobody_awake -> True
2017-10-13 05:50:09,907 - shifthelper.checks - DEBUG - Giving message status WARNING
2017-10-13 05:50:09,969 - shifthelper.notifiers - DEBUG - Got a message
2017-10-13 05:50:09,969 - shifthelper.notifiers - DEBUG - Message is over alert level
2017-10-13 05:50:09,969 - shifthelper.notifiers - DEBUG - Getting phone number of primary shifter
2017-10-13 05:50:09,990 - shifthelper.tools.shift - DEBUG - Found shifter <shifter>
2017-10-13 05:50:09,991 - shifthelper.notifiers - DEBUG - Getting phone number of fallback shifter
2017-10-13 05:50:09,991 - shifthelper.notifiers - INFO - Calling +*******83
2017-10-13 05:50:09,991 - custos.notify.twilio - DEBUG - Received message
2017-10-13 05:50:09,992 - custos.notify.twilio - DEBUG - Placing call
2017-10-13 05:50:10,902 - shifthelper.notifiers - INFO - Calling +********34
2017-10-13 05:50:10,903 - custos.notify.twilio - DEBUG - Received message
2017-10-13 05:50:10,903 - custos.notify.twilio - DEBUG - Placing call
2017-10-13 05:50:11,568 - shifthelper.tools.shift - DEBUG - Found shifter <shifter>
2017-10-13 05:50:11,947 - custos.notify.log - WARNING - Check: IsUserAwakeBeforeShutdown - Message: There is a shift at the moment and 
20min before shutdown and 
Parker not Awake
dneise commented 7 years ago

This is the part of code, responsible for calling the fallback:

if self._get_oldest_call_age() >= self.time_before_fallback:
    log.debug('Getting phone number of fallback shifter')
    numbers_to_call.append(self.phone_number_of_fallback_shifter())

This self._get_oldest_call_age() looks at a list of calls and just finds the oldest call. Calls are removed from this list, if they are too old and if they were already acknowledged.


I found a bug in the function removing old and acknowledged calls:

    def _remove_acknowledged_and_old_calls(self):
        """ from the list of not acknowledged calls
        remove all calls, which have been acknowledged on the web page

        Also remove calls older than 2 hours, to get out of
        a "call the backup shifter" dead lock
        """
        alerts = {a['uuid']: a for a in get_alerts()}
        if not alerts:
            return

        for msg in copy(self.not_acknowledged_messages):
            age = datetime.datetime.utcnow() - msg.timestamp
            if age > (self.max_time_for_fallback + self.time_before_fallback):
                self.not_acknowledged_messages.remove(msg)
            else:
                try:
                    alert = alerts[str(msg.uuid)]
                except KeyError:
                    continue

                if alert['acknowledged'] is True:
                    self.not_acknowledged_messages.remove(msg)

It says: When you cannot get any alerts from the website ... return. So it does not even try to remove too-old alerts, when it cannot find out what was acknoledged. This function is doing to independent things... and if the one thing is impossible, it does not even try the other idenpendent thing. That is of course a bug.

I think, I might have recently introduced this bug .. let me see.

dneise commented 7 years ago

Ah the reason, why it showed up today was:

Normally there is at least one alert in the list of alerts ... the dummy-alert from the start-up checklist. But today this dummy alert was not in the list. Either the shifter did not try it. Or it did not work. Either way ... the shifter should at least have it reported in the log-book. If not having called one of the SH expert in the evening, because if the dummy alert is not working, this is serious. ...

Well ... this error (not doing the dummy-alert) ... made us realize this bug. ... so it was a good thing in the end.

dneise commented 7 years ago

The fix is easy:

instead of

        alerts = {a['uuid']: a for a in get_alerts()}
        if not alerts:
            return

we just need:

        alerts = {a['uuid']: a for a in get_alerts()}

alerts will be an empty dict now and the code below is fine with an empty set of alerts...

dneise commented 7 years ago

We are running the shifthelper_heartbeat branch at the moment and indeed I edited this part ... but the same behaviour (returning, instead of carrying on with an empty set of alerts) was already there before my edit .. so it wasnt me :-D

But I should have seen that (embarrassed)