fact-project / shifthelper

So we can sleep at night.

expert calls every 200 minutes #239

Closed dneise closed 7 years ago

dneise commented 7 years ago

At the moment the expert is called every ~200 minutes, with a message like this:

Exception while running check ShifterOnShift: (datetime.datetime(2017, 7, 16, 7, 50), <object object at 0x7fd04a087080>, ('db', None))

The complete traceback looks like this:

2017-07-16 10:55:01,688 - custos.checks.FactIntervalCheck - ERROR - Exception while running check
Traceback (most recent call last):
  File "/opt/conda/lib/python3.5/site-packages/custos/checks/__init__.py", line 82, in wrapped_check
    self.check(*args, **kwargs)
  File "/opt/conda/lib/python3.5/site-packages/shifthelper/checks.py", line 38, in check
    if all([f() for f in self.checklist]):
  File "/opt/conda/lib/python3.5/site-packages/shifthelper/checks.py", line 38, in <listcomp>
    if all([f() for f in self.checklist]):
  File "/opt/conda/lib/python3.5/site-packages/wrapt/wrappers.py", line 522, in __call__
    args, kwargs)
  File "/opt/conda/lib/python3.5/site-packages/shifthelper/debug_log_wrapper.py", line 9, in log_call_and_result
    result = wrapped(*args, **kwargs)
  File "/opt/conda/lib/python3.5/site-packages/shifthelper/conditions.py", line 313, in is_nobody_on_shift
    get_current_shifter()
  File "/opt/conda/lib/python3.5/site-packages/shifthelper/tools/shift.py", line 30, in get_current_shifter
    full_shifter_info = retrieve_shifters_from_calendar(db=db)
  File "/opt/conda/lib/python3.5/site-packages/shifthelper/tools/shift.py", line 48, in retrieve_shifters_from_calendar
    calendar_entries = retrieve_calendar_entries(time, db=db)
KeyError: (datetime.datetime(2017, 7, 16, 7, 50), <object object at 0x7fd04a087080>, ('db', None))

So it comes from here: https://github.com/fact-project/shifthelper/blob/7aabafd3b747290a71abba321322e8715012a2f2/shifthelper/tools/shift.py#L61

As one can see, this function call is cached, because I did not want to hit the DB with a few dozen requests per check interval (currently every 2 minutes).
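For readers who do not want to follow the link, here is a minimal sketch of the kind of caching meant above. It assumes the cache is `functools.lru_cache` (the key in the `KeyError`, with positional arguments, a bare `object()` marker and then the keyword items, looks like an `lru_cache` key); the function body, the `maxsize` value and the `db_hits` list are made up for illustration and are not the actual shifthelper code.

```python
import datetime
from functools import lru_cache

db_hits = []  # hypothetical: records every simulated DB request


@lru_cache(maxsize=16)  # maxsize is an assumption, not the real setting
def retrieve_calendar_entries(time, db=None):
    # Stand-in for the real query: only note that the "DB" was hit.
    db_hits.append(time)
    return ("calendar entries for", time)


now = datetime.datetime(2017, 7, 16, 7, 50)
first = retrieve_calendar_entries(now, db=None)
second = retrieve_calendar_entries(now, db=None)  # served from the cache
info = retrieve_calendar_entries.cache_info()
```

With identical arguments the second call never reaches the function body, so several conditions evaluated within one check interval share a single DB request.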

But this optimization was almost certainly premature. The DB is a local copy of the FACT DB on the SH node, so there is hardly any network in between. Also, MySQL typically caches recent queries itself (https://dev.mysql.com/doc/refman/5.7/en/query-cache.html), so there is no need to cache this inside the SH.

We do not understand exactly why this cache miss currently happens every 200 minutes, but @MaxNoe found this related Python bug report: https://bugs.python.org/issue28969

So as a remedy for this behaviour I propose to simply remove this caching.
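A rough sketch of what the uncached version could look like: a plain function that queries the DB on every call. An in-memory `sqlite3` database stands in for the MySQL copy here, and the `calendar` table, its columns and the shifter name are hypothetical, not the real FACT schema.

```python
import datetime
import sqlite3


def retrieve_calendar_entries(time, db):
    """Query the calendar on every call; no in-process cache."""
    stamp = time.isoformat()
    cursor = db.execute(
        "SELECT shifter FROM calendar WHERE start <= ? AND ? < stop",
        (stamp, stamp),
    )
    return [row[0] for row in cursor.fetchall()]


# Hypothetical stand-in for the local DB copy on the SH node.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE calendar (shifter TEXT, start TEXT, stop TEXT)")
db.execute(
    "INSERT INTO calendar VALUES (?, ?, ?)",
    ("alice", "2017-07-16T00:00:00", "2017-07-17T00:00:00"),
)

now = datetime.datetime(2017, 7, 16, 7, 50)
entries = retrieve_calendar_entries(now, db)

# A change in the DB is visible on the very next call.
db.execute("UPDATE calendar SET shifter = ?", ("bob",))
entries_after_update = retrieve_calendar_entries(now, db)
```

Without the decorator there is no cache key to look up, so the `KeyError` above cannot come from this code path; the price is one query per call.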