arachnys / cabot

Self-hosted, easily-deployable monitoring and alerts service - like a lightweight PagerDuty
MIT License
5.59k stars 594 forks

False positive check #595

Closed onagorodniuk closed 6 years ago

onagorodniuk commented 6 years ago

Hello,

I'm seeing some strange behavior: sometimes various checks fail even though the raw data contains a value. See screenshot:

image

JeanFred commented 6 years ago

Just to confirm: this is a Graphite check, correct?

onagorodniuk commented 6 years ago

@JeanFred Yes, that's correct. Sorry for the rather short description of the issue.

It is the latest cabot, installed from the master branch (probably 0.11.9) about a month ago. I made some tweaks in cabot/cabotapp/graphite.py to prevent missing data in cabot:

def get_data(target_pattern, mins_to_check=None):
    if mins_to_check:
        _from = '-%dminute' % mins_to_check
    else:
        _from = graphite_from
    if _from is '-1minute' or '-2minute':
        _from = '-3minute'
    resp = requests.get(
        graphite_api + 'render', auth=auth,
        params={
            'target': target_pattern,
            'format': 'json',
            'from': _from,
            'until': '-2minute',

So the exact query for every check is http://graphite.host/render?until=-2minute&from=-3minute&target=sample.metric.value&format=json
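
A side note on the snippet above: in Python, `_from is '-1minute' or '-2minute'` parses as `(_from is '-1minute') or '-2minute'`, and the non-empty string `'-2minute'` is always truthy, so the branch runs for every value of `_from`. That would be consistent with every check querying `from=-3minute`. A minimal sketch of the difference follows; `always_true` and `widen_short_window` are hypothetical names used only for illustration, and `==` stands in for `is` so the example does not depend on string identity.

```python
def always_true(_from):
    # Same shape as the condition in the snippet: the right-hand operand is
    # a non-empty string, so the expression is truthy for any input.
    return bool(_from == '-1minute' or '-2minute')


def widen_short_window(_from):
    # Membership test: only the two short windows are widened to 3 minutes.
    if _from in ('-1minute', '-2minute'):
        return '-3minute'
    return _from


print(always_true('-10minute'))         # True
print(widen_short_window('-10minute'))  # -10minute
print(widen_short_window('-1minute'))   # -3minute
```

If the intent was to widen only one-minute and two-minute windows, the membership test is probably what was meant.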

onagorodniuk commented 6 years ago

@JeanFred Also, hitting the 'Run check manually' button produces this behavior every time.

onagorodniuk commented 6 years ago

It looks like I have figured out the cause of the problem. Since I started using cabot I have faced the following problem: when cabot performs a Graphite check it fetches the latest data from Graphite, but sometimes the datapoint has not yet been received when the check runs, so the check fails because the host appears to be missing.

In previous versions of cabot I used a dirty hack: I set the clock on the Graphite server two minutes behind the actual time, and it worked pretty well.

In the current version this hack doesn't work. So I modified cabot/cabotapp/graphite.py to fetch earlier data from Graphite, like this: http://graphite.host/render?until=-2minute&from=-3minute&target=sample.metric.value&format=json

So I ran into this issue because of those dirty hacks. To be clear, it is caused by the function validate_datapoint in cabot/cabotapp/graphite.py:

def validate_datapoint(datapoint, mins_to_check, utcnow):
    val, timestamp = datapoint
    secs_to_check = 60 * mins_to_check
    if val is None:
        return False
    if timestamp > (utcnow - secs_to_check):  # <-- the comparison in question
        return True
    else:
        return False

Sometimes timestamp was less than utcnow - secs_to_check, and that is why the checks in my case were false positives. So the issue can probably be closed. Or maybe it is a good occasion to discuss how the problem with missing data in Graphite checks could be fixed.
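
The interaction can be reproduced in isolation. The sketch below uses a condensed copy of validate_datapoint as quoted above and feeds it a datapoint from the from=-3minute / until=-2minute window. The fixed `now` value, the sample metric value, and the `mins_to_check` settings are assumptions chosen for illustration, not values taken from a real check.

```python
def validate_datapoint(datapoint, mins_to_check, utcnow):
    # Condensed copy of the function quoted in this thread.
    val, timestamp = datapoint
    secs_to_check = 60 * mins_to_check
    if val is None:
        return False
    return timestamp > (utcnow - secs_to_check)


now = 1_500_000_000  # arbitrary epoch seconds standing in for utcnow

# A datapoint from the shifted window: 150 seconds old, with a real value.
datapoint = (42.0, now - 150)

print(validate_datapoint(datapoint, 1, now))          # False: older than 60 s
print(validate_datapoint(datapoint, 5, now))          # True: newer than 300 s
print(validate_datapoint((None, now - 150), 5, now))  # False: no value
```

Under these assumptions, a datapoint that is 120 seconds old or older is rejected by any check whose mins_to_check is 1 or 2, even though the raw data holds a value.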

dbuxton commented 6 years ago

Can you give a detailed concrete example of where you face this problem on a new ticket? It's a little hard to understand all the moving parts.

I'll close the issue for the time being as it doesn't seem like a bug.