Marcnuth / AnomalyDetection

Twitter's Anomaly Detection in Pure Python
Apache License 2.0
304 stars 76 forks source link

local variable 'period' referenced before assignment #11

Closed hokiegeek2 closed 5 years ago

hokiegeek2 commented 5 years ago

Getting this error in the context of TS anomaly detection on a metricbeat log stream where the timestamps are all from the same day:

results.append(anomaly_detect_ts(pd.Series(item), **self._generateParams())) File "build/bdist.linux-x86_64/egg/anomaly_detection/anomaly_detect_ts.py", line 207, in anomaly_detect_ts UnboundLocalError: local variable 'period' referenced before assignment

I can see in the code why this is happening I am hitting the else clause with granularity of ms and period is never initialized. Logging shows why:

2018-09-19 19:25:49.858000 (data.index[1]) 2018-09-19 19:25:49.180000 (data.index[0])

So from a pure computer science perspective, it's clear what's happening. timediff = data.index[1] - data.index[0] if timediff.days > 0: num_days_per_line = 7 only_last = 'day' if only_last == 'hr' else only_last period = 7 granularity = 'day' elif timediff.seconds / 60 / 60 >= 1: granularity = 'hr' period = 24 elif timediff.seconds / 60 >= 1: granularity = 'min' period = 1440 elif timediff.seconds > 0: granularity = 'sec'

Aggregate data to minutely if secondly

    data = data.resample('60s', label='right').sum()
else:
    granularity = 'ms'

The questions:

Should I change the sampling so that anomaly_detect_ts gets data points at intervals of 1 min or more, or should I consider anomaly_detect_vec for granularity of sec?

Any guidance is very much appreciated, thanks!

--John

hokiegeek2 commented 5 years ago

I think at a minimum there should be logic to throw an error, stating that the ms level of granularity is not supported, in analogy to what pycularity does. In addition, need to set period within sec block. Will make changes and test

hokiegeek2 commented 5 years ago

Okay, looking at the code more closely and following some testing with a static file of metricbeats data, and I am seeing that the Pandas resample method is working correctly, converting the sec intervals to min:

timestamp 2018-09-20 12:59:00 486581492 2018-09-20 13:00:00 471500234 2018-09-20 13:01:00 486703848 2018-09-20 13:02:00 485937599 2018-09-20 13:03:00 21696736493 2018-09-20 13:04:00 391 2018-09-20 13:05:00 249000260 2018-09-20 13:06:00 463573558 2018-09-20 13:07:00 491965349 2018-09-20 13:08:00 499765826 2018-09-20 13:09:00 482683757 2018-09-20 13:10:00 494652855 2018-09-20 13:11:00 501479188 2018-09-20 13:12:00 507338630 2018-09-20 13:13:00 491230524 2018-09-20 13:14:00 236295619 2018-09-20 13:15:00 0 2018-09-20 13:16:00 311486949 2018-09-20 13:17:00 489250187 2018-09-20 13:18:00 487550592 2018-09-20 13:19:00 472133907 2018-09-20 13:20:00 506761783 2018-09-20 13:21:00 485043873 2018-09-20 13:22:00 485422510 2018-09-20 13:23:00 147205096

But, since the granularity is sec, the period is never set. I imagine I need to set period=1440. I will try that and see what happens.

hokiegeek2 commented 5 years ago

Tested error handling for ms granularity and confirmed sec granularity sets period to 1440. Pushed pull request

hokiegeek2 commented 5 years ago

@triciascully this is implemented and tested in the master branch of my fork

hokiegeek2 commented 5 years ago

Fixed and merged