Closed hokiegeek2 closed 5 years ago
I think at a minimum there should be logic to throw an error, stating that the ms level of granularity is not supported, in analogy to what pycularity does. In addition, need to set period within sec block. Will make changes and test
Okay, looking at the code more closely and following some testing with a static file of metricbeats data, and I am seeing that the Pandas resample method is working correctly, converting the sec intervals to min:
timestamp 2018-09-20 12:59:00 486581492 2018-09-20 13:00:00 471500234 2018-09-20 13:01:00 486703848 2018-09-20 13:02:00 485937599 2018-09-20 13:03:00 21696736493 2018-09-20 13:04:00 391 2018-09-20 13:05:00 249000260 2018-09-20 13:06:00 463573558 2018-09-20 13:07:00 491965349 2018-09-20 13:08:00 499765826 2018-09-20 13:09:00 482683757 2018-09-20 13:10:00 494652855 2018-09-20 13:11:00 501479188 2018-09-20 13:12:00 507338630 2018-09-20 13:13:00 491230524 2018-09-20 13:14:00 236295619 2018-09-20 13:15:00 0 2018-09-20 13:16:00 311486949 2018-09-20 13:17:00 489250187 2018-09-20 13:18:00 487550592 2018-09-20 13:19:00 472133907 2018-09-20 13:20:00 506761783 2018-09-20 13:21:00 485043873 2018-09-20 13:22:00 485422510 2018-09-20 13:23:00 147205096
But, since the granularity is sec, the period is never set. I imagine I need to set period=1440. I will try that and see what happens.
Tested error handling for ms granularity and confirmed sec granularity sets period to 1440. Pushed pull request
@triciascully this is implemented and tested in the master branch of my fork
Fixed and merged
Getting this error in the context of TS anomaly detection on a metricbeat log stream where the timestamps are all from the same day:
results.append(anomaly_detect_ts(pd.Series(item), **self._generateParams())) File "build/bdist.linux-x86_64/egg/anomaly_detection/anomaly_detect_ts.py", line 207, in anomaly_detect_ts UnboundLocalError: local variable 'period' referenced before assignment
I can see in the code why this is happening I am hitting the else clause with granularity of ms and period is never initialized. Logging shows why:
2018-09-19 19:25:49.858000 (data.index[1]) 2018-09-19 19:25:49.180000 (data.index[0])
So from a pure computer science perspective, it's clear what's happening. timediff = data.index[1] - data.index[0] if timediff.days > 0: num_days_per_line = 7 only_last = 'day' if only_last == 'hr' else only_last period = 7 granularity = 'day' elif timediff.seconds / 60 / 60 >= 1: granularity = 'hr' period = 24 elif timediff.seconds / 60 >= 1: granularity = 'min' period = 1440 elif timediff.seconds > 0: granularity = 'sec'
Aggregate data to minutely if secondly
The questions:
Should I change the sampling so that anomaly_detect_ts gets data points at intervals of 1 min or more, or should I consider anomaly_detect_vec for granularity of sec?
Any guidance is very much appreciated, thanks!
--John