facebookresearch / Kats

Kats, a kit to analyze time series data, a lightweight, easy-to-use, generalizable, and extendable framework to perform time series analysis, from understanding the key statistics and characteristics, detecting change points and anomalies, to forecasting future trends.
MIT License

HourlyRatioDetector doesn't work with T frequency #236

Closed. Antorminator closed this issue 1 year ago.

Antorminator commented 2 years ago

Hello,

I am trying to detect anomalies in a dataset that contains, over a period of several days, measurements every 5 minutes.

I've seen that I can't use OutlierDetector because it doesn't support sub-daily frequencies (sampling intervals shorter than one day), so I'm trying HourlyRatioDetector instead.

I am trying to use it as follows:

outlierDetection = HourlyRatioDetector(data=ts_data, freq='5T', aggregate='max')  # also tried freq='T'
outliers = outlierDetection.detector()
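For intuition about what aggregate='max' asks for, here is a plain-Python sketch that rolls 5-minute samples up to one value per hour by taking the maximum. It assumes the detector aggregates by calendar hour; aggregate_hourly is a hypothetical helper, not part of Kats.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def aggregate_hourly(samples, how=max):
    """Group (datetime, value) pairs by the hour they fall in and reduce each
    group with `how` (max by default, in the spirit of aggregate='max')."""
    buckets = defaultdict(list)
    for ts, value in samples:
        hour = ts.replace(minute=0, second=0, microsecond=0)  # floor to the hour
        buckets[hour].append(value)
    return {hour: how(values) for hour, values in sorted(buckets.items())}


# Two hours of synthetic 5-minute data: 12 samples per hour, 24 in total
start = datetime(2020, 1, 1)
samples = [(start + timedelta(minutes=5 * i), float(i)) for i in range(24)]
hourly = aggregate_hourly(samples)
print(hourly)
```

Passing a different reducer (for example how=min or a mean function) changes only the per-hour summary, not the grouping.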

But I am getting the following error:

/usr/local/lib/python3.7/dist-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
    806                 "Found array with %d sample(s) (shape=%s) while a"
    807                 " minimum of %d is required%s."
--> 808                 % (n_samples, array.shape, ensure_min_samples, context)
    809             )
    810 

ValueError: Found array with 1 sample(s) (shape=(1, 23)) while a minimum of 2 is required by MinCovDet.
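In scikit-learn, the first dimension of an array is the number of samples, so shape=(1, 23) means MinCovDet received a single row of 23 values. One possible cause, assuming the detector builds one row per day of hourly values (the thread does not confirm how Kats arranges the data internally), is that the series covers too few complete days. A hypothetical pre-check, using a made-up complete_days helper:

```python
from collections import Counter
from datetime import datetime, timedelta


def complete_days(hourly_timestamps):
    """Return the calendar dates that have all 24 hourly observations.

    Assumes the timestamps are unique and already aggregated to the hour.
    """
    per_day = Counter(ts.date() for ts in hourly_timestamps)
    return sorted(day for day, n in per_day.items() if n == 24)


# 36 hours starting at midnight: only the first calendar day is complete
start = datetime(2020, 1, 1)
stamps = [start + timedelta(hours=h) for h in range(36)]
days = complete_days(stamps)
if len(days) < 2:
    # MinCovDet rejects inputs with fewer than 2 samples (rows)
    print(f"only {len(days)} complete day(s); MinCovDet needs at least 2 rows")
```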

I have also noticed that the freq and aggregate parameters, which the documentation (https://facebookresearch.github.io/Kats/api/kats.detectors.hourly_ratio_detection.html) lists as optional, must in fact be specified. I don't know whether I'm using an old version; I simply install Kats as follows:

!pip install kats
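To see which Kats release is actually installed, the standard library can report it. This sketch needs Python 3.8+ for importlib.metadata; on the Python 3.7 runtime shown in the traceback, pip show kats gives the same information. installed_version is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version  # Python 3.8+


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print("kats:", installed_version("kats"))
```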

My dataset comes from a csv and I convert it to TimeSeriesData like this:

ts_data = TimeSeriesData(
        time=data['time'], 
        value=data['value'], 
        use_unix_time=True, # timestamp to date
        unix_time_units="s"
)
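As the inline comment says, use_unix_time=True with unix_time_units="s" treats the time column as Unix timestamps in seconds. Before building the TimeSeriesData, it may be worth checking that consecutive timestamps really are 300 seconds apart, since gaps make the series irregular rather than '5T'. A stdlib sketch with hypothetical helper names; it converts to UTC, which may not match how Kats localizes the timestamps:

```python
from datetime import datetime, timezone


def unix_seconds_to_utc(stamps):
    """Convert Unix timestamps in seconds to timezone-aware UTC datetimes."""
    return [datetime.fromtimestamp(s, tz=timezone.utc) for s in stamps]


def irregular_gaps(stamps, step=300):
    """Return (index, gap) pairs where a timestamp is not `step` seconds after
    the previous one; index points at the later timestamp.

    step=300 seconds corresponds to the '5T' (5-minute) frequency.
    """
    return [
        (i, b - a)
        for i, (a, b) in enumerate(zip(stamps, stamps[1:]), start=1)
        if b - a != step
    ]


# 2020-01-01 00:00 UTC onward, with one 5-minute sample missing
stamps = [1577836800, 1577837100, 1577837400, 1577838000]
print(unix_seconds_to_utc(stamps)[0])  # 2020-01-01 00:00:00+00:00
print(irregular_gaps(stamps))          # [(3, 600)]
```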

If it's a bug, I hope this helps to fix it.

Thanks, and best regards.

rohanfb commented 1 year ago

I'm not able to reproduce this on my end; the snippet below runs without error. Make sure you are using the latest version of Kats (0.2.0) and that your data actually has the frequency you pass in. If the problem persists, feel free to share the data here.

import kats
from kats.utils.simulator import Simulator
from kats.detectors.hourly_ratio_detection import HourlyRatioDetector
sim = Simulator(n=4500, start='2020-01-01', freq='5T')
ts = sim.level_shift_sim(noise=0.05, seasonal_period=1)
d = HourlyRatioDetector(data=ts, freq='5T', aggregate='max')
d.detector()
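For scale, assuming the simulator produces n evenly spaced points beginning at start, the reproduction spans about 15.6 days of 5-minute data:

```python
from datetime import datetime, timedelta

n, step = 4500, timedelta(minutes=5)
start = datetime(2020, 1, 1)
end = start + (n - 1) * step                 # timestamp of the last point
span_hours = n * step / timedelta(hours=1)   # total coverage in hours
print(end, span_hours, span_hours / 24)      # 2020-01-16 14:55:00 375.0 15.625
```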

However, I'll leave this issue open to address the documentation mistake you pointed out: the aggregate parameter should be documented as required, not optional. Thanks for flagging it.

yangbk560 commented 1 year ago

HourlyRatioDetector works on hourly data: if the input data is already hourly, there is no need to aggregate it, which is why the default value of aggregate is None.
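Read together with the previous comment (aggregate should be required), the intended rule appears to be that aggregate may stay None only for hourly input. A hypothetical validator expressing that rule, not Kats's own check, assuming the sampling interval is given in minutes and divides an hour evenly:

```python
from typing import Optional


def check_aggregation(interval_minutes: int, aggregate: Optional[str]) -> None:
    """Hypothetical validation of the sampling-interval/aggregate combination.

    Hourly input (60-minute interval) may leave aggregate=None; finer input
    must name an aggregation so it can be rolled up to hourly values.
    """
    if interval_minutes <= 0 or interval_minutes > 60 or 60 % interval_minutes:
        raise ValueError("interval must evenly divide one hour")
    if interval_minutes < 60 and aggregate is None:
        raise ValueError("sub-hourly data needs an aggregate, e.g. 'max'")


check_aggregation(60, None)   # hourly data: no aggregation needed
check_aggregation(5, "max")   # 5-minute data rolled up with max
```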

Antorminator commented 1 year ago

When I did a pip install of Kats, this file was downloaded:

kats-0.2.0-py3-none-any.whl

I believe it's the latest version of Kats (or at least it was at the time).

My problem came up months ago, and since I needed a solution, I overrode the statsmodels tsatools function freq_to_period, which I found was the source of the error.

I described everything in my project repository, which can be found here (the descriptions are in Spanish, but the code is not; search for elif freq == '5T'):

https://github.com/Antorminator/time-series-prediction
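The thread does not show the patch itself. statsmodels' freq_to_period maps a pandas frequency to a seasonal period (for example 'H' to 24 and 'D' to 7) and raises ValueError for frequencies outside its table, which has no minute-level entries. Below is a standalone, hypothetical sketch of the kind of minute branch such a patch adds, assuming a daily cycle for sub-daily data and a weekly one for daily data; period_for_freq is a made-up name, and this is not the code in the linked repository.

```python
import re


def period_for_freq(freq: str) -> int:
    """Hypothetical freq -> seasonal-period mapping that also accepts minutes.

    Sub-daily data is assumed to follow a daily cycle; daily data a weekly one.
    """
    match = re.fullmatch(r"(\d*)([A-Z]+)", freq.upper())
    if match is None:
        raise ValueError(f"freq {freq!r} not understood")
    mult = int(match.group(1) or 1)
    unit = match.group(2)
    if mult <= 0:
        raise ValueError(f"freq {freq!r} has a non-positive multiple")
    minutes_per_unit = {"T": 1, "MIN": 1, "H": 60}
    if unit in minutes_per_unit:
        step = mult * minutes_per_unit[unit]
        if 1440 % step:
            raise ValueError(f"{freq!r} does not evenly divide a day")
        return 1440 // step  # samples per day, e.g. '5T' -> 288
    if unit == "D" and mult == 1:
        return 7  # weekly cycle for daily data
    raise ValueError(f"freq {freq!r} not understood")


print(period_for_freq("5T"))  # 288
```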

Regards!