arundo / adtk

A Python toolkit for rule-based/unsupervised anomaly detection in time series
https://adtk.readthedocs.io
Mozilla Public License 2.0

SeasonalAD fit_detect failing #115

Open Pabloendres opened 4 years ago

Pabloendres commented 4 years ago

Hi! I'm working with monthly data from 2006 to 2019 and wanted to use SeasonalAD, but fit_detect fails with "ValueError: The time steps are not constant." even after validating the series with validate_series.

Code

from adtk.data import validate_series
from adtk.detector import SeasonalAD

# `times` is the monthly pd.Series (2006 to 2019) described above
times = validate_series(times)
seasonal_ad = SeasonalAD(c=3.0, side="both")
anomalies = seasonal_ad.fit_detect(times)

Output

ValueError                                Traceback (most recent call last)
<ipython-input-15-a29b7fb737cd> in <module>
      8     #anomalies = iqr_ad.fit_detect(times)
      9     seasonal_ad = SeasonalAD(c=3.0, side="both")
---> 10     anomalies = seasonal_ad.fit_detect(times)
     11 
     12 

~\Anaconda3\envs\venis\lib\site-packages\adtk\_detector_base.py in fit_predict(self, ts, return_list)
    245 
    246         """
--> 247         self.fit(ts)
    248         return self.detect(ts, return_list=return_list)
    249 

~\Anaconda3\envs\venis\lib\site-packages\adtk\_detector_base.py in fit(self, ts)
    150 
    151         """
--> 152         self._fit(ts)
    153 
    154     def predict(

~\Anaconda3\envs\venis\lib\site-packages\adtk\_base.py in _fit(self, ts)
    152         if isinstance(ts, pd.Series):
    153             s = ts.copy()  # type: pd.Series
--> 154             self._fit_core(s)
    155             self._fitted = 1
    156         elif isinstance(ts, pd.DataFrame):

~\Anaconda3\envs\venis\lib\site-packages\adtk\detector\_detector_1d.py in _fit_core(self, s)
   1154     def _fit_core(self, s: pd.Series) -> None:
   1155         self._sync_params()
-> 1156         self.pipe_.fit(s)
   1157         self.freq_ = self.pipe_.steps["deseasonal_residual"]["model"].freq_
   1158         self.seasonal_ = self.pipe_.steps["deseasonal_residual"][

~\Anaconda3\envs\venis\lib\site-packages\adtk\pipe\_pipe.py in fit(self, ts, skip_fit, return_intermediate)
    891                 results.update({step_name: step["model"].predict(input)})
    892             else:
--> 893                 results.update({step_name: step["model"].fit_predict(input)})
    894 
    895         # return intermediate results

~\Anaconda3\envs\venis\lib\site-packages\adtk\_transformer_base.py in fit_predict(self, ts)
     94 
     95         """
---> 96         self.fit(ts)
     97         return self.predict(ts)
     98 

~\Anaconda3\envs\venis\lib\site-packages\adtk\_transformer_base.py in fit(self, ts)
     47 
     48         """
---> 49         self._fit(ts)
     50 
     51     def predict(

~\Anaconda3\envs\venis\lib\site-packages\adtk\_base.py in _fit(self, ts)
    152         if isinstance(ts, pd.Series):
    153             s = ts.copy()  # type: pd.Series
--> 154             self._fit_core(s)
    155             self._fitted = 1
    156         elif isinstance(ts, pd.DataFrame):

~\Anaconda3\envs\venis\lib\site-packages\adtk\transformer\_transformer_1d.py in _fit_core(self, s)
    709         # get seasonal freq
    710         if self.freq is None:
--> 711             identified_freq = _identify_seasonal_period(s)
    712             if identified_freq is None:
    713                 raise Exception("Could not find significant seasonality.")

~\Anaconda3\envs\venis\lib\site-packages\adtk\transformer\_transformer_1d.py in _identify_seasonal_period(s, low_autocorr, high_autocorr)
    856     # check if the time series has uniform time step
    857     if len(np.unique(np.diff(s.index))) > 1:
--> 858         raise ValueError("The time steps are not constant. ")
    859 
    860     autocorr = acf(s, nlags=len(s), fft=False)

ValueError: The time steps are not constant.

jblocher commented 3 years ago

I'm having the same issue. Perhaps you knew this already, but it seems that this check: if len(np.unique(np.diff(s.index))) > 1: ends up counting days even though the series has a monthly frequency. An ordering check such as df.index.is_monotonic passes because it only looks at whether the timestamps increase, so a monthly index looks fine to it. The numpy approach, on the other hand, measures the gaps between consecutive timestamps in absolute time, which gives an array of 28, 29, 30, and 31 days, so len() returns 4, which is > 1.
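For anyone who wants to see this without adtk installed, here is a minimal sketch that applies the same np.unique(np.diff(...)) expression to a synthetic month-start index. The index, the placeholder values, and the variable names are assumptions for illustration, not the original poster's data.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series covering the same years as the report (assumed data).
index = pd.date_range("2006-01-01", "2019-12-01", freq="MS")
s = pd.Series(np.arange(len(index), dtype=float), index=index)

# Same expression as the check in the traceback: gaps between consecutive timestamps.
gaps = np.unique(np.diff(s.index.values))

# Convert the timedelta64 gaps to whole days for readability.
gap_days = sorted(int(g / np.timedelta64(1, "D")) for g in gaps)

print(gap_days)       # distinct month lengths in days
print(len(gaps) > 1)  # True, so the check would raise
```

The index is perfectly regular in calendar terms, but the check compares elapsed time, so any monthly (or quarterly, or yearly) series trips it.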

m-vishnu commented 3 weeks ago


Changing that if condition to the following solves the problem for me:

if len(pd.date_range(s.index.min(), s.index.max(), freq=s.index.freq.freqstr).difference(s.index)) > 0:
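To illustrate what the replacement condition tests, here is a rough standalone sketch of the same idea: build the timestamps expected at the series' frequency and look for any that are absent. The helper name missing_timestamps and the explicit freq argument are hypothetical and not part of adtk; the frequency is passed in because s.index.freq may be None once rows have been dropped, in which case the one-liner above would fail on .freqstr.

```python
import pandas as pd


def missing_timestamps(s: pd.Series, freq: str) -> pd.DatetimeIndex:
    """Hypothetical helper: timestamps expected at `freq` between the first
    and last index entries of `s` that do not appear in its index."""
    expected = pd.date_range(s.index.min(), s.index.max(), freq=freq)
    return expected.difference(s.index)


# Synthetic month-start series (assumed data, not the reporter's).
index = pd.date_range("2006-01-01", "2019-12-01", freq="MS")
complete = pd.Series(range(len(index)), index=index, dtype=float)

# Same series with one month removed, to simulate a real gap.
with_gap = complete.drop(pd.Timestamp("2010-06-01"))

n_missing_complete = len(missing_timestamps(complete, "MS"))
n_missing_gap = len(missing_timestamps(with_gap, "MS"))

print(n_missing_complete, n_missing_gap)
```

With this kind of check a complete monthly series passes despite the varying month lengths, while a series with a dropped month is still flagged.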