arundo / adtk

A Python toolkit for rule-based/unsupervised anomaly detection in time series
https://adtk.readthedocs.io
Mozilla Public License 2.0

Q: Seasonal Anomaly doesn't consider months #68

Open ajdapretnar opened 4 years ago

ajdapretnar commented 4 years ago

Sorry to bother you again. I am experimenting with SeasonalAD() and it seems unable to detect some obvious seasonal patterns. I have three years of traffic data with one measurement per hour. I tried different parameters (c=5, c=10, trend=True, freq=24 (day), freq=720 (month), freq=8760 (year)), but nothing helped: every time, I get anomalies in the summer months, when traffic increases due to tourism. Since the increase is seasonal, I wonder why SeasonalAD() doesn't account for it.
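A minimal pure-Python sketch of the symptom follows. It uses synthetic data, not the attached file. It assumes a simplified, trend-free classical decomposition in which the seasonal component at each phase is the mean of all samples at that phase, followed by an interquartile-range rule. Both helpers (`remove_seasonal`, `iqr_flags`) are hypothetical and are not ADTK's implementation. With only the 24-hour cycle removed, the summer surge stays in the residual, so every summer hour gets flagged.

```python
import math
import random
import statistics


def remove_seasonal(values, period):
    """Subtract the per-phase mean (simplified, trend-free seasonal removal)."""
    sums = [0.0] * period
    counts = [0] * period
    for i, v in enumerate(values):
        sums[i % period] += v
        counts[i % period] += 1
    means = [s / c for s, c in zip(sums, counts)]
    return [v - means[i % period] for i, v in enumerate(values)]


def iqr_flags(values, c=3.0):
    """Flag values outside [Q1 - c*IQR, Q3 + c*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return [v < q1 - c * iqr or v > q3 + c * iqr for v in values]


# Synthetic hourly traffic for one year: daily cycle, a summer surge
# (days 150-239), and small bounded noise.
rng = random.Random(0)
SUMMER = range(150 * 24, 240 * 24)
series = [
    100
    + 10 * math.sin(2 * math.pi * (h % 24) / 24)
    + (50 if h in SUMMER else 0)
    + rng.uniform(-1, 1)
    for h in range(365 * 24)
]

residual = remove_seasonal(series, period=24)  # remove the daily pattern only
flags = iqr_flags(residual, c=3.0)
print(sum(flags), "hours flagged")  # expected: 2160 (90 days x 24 h), all in SUMMER
```

A single period can only absorb a pattern that repeats with that period. A surge lasting a whole season repeats once a year, so it survives daily or weekly removal and ends up looking like an anomaly.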

Thanks!

Plot of detected anomalies, showing too many anomalies in the summer months.

tailaiw commented 4 years ago

@ajdapretnar Interesting... I'm not quite sure what the problem is. Did you try freq=168? I can see a weekly pattern in the plot.
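Because the data are hourly, each `freq` value is just the cycle length in hours. This assumes, as the numbers in this thread suggest, that ADTK's `freq` counts samples per cycle. Checking the candidates also shows that the 720-sample "month" is a 30-day approximation. Calendar months vary between 28 and 31 days, and twelve 720-hour cycles fall 120 hours short of a 365-day year, so a fixed monthly period drifts against the calendar.

```python
# Cycle lengths in samples for hourly data (one sample per hour).
HOURS_PER_DAY = 24

periods = {
    "daily": HOURS_PER_DAY,               # 24
    "weekly": 7 * HOURS_PER_DAY,          # 168
    "30-day month": 30 * HOURS_PER_DAY,   # 720
    "365-day year": 365 * HOURS_PER_DAY,  # 8760
}

for name, freq in periods.items():
    print(f"{name:>13}: freq={freq}")

# Twelve fixed 720-hour "months" do not add up to a 365-day year.
drift = periods["365-day year"] - 12 * periods["30-day month"]
print("drift per year:", drift, "hours")  # 120
```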

SeasonalAD does not support multiple seasonal frequencies, but it is straightforward to build a pipeline that removes multiple seasonal patterns sequentially. For example, the following code removes the yearly, weekly, and daily patterns before running an interquartile-range-based outlier detector.

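```python
from adtk.pipe import Pipeline
from adtk.transformer import ClassicSeasonalDecomposition
from adtk.detector import InterQuartileRangeAD

# Hourly data, so each freq is the cycle length in hours.
# Each stage removes one seasonal pattern from the previous stage's output.
model = Pipeline([
    ("yearly", ClassicSeasonalDecomposition(freq=24*365)),
    ("weekly", ClassicSeasonalDecomposition(freq=24*7)),
    ("daily", ClassicSeasonalDecomposition(freq=24)),
    ("ad", InterQuartileRangeAD(c=3))])  # flag outliers in the final residual

# anomalies = model.fit_detect(s)  # s: hourly pd.Series with a DatetimeIndex
```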
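For intuition about what the pipeline does, here is a pure-Python sketch of the same idea on synthetic data. It removes one seasonal pattern at a time, longest period first, then applies an interquartile-range rule to whatever remains. It reuses a simplified per-phase-mean decomposition, which is an assumption and not ADTK's `ClassicSeasonalDecomposition`. `remove_seasonal`, `iqr_flags`, and `detect` are hypothetical helpers. The series has a daily cycle, a weekend dip, and one injected spike. A yearly stage is omitted because the sketch covers only 20 weeks.

```python
import math
import random
import statistics


def remove_seasonal(values, period):
    """Subtract the per-phase mean (simplified, trend-free seasonal removal)."""
    sums = [0.0] * period
    counts = [0] * period
    for i, v in enumerate(values):
        sums[i % period] += v
        counts[i % period] += 1
    means = [s / c for s, c in zip(sums, counts)]
    return [v - means[i % period] for i, v in enumerate(values)]


def iqr_flags(values, c=3.0):
    """Flag values outside [Q1 - c*IQR, Q3 + c*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return [v < q1 - c * iqr or v > q3 + c * iqr for v in values]


def detect(values, periods, c=3.0):
    """Remove each seasonal period in turn, then flag residual outliers."""
    residual = list(values)
    for period in periods:
        residual = remove_seasonal(residual, period)
    return iqr_flags(residual, c)


# 20 weeks of synthetic hourly data: daily cycle, weekend dip, noise, one spike.
rng = random.Random(0)
SPIKE_AT = 1000
series = []
for h in range(20 * 168):
    day_of_week = (h // 24) % 7
    value = 100 + 10 * math.sin(2 * math.pi * (h % 24) / 24)
    value += -20 if day_of_week >= 5 else 0  # weekend dip
    value += rng.uniform(-1, 1)              # bounded noise
    series.append(value)
series[SPIKE_AT] += 20  # injected anomaly

flags = detect(series, periods=(168, 24), c=3.0)
print([i for i, f in enumerate(flags) if f])  # expected: [1000]
```

In this simplified sketch, the daily pass changes almost nothing. Any 24-hour pattern also repeats every 168 hours, so the weekly pass has already absorbed it. Whether the same holds inside ADTK depends on how `ClassicSeasonalDecomposition` estimates its pattern, which the sketch does not reproduce.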

If you're allowed to, you are more than welcome to share the data here, and we can dive deeper into it.

ajdapretnar commented 4 years ago

Gotcha! So I repeat CSD (ClassicSeasonalDecomposition) for each seasonal pattern. Neat! I initially thought this happened internally: "Detector adtk.detector.SeasonalAD uses transformer adtk.transformer.ClassicSeasonalDecomposition to remove the seasonal pattern from the original time series." But I suppose that only covers a single seasonal pattern. Thanks for the tip!

I am attaching the data. I have already done all the preprocessing, such as removing NaNs. STM82-sample.csv.zip