etsy / skyline

It'll detect your anomalies! Part of the Kale stack.
http://codeascraft.com/2013/06/11/introducing-kale/

Series with periodic #85

Open hit9 opened 10 years ago

hit9 commented 10 years ago

For example, take a series with a period of 1 day: the value at 12:00 is a peak (e.g. 1000) and the value at 0:00 is 10. So 1000 at 12:00 should be normal, and 10 at 12:00 should be anomalous.

But Skyline treats 10 as normal.

astanway commented 10 years ago

Pull requests accepted...

Seasonal algorithms are hard to automatically fit. Working on it, though...

hit9 commented 10 years ago

One approach is to use a Fast Fourier Transform to detect the series' period, fetch the datapoints at the same phase, and then analyze that new dataset.

I am looking into it now.
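
The same-phase idea can be sketched roughly as follows. This is only an illustration, not Skyline code: the helpers `same_phase` and `is_anomalous` are hypothetical, the hourly data is synthetic, the period (24 samples) is assumed to be already known, and a plain 3-sigma rule stands in for whatever detector would actually be used.

```python
import statistics

def same_phase(series, period, phase):
    """Return the points of `series` that sit at `phase` within each period."""
    return series[phase % period::period]

def is_anomalous(value, history, n_sigma=3.0):
    """3-sigma rule: flag `value` if it is more than n_sigma standard
    deviations away from the mean of `history`."""
    mean = statistics.mean(history)
    std = statistics.pstdev(history)
    return abs(value - mean) > n_sigma * std

# Synthetic hourly series over 14 days (an assumption for this sketch):
# 10 at every hour except 12:00, where the peak is about 1000.
PERIOD = 24
series = []
for d in range(14):
    day = [10.0] * PERIOD
    day[12] = 990.0 if d % 2 == 0 else 1010.0
    series.extend(day)

noon_history = same_phase(series, PERIOD, 12)

# Judged against the whole series, 10 looks ordinary.
global_flag = is_anomalous(10.0, series)
# Judged against earlier 12:00 values only, 10 stands out and 1000 does not.
phase_flag = is_anomalous(10.0, noon_history)
normal_flag = is_anomalous(1000.0, noon_history)

print(global_flag, phase_flag, normal_flag)
```

The comparison against `series` versus `noon_history` is the whole point: the same value is unremarkable globally but far outside the range seen at that phase.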

astanway commented 10 years ago

Yep! That's what I was leaning towards - use FFT to get periodicity, and maybe use that to populate an ARIMA or use a KS test along windowed intervals? cc @toufic

hit9 commented 10 years ago

I'm not sure about the last question, but for detecting periodicity I found some information here: http://stackoverflow.com/questions/15261122/determine-frequency-from-signal-data-in-matlab

And this function may help:

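import numpy as np

def guess_period(x):
    """Guess the dominant period of x (in samples) from its FFT peak."""
    x = np.array(x)
    n = np.size(x)
    m = np.mean(x)
    # Subtract the mean so the zero-frequency (DC) bin does not dominate.
    p = np.abs(np.fft.fft(x - m))
    i = np.argmax(p)
    # Bin i means i cycles over n samples, i.e. a period of n / i samples.
    if i:
        return n / float(i)
    return None  # peak at bin 0: no period found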

This may give a series' period, but it fails in some cases:

>>> x = [1, 20, 2, 20, 1, 21, 2, 22, 1, 19]
>>> guess_period(x)
2.0
>>> import itertools
>>> source = itertools.cycle([1, 10, 20, 10, 1])
>>> x = [next(source) for _ in range(101)]
>>> guess_period(x)
5.05
>>> x = [next(source) for _ in range(103)]
>>> guess_period(x)
4.904761904761905
>>> x = [next(source) for _ in range(105)]
>>> guess_period(x)
1.25  # fails
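
A note on the 1.25 result: with n = 105, a period of 1.25 corresponds to bin 84, which is the mirror image of bin 21 (105 - 84 = 21, and 105 / 21 = 5). For real-valued input the two bins have the same magnitude up to rounding, so `argmax` can land on either one. A possible workaround, sketched here under the assumption that only positive frequencies matter, is to use `np.fft.rfft`, which returns bins 0 to n // 2 only. The name `guess_period_rfft` is made up for this example.

```python
import itertools

import numpy as np

def guess_period_rfft(x):
    """Variant of guess_period that searches positive frequencies only."""
    x = np.asarray(x, dtype=float)
    n = x.size
    # rfft omits the mirrored upper half of the spectrum, so the peak
    # index can never exceed n // 2.
    p = np.abs(np.fft.rfft(x - x.mean()))
    i = int(np.argmax(p))
    if i:
        return n / float(i)
    return None  # peak at bin 0: no period found

# 105 samples = exactly 21 repetitions of a 5-sample pattern.
source = itertools.cycle([1, 10, 20, 10, 1])
x = [next(source) for _ in range(105)]
period = guess_period_rfft(x)
print(period)  # 5.0
```

This does not address the non-integer estimates (5.05, 4.90...) seen when the window is not a whole number of periods; those come from the bin spacing of the FFT rather than from the mirrored spectrum.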

I think we can maintain a dict ({period: hit_times}); the period with the most hits wins.
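
A minimal sketch of that voting idea, assuming the estimates are rounded to the nearest whole number of samples before counting (the rounding step and the name `vote_period` are assumptions, not something settled in this thread):

```python
from collections import Counter

def vote_period(estimates):
    """Return the integer period that most estimates round to, or None."""
    hits = Counter()  # plays the role of {period: hit_times}
    for est in estimates:
        if est is None:  # an estimator may fail to find a period
            continue
        hits[int(round(est))] += 1
    if not hits:
        return None
    period, _count = hits.most_common(1)[0]
    return period

# The three estimates from the 5-sample cycle in the transcript above.
estimates = [5.05, 4.904761904761905, 1.25]
winner = vote_period(estimates)
print(winner)  # 5
```

With those three estimates, two round to 5 and one rounds to 1, so the outlier is outvoted.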

astanway commented 10 years ago

Awesome. You can use Crucible (github.com/astanway/crucible) to refine the algorithm.

hit9 commented 10 years ago

Any progress on this?

hit9 commented 10 years ago

Hi @astanway, I have created another monitor similar to Skyline: https://github.com/eleme/node-bell. It is only for periodic metrics, and the only algorithm it uses is 3-sigma. Thanks for this project, it gave me a lot of ideas!