arundo / adtk

A Python toolkit for rule-based/unsupervised anomaly detection in time series
https://adtk.readthedocs.io
Mozilla Public License 2.0

Multivariate anomaly detection. #99

Open abhimanyu3-zz opened 4 years ago

abhimanyu3-zz commented 4 years ago

Suppose a data frame has five series (d1, d2, d3, d4, d5). When an anomaly is detected, is there any way to tell which series caused it, and to assign each series a contribution score, e.g. d5 was 50% responsible, d3 30%, and so on?

Thanks!

tailaiw commented 4 years ago

@abhimanyu3 If we assume that a single series is the cause of an anomaly, then I would apply a univariate detector to each series independently.
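A minimal sketch of the per-series approach, written in plain pandas rather than ADTK so it runs without the library. The helper `flag_per_column`, the IQR rule it uses and the default `c=3.0` are illustrative assumptions; the point is that each column is checked only against its own history, so every flag can be traced back to the column that produced it.

```python
import pandas as pd


def flag_per_column(df: pd.DataFrame, c: float = 3.0) -> pd.DataFrame:
    """Hypothetical helper: apply an IQR rule to each column on its own.

    Returns a boolean DataFrame shaped like `df`; True marks a value outside
    [Q1 - c*IQR, Q3 + c*IQR] computed from that value's own column.
    """
    q1 = df.quantile(0.25)
    q3 = df.quantile(0.75)
    iqr = q3 - q1
    # Comparing a DataFrame with a Series aligns the Series on column names,
    # so every column is checked against its own bounds.
    return (df < q1 - c * iqr) | (df > q3 + c * iqr)


df = pd.DataFrame(
    {
        "d1": [1.0, 1.1, 0.9, 1.0, 1.2, 1.0],
        "d2": [5.0, 5.2, 4.9, 5.1, 5.0, 50.0],  # spike in d2 only
    }
)
flags = flag_per_column(df)

# Report, per timestamp, which columns triggered their own detector.
for ts, row in flags.iterrows():
    culprits = [col for col, is_anomaly in row.items() if is_anomaly]
    if culprits:
        print(ts, culprits)  # prints: 5 ['d2']
```

This gives a yes/no answer per column rather than a percentage split; a split like "50% / 30%" is not well defined once the anomaly comes from the series jointly.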

A multivariate detector is for the case where the anomaly is caused by a change in the relationship between series. In that case, it's hard to say which series "causes" the anomaly, because the series cause it jointly.
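To illustrate why a joint anomaly cannot be pinned on one series, here is a hedged sketch (not ADTK's code) using one common way of modelling a relationship: a least-squares line of y on x, with an IQR rule on the residuals. The helper names, the threshold `c=3.0` and the synthetic data are assumptions. The two series normally satisfy y = 2x; at index 2 both values are individually typical, yet together they break the relationship.

```python
import numpy as np


def iqr_flags(values: np.ndarray, c: float = 3.0) -> np.ndarray:
    """True where a value lies outside [Q1 - c*IQR, Q3 + c*IQR]."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values < q1 - c * iqr) | (values > q3 + c * iqr)


def relationship_flags(x: np.ndarray, y: np.ndarray, c: float = 3.0) -> np.ndarray:
    """Flag points where y departs from a least-squares linear fit on x."""
    slope, intercept = np.polyfit(x, y, deg=1)
    residual = y - (slope * x + intercept)
    return iqr_flags(residual, c)


x = np.arange(1, 11, dtype=float)
y = 2 * x
y[2] = 16.0  # breaks y = 2x, although 16 is well inside y's usual range

print("x alone:", np.flatnonzero(iqr_flags(x)))
print("y alone:", np.flatnonzero(iqr_flags(y)))
print("x and y jointly:", np.flatnonzero(relationship_flags(x, y)))
```

Neither per-column check fires, but the residual check flags index 2. The flag describes the pair (x, y), which is why splitting responsibility between the two series has no natural answer.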

abhimanyu3-zz commented 4 years ago

@tailaiw Thanks a lot for your response. I need to find sudden peaks and drops in my multivariate time series data, so I am using the PersistAD method on the DataFrame. Should I apply it to each column separately, or is applying it to the whole DataFrame the same thing?

Also, where can I find details such as what the parameter c in PersistAD means, so that I can make an informed decision when tuning it?

Do you recommend any other method for finding sudden peaks and drops, or is PersistAD a good choice?

ivanokeeffe commented 4 years ago

Hi there, do either of you know where to find the formulae used in PersistAD? I haven't been able to find them in the code. Thanks a million, guys!

tailaiw commented 4 years ago

@abhimanyu3 and @ivanokeeffe PersistAD is implemented as a pipeline of a DoubleRollingAggregate transformer and an InterQuartileRangeAD detector. You may refer to the pipe_ attribute of a PersistAD object for more details.
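As a rough illustration of the transformer step only, here is a simplified pandas re-creation under stated assumptions, not ADTK's source. According to ADTK's documentation, PersistAD compares each value with an aggregate (e.g. the median) of its preceding window; the sketch computes that difference, which is what the IQR detector then inspects. The helper name `persist_diff` is hypothetical.

```python
import pandas as pd


def persist_diff(s: pd.Series, window: int = 1, agg: str = "median") -> pd.Series:
    """Hypothetical helper: value minus the aggregate of the preceding `window` values.

    The first `window` points have no complete preceding window and come out as NaN.
    """
    # shift(1) excludes the current point, so the window covers only the past.
    preceding = s.shift(1).rolling(window).agg(agg)
    return s - preceding


s = pd.Series(
    [10.0, 10.0, 11.0, 10.0, 3.0, 10.0, 10.0],
    index=pd.date_range("2020-03-30", periods=7, freq="D"),
)
print(persist_diff(s, window=3))
```

The drop to 3.0 shows up as a difference of -7 from the median of the three preceding values, while ordinary wiggles stay near zero.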

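The parameter c is the same one used by the internal InterQuartileRangeAD, where it controls the "normal range": with Q1 and Q3 the first and third quartiles of the detector's input and IQR = Q3 - Q1, values outside [Q1 - c*IQR, Q3 + c*IQR] are flagged as anomalous. InterQuartileRangeAD is a classic, simple outlier detection method. The value of c is usually 1.5 or 3, although the user may choose another value depending on the problem to solve.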
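A minimal sketch of the IQR rule, assuming pandas' default linear interpolation for the quartiles (the exact quantile method inside ADTK is not stated in this thread). The names `fit_iqr` and `detect_iqr` are hypothetical; fitting and detecting are split only to mirror the fit/detect pattern. Running it with c = 1.5 and c = 3 shows how c widens or narrows the normal range.

```python
from typing import Tuple

import pandas as pd


def fit_iqr(train: pd.Series, c: float = 3.0) -> Tuple[float, float]:
    """Return the (lower, upper) bounds of the normal range learned from `train`."""
    q1 = train.quantile(0.25)
    q3 = train.quantile(0.75)
    iqr = q3 - q1
    return q1 - c * iqr, q3 + c * iqr


def detect_iqr(s: pd.Series, bounds: Tuple[float, float]) -> pd.Series:
    """True where a value falls outside the learned normal range."""
    lower, upper = bounds
    return (s < lower) | (s > upper)


values = pd.Series([0.0, 1.0, -1.0, 0.5, -0.5, 2.0, -4.0])
for c in (1.5, 3.0):
    bounds = fit_iqr(values, c)
    print(c, bounds, detect_iqr(values, bounds).tolist())
```

Here Q1 = -0.75 and Q3 = 0.75, so c = 1.5 gives the range [-3, 3] and flags -4.0, while c = 3 gives [-5.25, 5.25] and flags nothing. A smaller c flags more points.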

ivanokeeffe commented 4 years ago

Thanks a million for replying to my question. Appreciate it greatly.

abhimanyu3-zz commented 4 years ago

@ivanokeeffe Hey! What kind of outliers are you trying to detect? Sudden peaks and drops?

ivanokeeffe commented 4 years ago

Yep, exactly; I'm actually just trying to detect sudden drops. PersistAD works perfectly, but I'm trying to dig into the maths behind it...
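For the drops-only case, a hedged end-to-end sketch that chains the two steps tailaiw describes (a rolling-aggregate difference followed by an IQR rule) in plain pandas, keeping only the low side. ADTK's PersistAD exposes a `side` parameter for one-sided detection; the helper `detect_drops`, its defaults and the data below are assumptions, and the output is not guaranteed to match PersistAD exactly.

```python
import pandas as pd


def detect_drops(s: pd.Series, window: int = 3, c: float = 3.0) -> pd.Series:
    """Flag points that fall far below the median of their preceding window.

    Simplified re-creation for illustration, not ADTK's implementation.
    """
    # Step 1: change relative to the preceding window (NaN where the window is incomplete).
    diff = s - s.shift(1).rolling(window).median()
    # Step 2: IQR bounds learned from the differences themselves (NaNs are skipped).
    q1, q3 = diff.quantile(0.25), diff.quantile(0.75)
    lower = q1 - c * (q3 - q1)
    # Only the low side counts, so sudden peaks are ignored; NaN < lower is False.
    return diff < lower


s = pd.Series([10.0, 10.2, 9.9, 10.1, 10.0, 4.0, 10.0, 10.3, 9.8, 10.1, 25.0, 10.0])
print(s[detect_drops(s)])
```

The drop at position 5 is flagged, while the peak at position 10 is not, because only the lower bound is checked. As with `fit_detect`, the bounds are learned from the same series that is being checked.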

ivanokeeffe commented 4 years ago

Also, does anyone know how to get the intermediate output of PersistAD?

tailaiw commented 4 years ago

Running a pipe object (adtk.pipe.Pipeline or adtk.pipe.Pipenet) with the option return_intermediate=True returns the results of all steps of the pipe instead of only the last one.

As mentioned above, like many other models in ADTK, PersistAD is internally implemented as a pipe of transformers and detectors, and the attribute pipe_ points to the internal pipe object. So the easiest way to get the intermediate results is probably to call it as follows:

from adtk.detector import PersistAD
my_model = PersistAD()  # s: a pandas Series with a DatetimeIndex
# my_model.fit_detect(s) is equivalent to my_model.pipe_.fit_detect(s, return_intermediate=False)
intermediate = my_model.pipe_.fit_detect(s, return_intermediate=True)
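To make the idea concrete without depending on ADTK internals, here is a toy pipe runner. The function `run_pipe`, the step names and both step functions are hypothetical and only mimic the behaviour described above: with `return_intermediate=True` it returns a dict holding every step's output, otherwise only the final result.

```python
from typing import Callable, Dict, List, Tuple, Union

import pandas as pd

# A step is a (name, function) pair; names and functions here are hypothetical.
Step = Tuple[str, Callable[[pd.Series], pd.Series]]


def run_pipe(
    s: pd.Series, steps: List[Step], return_intermediate: bool = False
) -> Union[pd.Series, Dict[str, pd.Series]]:
    """Run steps in order; return every step's output or only the last one."""
    outputs: Dict[str, pd.Series] = {}
    current = s
    for name, func in steps:
        current = func(current)
        outputs[name] = current
    return outputs if return_intermediate else current


def rolling_diff(x: pd.Series) -> pd.Series:
    # Value minus the median of the 3 preceding values.
    return x - x.shift(1).rolling(3).median()


def iqr_flags(d: pd.Series, c: float = 3.0) -> pd.Series:
    # True outside [Q1 - c*IQR, Q3 + c*IQR] of the differences.
    q1, q3 = d.quantile(0.25), d.quantile(0.75)
    return (d < q1 - c * (q3 - q1)) | (d > q3 + c * (q3 - q1))


s = pd.Series([10.0, 10.2, 9.9, 10.1, 10.0, 4.0, 10.0, 10.3, 9.8, 10.1])
steps: List[Step] = [("rolling_diff", rolling_diff), ("iqr_flags", iqr_flags)]

intermediate = run_pipe(s, steps, return_intermediate=True)
print(intermediate["rolling_diff"])  # inspect the transformed series
print(run_pipe(s, steps))  # only the final flags
```

Inspecting the intermediate transformed series is usually the quickest way to see why a point was or was not flagged.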
abhimanyu3-zz commented 4 years ago

@ivanokeeffe Hey! Are you also applying a seasonality check here, i.e. by editing the pipeline?

ivanokeeffe commented 4 years ago

Thanks a million for replying to my questions. This has been super helpful!

ivanokeeffe commented 4 years ago

Hey, not at the moment, but I guess that is something I could do too. Do you have any resources explaining what seasonality in time series is? Thanks!

abhimanyu3-zz commented 4 years ago

Give this a read: https://www.quora.com/How-do-you-identify-seasonality-in-a-time-series-data
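As a complement to the link, a hedged sketch of one common way to check for seasonality: compute the autocorrelation at candidate lags and pick the lag with the highest value. A clear peak at some lag (e.g. 7 for daily data with a weekly pattern) suggests a repeating cycle of that period. Once the period is known, subtracting the average value of each phase removes the seasonal pattern, so a drop detector only sees the remainder. The helper names and the synthetic weekly data are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def dominant_period(s: pd.Series, max_lag: int) -> int:
    """Return the lag in [2, max_lag] with the highest autocorrelation."""
    acf = {lag: s.autocorr(lag) for lag in range(2, max_lag + 1)}
    return max(acf, key=acf.get)


def remove_seasonality(s: pd.Series, period: int) -> pd.Series:
    """Subtract the mean value of each phase (position within the cycle)."""
    phase = np.arange(len(s)) % period
    profile = s.groupby(phase).transform("mean")
    return s - profile


weekly = np.array([5.0, 5.0, 5.0, 5.0, 5.0, 1.0, 1.0])  # weekday/weekend pattern
s = pd.Series(np.tile(weekly, 8), index=pd.date_range("2020-01-06", periods=56, freq="D"))

period = dominant_period(s, max_lag=10)
print(period)
print(remove_seasonality(s, period).abs().max())
```

On this perfectly periodic toy series the detected period is 7 and the deseasonalized series is zero everywhere; on real data the remainder would carry the noise and any genuine drops.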

@ivanokeeffe Let me know if you manage to do it.

Did you figure out the maths behind PersistAD?