datamllab / tods

TODS: An Automated Time-series Outlier Detection System
http://tods-doc.github.io
Apache License 2.0

PyOD for point-wise detection #39

Open ogreyesp opened 3 years ago

ogreyesp commented 3 years ago

Hi

I'm really interested in using TODS for detecting outliers in multivariate time series data. However, I'm missing something. According to the official TODS documentation:

"Wide-range of Algorithms, including all of the point-wise detection algorithms supported by PyOD, state-of-the-art pattern-wise (collective) detection algorithms such as DeepLog, Telemanon, and also various ensemble algorithms for performing system-wise detection."

So TODS currently uses PyOD to perform point-wise detection on time series data. However, as indicated here, PyOD doesn't handle time series data. My question is: how does TODS adapt PyOD to perform point-wise detection on time series data correctly?

Best regards

yzhao062 commented 3 years ago

Hi @ogreyesp, PyOD is designed for point-cloud data, as I mentioned in the issue you linked. However, once a time series is converted into that format, PyOD is fully applicable. TODS provides feature transformations to digest time series datasets, so that PyOD's algorithms can be used on top of them.

Hope this helps.

ogreyesp commented 3 years ago

Hi @yzhao062

Thanks for your response. So you are converting a time series into a cloud of points that are treated as statistically independent. This solution is acceptable, but I'm not sure it is 100% correct methodologically. In this case, when flagging the i-th point as an outlier, you miss the correlation between that point and its previous and next timesteps.
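The concern can be illustrated with a small synthetic example. This is a sketch in plain NumPy, not TODS or PyOD code; the series, the anomaly position, and both scoring functions (`pointwise_scores`, `neighbor_scores`) are hypothetical and chosen only for illustration. A value that is ordinary in the global distribution but out of place relative to its neighbors gets a low score when points are treated independently, and a high score once the adjacent timesteps are taken into account.

```python
import numpy as np

# Synthetic series: four periods of a sine wave with one contextual anomaly.
t = np.arange(200)
series = np.sin(2 * np.pi * t / 50)
anomaly_index = 112           # the sine is close to its peak here
series[anomaly_index] = 0.0   # a globally ordinary value in the wrong place


def pointwise_scores(x):
    """Score each value against the global distribution only (order ignored)."""
    return np.abs(x - x.mean()) / x.std()


def neighbor_scores(x):
    """Score each interior value by its gap from the mean of its two neighbors."""
    scores = np.zeros_like(x)
    scores[1:-1] = np.abs(x[1:-1] - 0.5 * (x[:-2] + x[2:]))
    return scores


global_view = pointwise_scores(series)
local_view = neighbor_scores(series)

print("pointwise score at the anomaly:", global_view[anomaly_index])
print("index with the largest neighbor-based score:", int(np.argmax(local_view)))
```

The pointwise score at the injected point is close to zero because 0.0 sits near the global mean, while the neighbor-based score peaks at exactly that index.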

lhenry15 commented 3 years ago

Hi @ogreyesp, in TODS we have various primitives that help users extract appropriate features/contexts to address their own needs. For example, as you say, if we want to model the correlation between the current point and the next timesteps, we can use subsequence segmentation to extract the contextual information for each time point and construct the point cloud. Another way is to apply algorithms that model temporal correlations within the data directly, such as autoregression.
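Both options can be sketched roughly as follows. This is plain NumPy under stated assumptions, not the TODS primitives themselves: `segment`, `knn_scores`, and `ar_residual_scores` are hypothetical helper names, the window width, `k`, and the AR order are arbitrary, and the brute-force k-nearest-neighbor score merely stands in for whichever point-cloud detector (for instance one from PyOD) would be fit on the windowed rows.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def segment(series, window):
    """Subsequence segmentation: row i holds series[i : i + window]."""
    return sliding_window_view(series, window)


def knn_scores(points, k=5):
    """Outlier score per row = Euclidean distance to its k-th nearest neighbor."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    dists.sort(axis=1)      # column 0 is each row's distance to itself
    return dists[:, k]


def ar_residual_scores(series, order=3):
    """Fit AR(order) by least squares; score = |one-step prediction error|."""
    lagged = sliding_window_view(series[:-1], order)   # row i: x[i .. i+order-1]
    targets = series[order:]                           # value following each row
    design = np.column_stack([lagged, np.ones(len(lagged))])
    coef, *_ = np.linalg.lstsq(design, targets, rcond=None)
    scores = np.zeros(len(series))                     # first `order` points unscored
    scores[order:] = np.abs(targets - design @ coef)
    return scores


# Synthetic series with one contextual anomaly at index 112 (illustrative only).
t = np.arange(200)
series = np.sin(2 * np.pi * t / 50)
series[112] = 0.0

windows = segment(series, window=5)     # the "point cloud": one row per window
win_scores = knn_scores(windows, k=5)   # windows starting at 108..112 cover index 112
ar_scores = ar_residual_scores(series, order=3)

print("highest-scoring window starts at:", int(np.argmax(win_scores)))
print("largest AR residual at index:", int(np.argmax(ar_scores)))
```

In the windowed view each row carries its own temporal context, so a detector that assumes independent points can still pick up the disruption. With the autoregressive view, the anomaly also inflates the residuals of the next few points, because it appears among their lagged inputs.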

ogreyesp commented 3 years ago

Hi @lhenry15

Thanks! That's a very interesting point. Do you have an example of using the "Subsequence Segmentation" primitive with TODS?

lhenry15 commented 3 years ago

Hi @ogreyesp ,

You might want to take a look at the benchmark branch, which has some examples of building detection pipelines with subsequence segmentation. We are still working on presenting more examples as IPython notebooks, and more will be coming later.