arundo / adtk

A Python toolkit for rule-based/unsupervised anomaly detection in time series
https://adtk.readthedocs.io
Mozilla Public License 2.0
1.06k stars, 143 forks

Usage Date Format for ADTK #122

Open senemaktas opened 3 years ago

senemaktas commented 3 years ago

Hi,

I have a multivariate time series data set that I want to use for binary classification. More than 90% of the values in my data set are 0, so I thought I could use ADTK. I am expecting output like that of PersistAD.

In the data set excerpt below, the first column is "time". When I try to import my data set with s = pd.read_csv('./data/price_long.csv', index_col="Time", parse_dates=True, squeeze=True), it raises an error (TypeError: Index of time series must be a pandas DatetimeIndex object.). I tried to convert the column to datetime, but I got this error: ValueError: time data '0' does not match format '%Y%m%d' (match).

How can I solve this problem? Is it possible to use the time column as it is? Thanks.

time  series1   series 2  series 3
0     0.708849  0.318052  159377.0  1
1     0.728374  0.305667  162063.0  0
2     0.728374  0.305667  162063.0  0
3     0.728374  0.305667  162063.0  0
4     0.728374  0.305667  162063.0  0

tailaiw commented 3 years ago

How did you convert the index? If you use something like df.index = pd.to_datetime(df.index), it should treat your integer index as nanoseconds since the epoch.
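As a minimal sketch of that conversion, the snippet below uses only pandas. The integer values 0, 1, 2 mirror the "time" column in the sample above; the variable names and the one-second step in the second variant are assumptions for illustration, not something stated in this issue.

```python
import pandas as pd

# Hypothetical integer time steps, mirroring the "time" column in the sample.
raw_index = pd.Index([0, 1, 2], name="time")

# Default behaviour: integers are read as nanoseconds since 1970-01-01.
ns_index = pd.to_datetime(raw_index)

# Assumption: if each step were really one second, the unit can be stated explicitly.
s_index = pd.to_datetime(raw_index, unit="s")

print(ns_index)
print(s_index)
```

Either result is a DatetimeIndex and can be assigned to df.index. Which unit is appropriate depends on what the integer steps actually represent.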

adtk currently supports only pandas objects with a datetime index, because some models (e.g. the seasonality ones) require it. We have already realized that this is unnecessary for many other models and is somewhat inconvenient, so we have #38 open.

senemaktas commented 3 years ago

Thank you very much for your response. I replaced s['time_motion'] = pd.to_datetime(s['time_motion'], format='%Y-%m-%d') with s.time_motion = pd.to_datetime(s.time_motion), which worked, and I got this output:

0 1970-01-01 00:00:00.000000000
1 1970-01-01 00:00:00.000000001
2 1970-01-01 00:00:00.000000002
3 1970-01-01 00:00:00.000000003
4 1970-01-01 00:00:00.000000004

But after that, when I run from adtk.data import validate_series and then s = validate_series(s), it gives the same error: TypeError: Index of time series must be a pandas DatetimeIndex object.
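One possible explanation, sketched under the assumption that s is a DataFrame whose time_motion is an ordinary column: converting the column to datetime leaves the index itself as the default integer range (the 0 to 4 on the left of the output above), while the error message asks for a DatetimeIndex. The sample values and the series1 column below are made up for illustration.

```python
import pandas as pd

# Hypothetical data: time_motion is a regular column, not the index.
s = pd.DataFrame(
    {
        "time_motion": [0, 1, 2, 3, 4],
        "series1": [0.708849, 0.728374, 0.728374, 0.728374, 0.728374],
    }
)

# Converting the column changes its dtype, but the index is untouched.
s["time_motion"] = pd.to_datetime(s["time_motion"])
index_before = type(s.index).__name__

# Moving the converted column into the index yields a DatetimeIndex.
s = s.set_index("time_motion")
index_after = type(s.index).__name__

print(index_before, "->", index_after)

# With adtk installed, validation could then be attempted:
# from adtk.data import validate_series
# s = validate_series(s)
```

If the data is read with read_csv, passing the time column as index_col and converting the index afterwards should have the same effect.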

The purpose of using this library is to give it 3 time series and make beat predictions. Would you recommend this library for that? I'm a beginner, and this is my first time working with time series. Thank you so much.
