antoinecarme / pyaf

PyAF is an Open Source Python library for Automatic Time Series Forecasting built on top of popular pydata modules.
BSD 3-Clause "New" or "Revised" License

Support Date types #4

Closed antoinecarme closed 7 years ago

antoinecarme commented 7 years ago

antoine@z600:~/dev/python/packages/pyaf$ ipython3 tests/bench/test_yahoo.py
ACQUIRED_YAHOO_LINKS 4818
YAHOO_DATA_LINK AAPL https://raw.githubusercontent.com/antoinecarme/TimeSeriesData/master/YahooFinance/nasdaq/yahoo_AAPL.csv
YAHOO_DATA_LINK GOOG https://raw.githubusercontent.com/antoinecarme/TimeSeriesData/master/YahooFinance/nasdaq/yahoo_GOOG.csv
load_yahoo_stock_prices my_test 2
BENCH_TYPE YAHOO_my_test OneDataFramePerSignal
BENCH_DATA YAHOO_my_test <pyaf.Bench.TS_datasets.cTimeSeriesDatasetSpec object at 0x7fdc9c5f7fd0>
TIME : Date N= 1246 H= 12 HEAD= ['2011-07-28T00:00:00.000000000' '2011-07-29T00:00:00.000000000' '2011-08-01T00:00:00.000000000' '2011-08-02T00:00:00.000000000' '2011-08-03T00:00:00.000000000'] TAIL= ['2016-07-05T00:00:00.000000000' '2016-07-06T00:00:00.000000000' '2016-07-07T00:00:00.000000000' '2016-07-08T00:00:00.000000000' '2016-07-11T00:00:00.000000000']
SIGNAL : GOOG N= 1246 H= 12 HEAD= [ 610.941019 603.691033 606.771021 592.40099 601.171059] TAIL= [ 694.950012 697.77002 695.359985 705.630005 715.090027]
         GOOG       Date
0  610.941019 2011-07-28
1  603.691033 2011-07-29
2  606.771021 2011-08-01
3  592.400990 2011-08-02
4  601.171059 2011-08-03

antoinecarme commented 7 years ago

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1246 entries, 0 to 1245
Data columns (total 2 columns):
GOOG    1246 non-null float64
Date    1246 non-null datetime64[ns]
dtypes: datetime64[ns](1), float64(1)
memory usage: 29.2 KB

antoinecarme commented 7 years ago

=================> Date 1246 non-null datetime64[ns]

antoinecarme commented 7 years ago

Mainly a presentation issue (datetime is OK). Setting priority to low.

antoinecarme commented 7 years ago

pandas has no date-only type internally: a pandas.Series stores dates as datetime64.
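A minimal sketch of that behaviour, using two rows copied from the GOOG sample above. The variable names are illustrative only and nothing here is PyAF code: the column stays a datetime64 type, and a date-only view has to be produced separately, either as Python `datetime.date` objects or as formatted text.

```python
import datetime

import pandas as pd

# Two rows copied from the GOOG sample shown earlier in this issue.
df = pd.DataFrame({
    "GOOG": [610.941019, 603.691033],
    "Date": ["2011-07-28", "2011-07-29"],
})

# Parsing yields a datetime64 column; pandas has no date-only dtype.
df["Date"] = pd.to_datetime(df["Date"])
print(df["Date"].dtype)

# A date-only view falls back to generic Python objects.
as_dates = df["Date"].dt.date
print(as_dates.dtype, type(as_dates.iloc[0]))

# For display purposes, formatting the column as text hides the time part.
as_text = df["Date"].dt.strftime("%Y-%m-%d")
print(as_text.tolist())
```

This is why the problem is one of presentation: the stored values are correct, and only their rendering carries the unwanted time component.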

antoinecarme commented 7 years ago

The timedelta used to compute future dates needs to be adapted so that forecast dates keep the format of the detected time resolution.

Before (timedelta = 1 days 10:54:24.864864, non-regular dates, signal only on business days):

INFO:pyaf.std:TIME_DETAIL TimeVariable='Date' TimeMin=2011-07-28T00:00:00.000000 TimeMax=2015-07-20T00:00:00.000000 TimeDelta=1 days 10:54:24.864864 Estimation = (0 , 1000) Validation = (1000 , 1251) Test = (1251 , 1258) Horizon=7

Forecasts
                               Date       Close  Close_Forecast
1251 2016-07-19 00:00:00.000000000   99.870003       99.830002
1252 2016-07-20 00:00:00.000000000   99.959999       99.870003
1253 2016-07-21 00:00:00.000000000   99.430000       99.959999
1254 2016-07-22 00:00:00.000000000   98.660004       99.430000
1255 2016-07-25 00:00:00.000000000   97.339996       98.660004
1256 2016-07-26 00:00:00.000000000   96.669998       97.339996
1257 2016-07-27 00:00:00.000000000  102.949997       96.669998
1258 2016-07-28 10:54:24.864864864         NaN      102.949997
1259 2016-07-29 21:48:49.729729728         NaN      102.949997
1260 2016-07-31 08:43:14.594594592         NaN      102.949997
1261 2016-08-01 19:37:39.459459456         NaN      102.949997
1262 2016-08-03 06:32:04.324324320         NaN      102.949997
1263 2016-08-04 17:26:29.189189184         NaN      102.949997
1264 2016-08-06 04:20:54.054054048         NaN      102.949997

After (timedelta = 1 days):

INFO:pyaf.std:TIME_DETAIL TimeVariable='Date' TimeMin=2011-07-28T00:00:00.000000 TimeMax=2015-07-20T00:00:00.000000 TimeDelta=1 days Estimation = (0 , 1000) Validation = (1000 , 1251) Test = (1251 , 1258) Horizon=7


Forecasts
            Date       Close  Close_Forecast
1251 2016-07-19   99.870003       99.830002
1252 2016-07-20   99.959999       99.870003
1253 2016-07-21   99.430000       99.959999
1254 2016-07-22   98.660004       99.430000
1255 2016-07-25   97.339996       98.660004
1256 2016-07-26   96.669998       97.339996
1257 2016-07-27  102.949997       96.669998
1258 2016-07-28         NaN      102.949997
1259 2016-07-29         NaN      102.949997
1260 2016-07-30         NaN      102.949997
1261 2016-07-31         NaN      102.949997
1262 2016-08-01         NaN      102.949997
1263 2016-08-02         NaN      102.949997
1264 2016-08-03         NaN      102.949997

This also holds when the detected resolution is in minutes.
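The before/after behaviour can be reproduced with plain pandas. The fractional delta in the "before" log is consistent with averaging the gaps between consecutive dates over the 1000-row estimation window (1453 days over 999 gaps), although the issue does not state how PyAF computes it. The sketch below assumes that averaging and then truncates the result to the detected resolution. The helper `floor_mean_delta` and the synthetic date ranges are hypothetical, not PyAF's implementation.

```python
import pandas as pd


def floor_mean_delta(times: pd.Series, resolution: str) -> pd.Timedelta:
    """Hypothetical helper: mean gap between timestamps, truncated to `resolution`."""
    return times.diff().mean().floor(resolution)


# Synthetic business-day dates: weekends are skipped, as in the stock data.
dates = pd.Series(pd.bdate_range("2011-07-28", periods=1000))

# Raw mean gap: more than one day, because each weekend counts as a 3-day gap.
mean_delta = dates.diff().mean()

# Truncated to the detected resolution (days).
day_delta = floor_mean_delta(dates, "D")

horizon = 7
last = dates.iloc[-1]

# "Before": future dates drift onto arbitrary times of day.
future_raw = [last + (k + 1) * mean_delta for k in range(horizon)]

# "After": future dates stay at midnight, so they display as plain dates.
future_day = [last + (k + 1) * day_delta for k in range(horizon)]

print(mean_delta, day_delta)
print(future_raw[0], future_day[0])

# Same idea at minute resolution: gaps of 1, 1 and 3 minutes.
minutes = pd.Series(pd.to_datetime([
    "2016-07-28 09:00", "2016-07-28 09:01",
    "2016-07-28 09:02", "2016-07-28 09:05",
]))
minute_delta = floor_mean_delta(minutes, "min")
print(minutes.diff().mean(), minute_delta)
```

With a one-day step the generated dates advance by calendar day, so they can fall on weekends, as the "After" table shows for 2016-07-30 and 2016-07-31.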