Closed antoinecarme closed 7 years ago
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1246 entries, 0 to 1245
Data columns (total 2 columns):
GOOG    1246 non-null float64
Date    1246 non-null datetime64[ns]
dtypes: datetime64[ns](1), float64(1)
memory usage: 29.2 KB
=================> Date 1246 non-null datetime64[ns]
Mainly a presentation issue (datetime is OK). Setting priority to low.
pandas has no native date-only dtype: a pandas.Series stores dates as datetime64, so a time component is always carried along and displayed.
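A minimal sketch of this behaviour, assuming a recent pandas version (the variable names are illustrative and not taken from PyAF): plain `datetime.date` values end up in a generic object column, and converting them yields a datetime64 column whose values sit at midnight.

```python
import datetime

import pandas as pd

# Plain datetime.date values are stored as generic Python objects.
as_dates = pd.Series([datetime.date(2011, 7, 28), datetime.date(2011, 7, 29)])
print(as_dates.dtype)

# Converting them yields a datetime64 column; each value becomes midnight
# of that day, so the "date" is really a full timestamp.
as_datetimes = pd.to_datetime(as_dates)
print(as_datetimes.dtype)
print(as_datetimes.iloc[0])
```

This is why the time part only matters for presentation here: the stored values are correct timestamps either way.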
The timedelta used to compute future dates needs to be adapted so that forecast dates keep the format implied by the detected time resolution.
Before (timedelta = 1 days 10:54:24.864864, non-regular dates, signal only on business days):
INFO:pyaf.std:TIME_DETAIL TimeVariable='Date' TimeMin=2011-07-28T00:00:00.000000 TimeMax=2015-07-20T00:00:00.000000 TimeDelta=1 days 10:54:24.864864 Estimation = (0 , 1000) Validation = (1000 , 1251) Test = (1251 , 1258) Horizon=7
Forecasts
Date Close Close_Forecast
1251 2016-07-19 00:00:00.000000000 99.870003 99.830002
1252 2016-07-20 00:00:00.000000000 99.959999 99.870003
1253 2016-07-21 00:00:00.000000000 99.430000 99.959999
1254 2016-07-22 00:00:00.000000000 98.660004 99.430000
1255 2016-07-25 00:00:00.000000000 97.339996 98.660004
1256 2016-07-26 00:00:00.000000000 96.669998 97.339996
1257 2016-07-27 00:00:00.000000000 102.949997 96.669998
1258 2016-07-28 10:54:24.864864864 NaN 102.949997
1259 2016-07-29 21:48:49.729729728 NaN 102.949997
1260 2016-07-31 08:43:14.594594592 NaN 102.949997
1261 2016-08-01 19:37:39.459459456 NaN 102.949997
1262 2016-08-03 06:32:04.324324320 NaN 102.949997
1263 2016-08-04 17:26:29.189189184 NaN 102.949997
1264 2016-08-06 04:20:54.054054048 NaN 102.949997
After (timedelta = 1 days):
INFO:pyaf.std:TIME_DETAIL TimeVariable='Date' TimeMin=2011-07-28T00:00:00.000000 TimeMax=2015-07-20T00:00:00.000000 TimeDelta=1 days Estimation = (0 , 1000) Validation = (1000 , 1251) Test = (1251 , 1258) Horizon=7
Forecasts
Date Close Close_Forecast
1251 2016-07-19 99.870003 99.830002
1252 2016-07-20 99.959999 99.870003
1253 2016-07-21 99.430000 99.959999
1254 2016-07-22 98.660004 99.430000
1255 2016-07-25 97.339996 98.660004
1256 2016-07-26 96.669998 97.339996
1257 2016-07-27 102.949997 96.669998
1258 2016-07-28 NaN 102.949997
1259 2016-07-29 NaN 102.949997
1260 2016-07-30 NaN 102.949997
1261 2016-07-31 NaN 102.949997
1262 2016-08-01 NaN 102.949997
1263 2016-08-02 NaN 102.949997
1264 2016-08-03 NaN 102.949997
This is valid even when the detected resolution is in minutes.
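The idea can be sketched roughly as follows. This is not PyAF's actual implementation: `detect_resolution` and `rounded_time_delta` are hypothetical helpers, and the sketch assumes the resolution is the coarsest unit (day, hour, minute, second) that represents every timestamp exactly, with the mean step rounded to a whole number of those units.

```python
import pandas as pd

# Candidate resolutions, coarsest first (assumption: nothing finer than seconds).
RESOLUTIONS = (
    pd.Timedelta(days=1),
    pd.Timedelta(hours=1),
    pd.Timedelta(minutes=1),
    pd.Timedelta(seconds=1),
)


def detect_resolution(dates: pd.Series) -> pd.Timedelta:
    """Return the coarsest unit that represents every timestamp exactly."""
    # Time elapsed since midnight for each timestamp, in seconds.
    seconds_in_day = (dates - dates.dt.normalize()).dt.total_seconds()
    for unit in RESOLUTIONS:
        if (seconds_in_day % unit.total_seconds() == 0).all():
            return unit
    return RESOLUTIONS[-1]  # fall back to seconds for sub-second data


def rounded_time_delta(dates: pd.Series) -> pd.Timedelta:
    """Round the mean step between dates to a whole number of resolution units."""
    unit = detect_resolution(dates)
    mean_delta = dates.diff().mean()  # the leading NaT is skipped by mean()
    steps = max(1, round(mean_delta / unit))  # never return a zero step
    return steps * unit


# Business-day data: the mean step is about 1.4 days, rounded to whole days.
daily = pd.Series(pd.bdate_range("2011-07-28", periods=20))
daily_delta = rounded_time_delta(daily)

# Irregular intraday data: the mean step is rounded to whole minutes.
intraday = pd.Series(pd.to_datetime(
    ["2016-07-27 09:00", "2016-07-27 09:05", "2016-07-27 09:10", "2016-07-27 09:17"]
))
minute_delta = rounded_time_delta(intraday)

print(daily_delta, minute_delta)
```

Future dates built as `last_date + k * delta` then stay aligned with the detected resolution, at the cost of possibly landing on weekends for business-day signals, as in the "After" output above.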
antoine@z600:~/dev/python/packages/pyaf$ ipython3 tests/bench/test_yahoo.py
ACQUIRED_YAHOO_LINKS 4818
YAHOO_DATA_LINK AAPL https://raw.githubusercontent.com/antoinecarme/TimeSeriesData/master/YahooFinance/nasdaq/yahoo_AAPL.csv
YAHOO_DATA_LINK GOOG https://raw.githubusercontent.com/antoinecarme/TimeSeriesData/master/YahooFinance/nasdaq/yahoo_GOOG.csv
load_yahoo_stock_prices my_test 2
BENCH_TYPE YAHOO_my_test OneDataFramePerSignal
BENCH_DATA YAHOO_my_test <pyaf.Bench.TS_datasets.cTimeSeriesDatasetSpec object at 0x7fdc9c5f7fd0>
TIME : Date N= 1246 H= 12 HEAD= ['2011-07-28T00:00:00.000000000' '2011-07-29T00:00:00.000000000' '2011-08-01T00:00:00.000000000' '2011-08-02T00:00:00.000000000' '2011-08-03T00:00:00.000000000'] TAIL= ['2016-07-05T00:00:00.000000000' '2016-07-06T00:00:00.000000000' '2016-07-07T00:00:00.000000000' '2016-07-08T00:00:00.000000000' '2016-07-11T00:00:00.000000000']
SIGNAL : GOOG N= 1246 H= 12 HEAD= [ 610.941019 603.691033 606.771021 592.40099 601.171059] TAIL= [ 694.950012 697.77002 695.359985 705.630005 715.090027]
         GOOG       Date
0  610.941019 2011-07-28
1  603.691033 2011-07-29
2  606.771021 2011-08-01
3  592.400990 2011-08-02
4  601.171059 2011-08-03