antoinecarme / pyaf

PyAF is an Open Source Python library for Automatic Time Series Forecasting built on top of popular pydata modules.
BSD 3-Clause "New" or "Revised" License

Forecast dates are shifting #86

Closed Stanpol closed 6 years ago

Stanpol commented 6 years ago

I'm using the latest version from GitHub. Unfortunately, the package has no convenient version variable that I could report here.

The following example illustrates the problem. My training dataset has a date column in which every value is the 1st day of a month:

...
112   2014-05-01
113   2014-06-01
114   2014-07-01
115   2014-08-01
116   2014-09-01
117   2014-10-01
118   2014-11-01
119   2014-12-01

When I run a forecast, the future dates do not follow the 1st-of-month pattern. Instead, each date is shifted as if every month had exactly 30 days:

120   2014-12-31
121   2015-01-30
122   2015-03-01
123   2015-03-31
124   2015-04-30
125   2015-05-30
126   2015-06-29
127   2015-07-29
128   2015-08-28
129   2015-09-27
130   2015-10-27
131   2015-11-26
132   2015-12-26
133   2016-01-25
134   2016-02-24
...

Could someone please point out where the forecast dates are calculated in the code?

antoinecarme commented 6 years ago

Hi @Stanpol ,

Thanks for this problem report. Interesting.

This is a known issue. The month periods are not regular (more like a human process ;). I will work on it.

However, I cannot be sure that the date pattern has been defined anywhere (unless an option is set). The current behavior is to keep the same average period between consecutive dates, which is the closest one can get to the beginning of the month in an automated way. YMMV, new ideas are welcome.

If you want to look at the code, see the nextDate() method in TS/Time.py.
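For illustration only, here is a minimal sketch of the "average period" behavior described above. It is not the actual nextDate() code. It uses only the tail of the training dates shown in the report, and it assumes the average gap is truncated to whole days, an assumption that happens to reproduce the reported dates.

```python
from datetime import date, timedelta

# Tail of the training dates shown in the report (1st day of each month).
train = [date(2014, m, 1) for m in range(5, 13)]

# Average gap between consecutive dates, truncated to whole days.
# The truncation is an assumption made for this sketch.
total_gap = train[-1] - train[0]
avg_step = timedelta(days=total_gap.days // (len(train) - 1))

# Extrapolate with the fixed step: the dates drift away from the 1st.
forecast = []
current = train[-1]
for _ in range(4):
    current = current + avg_step
    forecast.append(current)

print(avg_step.days)
print([d.isoformat() for d in forecast])
```

With a fixed 30-day step, the first forecast date is 2014-12-31 instead of 2015-01-01, and the drift grows with every step.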

antoinecarme commented 6 years ago

Just curious, can you please copy-paste a plot of the forecasts ?

antoinecarme commented 6 years ago

By the way, the standard Python timedelta object stops at weeks, so it has no notion of months ;)

https://docs.python.org/3.6/library/datetime.html#datetime.timedelta

class datetime.timedelta([days[, seconds[, microseconds[, milliseconds[, minutes[, hours[, weeks]]]]]]])

antoinecarme commented 6 years ago

A lot of databases also have a SQL addMonths() function... guess why?
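As a rough idea of what such a function has to do, here is a sketch using only the standard library. The add_months name and its day-clamping rule are hypothetical choices for this example; they are not taken from PyAF or from any particular database.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Hypothetical helper: shift d by whole calendar months.

    If the target month is shorter, the day is clamped to its last day.
    """
    # Count months since year 0 so that year rollover is plain arithmetic.
    month_index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(month_index, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


print(add_months(date(2014, 12, 1), 1))
print(add_months(date(2015, 1, 31), 1))
```

The point is that a month is not a fixed number of days, so the step has to be computed on the calendar fields rather than added as a duration.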

antoinecarme commented 6 years ago

numpy.timedelta64 does not handle calendar months well (too low-level).

FIX: use pandas.DateOffset, which is high-level and supports all possible periods, including business, calendar and custom periods.

see https://pandas.pydata.org/pandas-docs/stable/timeseries.html#dateoffset-objects
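A small sketch of the idea with pandas.DateOffset, starting from the last training date in the report. The variable names and the three-step horizon are arbitrary choices for this example, and this is not the code that went into PyAF.

```python
import pandas as pd

# Last training date from the report.
last_train = pd.Timestamp("2014-12-01")
horizon = 3  # arbitrary horizon for the example

# Calendar-aware month steps: each forecast date stays on the 1st.
future = [last_train + pd.DateOffset(months=k) for k in range(1, horizon + 1)]

# Same dates via a month-start frequency; the first entry is last_train itself.
future_index = pd.date_range(start=last_train, periods=horizon + 1, freq="MS")[1:]

print(future)
print(list(future_index))
```

Both variants keep the forecast dates on the 1st of each month, whatever the length of the months in between.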

antoinecarme commented 6 years ago

Example with the ozone dataset:

Before:

image

After:

image

antoinecarme commented 6 years ago

@Stanpol

Your feedback is welcome.

antoinecarme commented 6 years ago

Added release 1.0, which fixes this issue.

Closing. Please reopen if needed.