facebook / prophet

Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.
https://facebook.github.io/prophet
MIT License

How to Calculate & Plot the cross_validation Results? #483

Closed saadk408 closed 6 years ago

saadk408 commented 6 years ago

Can someone provide example code showing how to manipulate the cross_validation results to calculate the MAPE and plot it against other algorithms? Please see the attached screenshot from the Prophet paper:

[screenshot: figure from the Prophet paper plotting MAPE against forecast horizon for several forecasting methods]

bletham commented 6 years ago

Basically you just need to run cross_validation in a loop over a range of horizon values, and then compute the MAPE of the results at each horizon. Here is some code that computes MAPE vs. horizon for the example time series in the documentation:

from fbprophet import Prophet
from fbprophet.diagnostics import cross_validation
import pandas as pd
import numpy as np

df = pd.read_csv('../examples/example_wp_peyton_manning.csv')
df['y'] = np.log(df['y'])
m = Prophet()
m.fit(df)

# Compute cross-validation y and yhat for the range of horizons in the figure
df_cv = pd.DataFrame()
for h in [30, 60, 90, 120, 150, 180]:
    # One cross_validation run per horizon, with cutoffs spaced 100 days apart
    df_cv_h = cross_validation(m, horizon='{} days'.format(h), period='100 days', initial='730 days')
    df_cv_h['horizon'] = h
    df_cv = pd.concat([df_cv, df_cv_h])

# Compute absolute percent error for each prediction
df_cv['mape'] = np.abs((df_cv['y'] - df_cv['yhat']) / df_cv['y'])
# mean absolute percent error, by horizon
mape = df_cv.groupby('horizon', as_index=False).aggregate({'mape': 'mean'})

mape.head(6)
   horizon      mape
0       30  0.049973
1       60  0.055482
2       90  0.057295
3      120  0.058952
4      150  0.060597
5      180  0.063632
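To reproduce the plot in the screenshot, here is a minimal matplotlib sketch (matplotlib is an extra dependency, not used above); per-horizon MAPE series from other algorithms can be added as additional plot calls:

import matplotlib.pyplot as plt

# Plot MAPE against horizon, as in the figure from the paper
plt.plot(mape['horizon'], mape['mape'], marker='o', label='Prophet')
# To compare other algorithms, compute their per-horizon MAPE the same way
# and add more plt.plot(...) calls here before the legend.
plt.xlabel('Horizon (days)')
plt.ylabel('MAPE')
plt.legend()
plt.show()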
bletham commented 6 years ago

The v0.3 branch has a utility for generating these plots (https://github.com/facebook/prophet/blob/v0.3/notebooks/diagnostics.ipynb) and will be pushed out soon. I'm going to close this issue and leave #194 as the issue for visualizing model diagnostics.
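For reference, a sketch of what that utility looks like, assuming the v0.3+ diagnostics API (performance_metrics and plot_cross_validation_metric, per the linked notebook):

from fbprophet.diagnostics import cross_validation, performance_metrics
from fbprophet.plot import plot_cross_validation_metric

# One cross_validation run over the full horizon; performance_metrics then
# computes rolling-window MAPE (and other metrics) as a function of horizon.
df_cv = cross_validation(m, horizon='180 days', period='100 days', initial='730 days')
df_p = performance_metrics(df_cv)
fig = plot_cross_validation_metric(df_cv, metric='mape')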

pankaj-kvhld commented 6 years ago

@bletham: why does the cross_validation() function take a model that was fit on the entire time series? Doesn't this lead to information leaking from the validation set into the training set?

bletham commented 6 years ago

It takes a fitted model just because that was the cleanest way to get all of the information needed to specify the model and run the cross validation: the history, any specified custom seasonalities, extra regressors, and so on. One thing to note is that cross validation will use whatever seasonalities are used in the final model. Suppose we have a five-year history and yearly seasonality is set to 'auto'; with 5 years of history it will be turned on. Yearly seasonality will then be used for all of the cross validation folds, even segments with less than 2 years of data that would typically have yearly seasonality turned off. Basically, this makes sure the model features stay fixed throughout cross validation.
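For instance, a minimal sketch of pinning that behavior explicitly instead of relying on 'auto' (forcing yearly seasonality on here is purely illustrative):

from fbprophet import Prophet
from fbprophet.diagnostics import cross_validation

# Cross validation reuses the final model's feature settings, so pinning them
# explicitly makes every fold behave the same way by construction.
m = Prophet(yearly_seasonality=True)  # illustrative: on for every fold
m.fit(df)
df_cv = cross_validation(m, horizon='180 days', period='100 days', initial='730 days')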

This does raise the possibility of information leakage, but inside cross validation we use a newly instantiated model that just copies over the model fit settings, from here: https://github.com/facebook/prophet/blob/master/python/fbprophet/diagnostics.py#L126 . We've been careful there to make sure that nothing is leaked.
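In outline, a simplified sketch of that idea (the real code linked above also carries over custom seasonalities and extra regressors; fit_to_cutoff is a hypothetical name for illustration):

from fbprophet import Prophet

def fit_to_cutoff(m, df, cutoff):
    # Rebuild a fresh model from the fitted model's resolved settings, then
    # fit it only on rows up to the cutoff, so nothing later is ever seen.
    # Assumes df['ds'] is already a datetime column.
    m_new = Prophet(
        growth=m.growth,
        n_changepoints=m.n_changepoints,
        yearly_seasonality='yearly' in m.seasonalities,
        weekly_seasonality='weekly' in m.seasonalities,
        daily_seasonality='daily' in m.seasonalities,
        seasonality_prior_scale=m.seasonality_prior_scale,
        changepoint_prior_scale=m.changepoint_prior_scale,
    )
    return m_new.fit(df[df['ds'] <= cutoff])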

haseebmahmud commented 5 years ago

@bletham The OP asked for a plot of cross validation results (MAPE in the example he posted) comparing Prophet against other algorithms. Since v0.3 we have the ability to plot Prophet's cross validation results. I am wondering whether there is any plan to include cross validation results from other algorithms/methodologies (e.g. ARIMA & co.) in future releases.

bletham commented 5 years ago

We don't intend to add additional methods to the package. It's really an implementation of this particular model, not a platform for time series forecasting methods. (That would be valuable, but it's not on our roadmap.)
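For anyone who wants the comparison anyway, here is a sketch of a rolling-origin ARIMA baseline using statsmodels (an external library; the SARIMAX(5, 1, 0) order and the cutoff spacing are illustrative assumptions), evaluated at the same horizons as the Prophet loop above:

import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Build a daily, gap-free series from the same (log-transformed) data
series = df.assign(ds=pd.to_datetime(df['ds'])).set_index('ds')['y']
series = series.asfreq('D').interpolate()

horizons = [30, 60, 90, 120, 150, 180]
cutoffs = pd.date_range(series.index.min() + pd.Timedelta(days=730),
                        series.index.max() - pd.Timedelta(days=180), freq='100D')
rows = []
for cutoff in cutoffs:
    # Fit only on data up to the cutoff, then forecast 180 days ahead
    fit = SARIMAX(series[:cutoff], order=(5, 1, 0)).fit(disp=False)
    fc = fit.forecast(steps=180)
    for h in horizons:
        y_true = series.get(cutoff + pd.Timedelta(days=h))
        if y_true is not None and not np.isnan(y_true):
            rows.append({'horizon': h,
                         'mape': abs((y_true - fc.iloc[h - 1]) / y_true)})

arima_mape = pd.DataFrame(rows).groupby('horizon', as_index=False)['mape'].mean()
# arima_mape lines up with the Prophet `mape` frame above for side-by-side plotting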