facebook / prophet

Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.
https://facebook.github.io/prophet
MIT License

Understanding of Seasonality #1025

Closed: akbaramed closed this issue 5 years ago

akbaramed commented 5 years ago

Hello,

I'd like your feedback on the approach I've taken. The attached file has 'ds' and 'y' columns and 165 rows, covering a period of 6 months.

Questions.

  1. I assume Prophet handles missing data, i.e. some dates missing from the time series. Or should I create a continuous time series by adding the missing dates and imputing the minimum value of 'y' for the newly added rows?

  2. How do I decide which seasonality mode to use: additive or multiplicative? Does Prophet model seasonality using a Fourier series (controlled by the Fourier order)?

  3. Do I need to make 'y' stationary to use it in Prophet?
    ==> I tried looking at seasonal_decompose, but for the uploaded data I don't see much difference between the additive and multiplicative decompositions. I used the code below:

```python
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# df_fc has columns 'ds' (date) and 'y' (value)
df_fc['ds'] = pd.to_datetime(df_fc.ds)
df_fc = df_fc.set_index('ds')

# seasonal_decompose has a problem with missing dates in the time series,
# so reindex to a continuous daily range and fill the new rows with the min of y
idx = pd.date_range(df_fc.index.min(), df_fc.index.max(), freq='D')
df_fc = df_fc.reindex(idx, fill_value=df_fc.y.min())

seasonal_decompose(df_fc['y'], model='additive').plot()
seasonal_decompose(df_fc['y'], model='multiplicative').plot()
```

    ==> I also ran statsmodels.tsa.stattools.adfuller and log-transformed the values to make the series stationary.

  4. How do I decide what Fourier order to use? Is there a formula or rule of thumb, or can it be decided by looking at the initial output of the model?

  5. With the code below I get a decent R² of 0.80 and an MAE of 1.019, but I'm apprehensive about overfitting from using a very high Fourier order.

```python
# Code for Prophet
# Note: the package was renamed from fbprophet to prophet in version 1.0
import fbprophet as fb
from sklearn.metrics import r2_score, mean_absolute_error

# Earlier attempt (other values tried are noted in the comments):
# model = fb.Prophet(growth='linear', changepoint_prior_scale=1.75)  # 1.75, 0.75
# model.add_seasonality(name='weekly', period=7, fourier_order=12)  # 5; 10
# model.add_seasonality(name='monthly', period=30.5, fourier_order=25)  # 7, 8; 20

model = fb.Prophet(growth='linear', changepoint_prior_scale=0.75,
                   seasonality_mode='multiplicative',
                   weekly_seasonality=False, daily_seasonality=False,
                   yearly_seasonality=False)
model.add_seasonality(name='weekly', period=7, fourier_order=12)
model.add_seasonality(name='monthly', period=30.5, fourier_order=25)

# Prophet expects 'ds' and 'y' as columns (not 'ds' as the index)
model.fit(df_fc)
df_fc_pred = model.make_future_dataframe(periods=30, freq='D')
pred_fc = model.predict(df_fc_pred)

model.plot(pred_fc, xlabel='Date', ylabel='Sales')
model.plot_components(pred_fc)

# With one row per date, the first len(df_fc) forecast rows are the training dates
n_train = df_fc.shape[0]
print("Rsq:", r2_score(df_fc.y, pred_fc.yhat[:n_train]))
print("MAE:", mean_absolute_error(df_fc.y, pred_fc.yhat[:n_train]))
```

forecast.txt

Apologies for asking so many questions. I'm a big fan of fbprophet, and many thanks for looking into my problem.

Regards

daikonradish commented 5 years ago

Here are some thoughts from a user of this package, so your mileage may vary.

  1. Prophet is able to handle missing data. See here. Imputing a value that is definitely incorrect (like the min) is likely unhelpful and will probably distort the results.
  2. Multiplicative vs. additive depends on your domain; this blog post has a good explanation of which could be better. A higher Fourier order lets the model capture more complex seasonal patterns, but it may lead to overfitting and therefore worse predictions of future values. You can choose the order using cross validation. For a skeleton of how to do CV with sklearn, refer here.
  3. Stationarity is a property usually required by ARIMA models. Prophet does not use ARIMA, and it does not require a stationary series to make good future predictions. See here for a discussion.
  4. If you're worried about overfitting, you could use a combination of random search and cross validation to choose the model with the best predictive/informative power. See here for a discussion.
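A minimal pandas sketch of point 1, assuming daily data in a frame with 'ds' and 'y' columns: rather than imputing the minimum, either leave the missing days out, or keep their dates with 'y' left as NaN (Prophet's documentation on outliers notes that missing values in the history are acceptable). The helper `to_continuous_daily` and the toy data are hypothetical, made up for illustration.

```python
import numpy as np
import pandas as pd


def to_continuous_daily(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: reindex to every calendar day, leaving gaps as NaN.

    The result keeps 'ds' and 'y' as columns, the layout Prophet expects.
    Missing days are NOT imputed; their 'y' stays NaN.
    """
    out = df.copy()
    out["ds"] = pd.to_datetime(out["ds"])
    full_range = pd.date_range(out["ds"].min(), out["ds"].max(), freq="D")
    out = out.set_index("ds").reindex(full_range)
    out.index.name = "ds"
    return out.reset_index()


# Toy data: 10 consecutive days with positions 3 and 6 removed (hypothetical values).
dates = pd.date_range("2019-01-01", periods=10, freq="D").delete([3, 6])
observed = pd.DataFrame({"ds": dates, "y": np.arange(1.0, 9.0)})

continuous = to_continuous_daily(observed)
print(len(observed), len(continuous))     # 8 10
print(int(continuous["y"].isna().sum()))  # 2

# Imputing the minimum instead drags summary statistics toward the floor.
min_filled = continuous["y"].fillna(observed["y"].min())
print(observed["y"].mean(), min_filled.mean())  # 4.5 3.8
```

Either `observed` or `continuous` could then be passed to Prophet's `fit`. On this toy series, filling the gaps with the minimum pulls the mean from 4.5 down to 3.8, which is the kind of distortion point 1 warns about.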

bletham commented 5 years ago

I just wanted to point out a couple of things in addition to @daikonradish's comments:

2: For multiplicative seasonality, the example given in the documentation may also be helpful: https://facebook.github.io/prophet/docs/multiplicative_seasonality.html

4: You are measuring Rsq and MAE on the training data. This is highly likely to cause you to overfit. You need to use cross validation to measure MAE on a hold-out test set of future values. Prophet has a utility for doing this, which you can read about here: https://facebook.github.io/prophet/docs/diagnostics.html. I also suspect that those large Fourier orders are causing overfitting. You should find that for daily data like this, there is no benefit to a weekly seasonality with a Fourier order larger than 3. A monthly seasonality with 25 components also seems really high. The model introduces 2 * fourier_order variables to describe a seasonality, so this adds 50 variables to describe a seasonality that spans only ~30 data points; that seems very likely to overfit.
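A minimal NumPy sketch of the points above, under stated assumptions: it builds sine/cosine Fourier features of the form Prophet uses for a seasonality (one sine and one cosine per order, so 2 * fourier_order columns), fits them with plain least squares to synthetic daily data, and holds out the last 30 days. The data, the helper names, and the use of unregularized least squares are all assumptions for illustration; Prophet itself fits with priors on the seasonal coefficients, so this is not its fitting procedure.

```python
import numpy as np


def fourier_features(t: np.ndarray, period: float, order: int) -> np.ndarray:
    """Return a (len(t), 2 * order) matrix of sin/cos terms for one seasonality."""
    cols = []
    for n in range(1, order + 1):
        angle = 2.0 * np.pi * n * t / period
        cols.append(np.sin(angle))
        cols.append(np.cos(angle))
    return np.column_stack(cols)


def design(t: np.ndarray, weekly_order: int, monthly_order: int) -> np.ndarray:
    # Intercept + linear trend + weekly (period 7) + "monthly" (period 30.5) terms.
    return np.column_stack([
        np.ones_like(t),
        t,
        fourier_features(t, 7.0, weekly_order),
        fourier_features(t, 30.5, monthly_order),
    ])


def rmse(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.sqrt(np.mean((a - b) ** 2)))


# Hypothetical daily series the size of the one in this thread (165 rows):
# a mild trend, a simple weekly pattern, and Gaussian noise.
rng = np.random.default_rng(0)
t = np.arange(165, dtype=float)
y = 10 + 0.02 * t + 2 * np.sin(2 * np.pi * t / 7) + rng.normal(0, 1, t.size)

train, test = slice(0, 135), slice(135, 165)  # hold out the last 30 days

results = {}
for weekly_order, monthly_order in [(3, 5), (12, 25)]:
    X = design(t, weekly_order, monthly_order)
    # lstsq tolerates the rank deficiency that large orders introduce
    beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
    pred = X @ beta
    results[(weekly_order, monthly_order)] = (
        X.shape[1],                   # number of model columns
        rmse(y[train], pred[train]),  # in-sample error
        rmse(y[test], pred[test]),    # hold-out error
    )

for orders, (n_cols, train_err, test_err) in results.items():
    print(orders, n_cols, round(train_err, 3), round(test_err, 3))
```

Because the small model's columns are a subset of the large model's, the large model's training error can only be equal or lower, which is why in-sample Rsq/MAE cannot reveal overfitting; only the hold-out column can. The weekly claim also follows from the math: with integer day indices and period 7, harmonic 7 - n gives the same cosine and the negated sine of harmonic n, so weekly orders above 3 add no new information for daily data.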

akbaramed commented 5 years ago

Hello all, thank you very much for your comments. I will try the above suggestions.

Just one last question: I'm still finding it hard to tell when to use multiplicative or additive seasonality. Could you please take a look at the dataset attached to the original post and let me know whether it has additive or multiplicative seasonality, and how you concluded that? Is there some kind of test that makes it black and white? In my case, plotting the results doesn't show much of a difference, so I'm not able to tell.

Once again, many thanks for the valuable tips and suggestions.

bletham commented 5 years ago

If the trend doesn't change much, then additive and multiplicative seasonality are equivalent. If it isn't clear from looking at the forecast, then I think the best thing to do would be to use the cross validation tool to measure prediction error with multiplicative vs. additive seasonality and see if one is better than the other.
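A small NumPy sketch of why the two modes coincide under a flat trend, assuming the forms y = trend + s_add (additive) and y = trend * (1 + s_mult) (the multiplicative form Prophet uses). With a constant trend c, an additive effect of c * s_mult reproduces the multiplicative series exactly; once the trend grows, the multiplicative seasonal swing grows with it while the additive one stays fixed. The series and amplitudes are hypothetical.

```python
import numpy as np

t = np.arange(180, dtype=float)
s_mult = 0.1 * np.sin(2 * np.pi * t / 7)  # relative weekly effect of up to +/-10%


def additive(trend: np.ndarray, s_add: np.ndarray) -> np.ndarray:
    return trend + s_add


def multiplicative(trend: np.ndarray, s_rel: np.ndarray) -> np.ndarray:
    return trend * (1 + s_rel)


# Flat trend: the two modes describe exactly the same series.
c = 50.0
flat = np.full_like(t, c)
print(np.allclose(additive(flat, c * s_mult), multiplicative(flat, s_mult)))  # True

# Growing trend: keep the additive amplitude at its starting level.
growing = 50.0 + 0.5 * t
add_series = additive(growing, growing[0] * s_mult)
mul_series = multiplicative(growing, s_mult)
print(np.allclose(add_series, mul_series))  # False

# Peak-to-trough seasonal swing over the last week: larger for multiplicative.
add_swing = np.ptp(add_series[-7:] - growing[-7:])
mul_swing = np.ptp(mul_series[-7:] - growing[-7:])
print(round(add_swing, 2), round(mul_swing, 2))
```

This is why visual inspection often cannot separate the two when the trend is fairly flat, and why comparing hold-out error from cross validation is the more reliable check.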

akbaramed commented 5 years ago

Thank you all for your valuable suggestions. I'll keep them in mind when creating models.