facebook / prophet

Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.
https://facebook.github.io/prophet
MIT License
18.47k stars 4.53k forks

Differences between using first day of month vs. end of month #2152

Open berndheite opened 2 years ago

berndheite commented 2 years ago

I work with monthly values and have noticed that a model with the same parameters (default yearly seasonality) predicts completely different values when "ds" is set to the first day of the month rather than the last day, even though the fitted values are the same for both models. I assume this is because the underlying model is continuous-time, as mentioned in the documentation (see https://facebook.github.io/prophet/docs/non-daily_data.html#monthly-data).

However, in some cases the predictions differ far more than I had expected. I could reduce this effect by adding the yearly seasonality as regressors. Apart from that, is there an advantage or disadvantage to choosing one of the variants (beginning or end of month)? It would be great if someone could explain why the differences are so large.

Code example for reproduction:

import pandas as pd
from prophet import Prophet

# create sample data
df = pd.DataFrame(data={'ds':['2021-01-01','2021-02-01','2021-03-01','2021-04-01','2021-05-01','2021-06-01',
                   '2021-07-01','2021-08-01','2021-09-01','2021-10-01','2021-11-01','2021-12-01'],
                       'y':[1,2,3,5,6,7,2,3,4,5,7,8]})

df['ds'] = pd.to_datetime(df['ds'])

### 1. model using first day of month

# initialize model
model1 = Prophet(yearly_seasonality = True,
                weekly_seasonality=False,
                daily_seasonality=False,
                seasonality_mode='additive',
                growth='linear')

# fit model
model1.fit(df, iter=1000) # reduce computing time

# create future dataframe for prediction
future = model1.make_future_dataframe(periods=1,
                                    freq = 'MS')

# predict
df_pred = model1.predict(future)

### 2. model using last day of month
# change ds to eom
df['ds'] = df['ds'] + pd.offsets.MonthEnd(0)

# initialize model
model2 = Prophet(yearly_seasonality = True,
                weekly_seasonality=False,
                daily_seasonality=False,
                seasonality_mode='additive',
                growth='linear')

# fit model
model2.fit(df, iter=1000) # reduce computing time

# create future dataframe for prediction
future = model2.make_future_dataframe(periods=1,
                                    freq = 'M')

# predict
df_pred_eom = model2.predict(future)

### 3. compare results
display(df_pred - df_pred_eom)

priamai commented 5 months ago

Interesting, I would have thought it shouldn't make any difference, but I have never tried it. I am going to use your code example to see how big the delta is.
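
One way to explore where the delta might come from, without running Prophet, is to look at the yearly seasonality features themselves. The following is a minimal sketch, assuming Prophet's documented defaults for yearly seasonality (a Fourier series with period 365.25 days and order 10, evaluated on days since 1970-01-01). The helper names `yearly_fourier_features` and `month_end` are hypothetical and not part of Prophet. It only shows that the two variants sample the seasonal curve at different points and that there are more seasonal coefficients than monthly observations; it does not by itself prove the cause of the difference.

```python
import math
from datetime import date, timedelta

PERIOD_DAYS = 365.25   # assumed period of the yearly seasonality
FOURIER_ORDER = 10     # assumed default Fourier order for yearly seasonality
EPOCH = date(1970, 1, 1)


def yearly_fourier_features(d):
    """Return sin/cos pairs for harmonics 1..FOURIER_ORDER at date d."""
    t = (d - EPOCH).days  # time in days since the epoch
    feats = []
    for n in range(1, FOURIER_ORDER + 1):
        angle = 2.0 * math.pi * n * t / PERIOD_DAYS
        feats.extend([math.sin(angle), math.cos(angle)])
    return feats


def month_end(d):
    """Last day of the month containing d (hypothetical helper)."""
    first_of_next = (d.replace(day=28) + timedelta(days=4)).replace(day=1)
    return first_of_next - timedelta(days=1)


# Training dates from the reproduction: 12 months of 2021
first_days = [date(2021, m, 1) for m in range(1, 13)]
last_days = [month_end(d) for d in first_days]

# Forecast rows: one period past the training data
forecast_first = date(2022, 1, 1)           # freq='MS'
forecast_last = month_end(forecast_first)   # freq='M'

f_first = yearly_fourier_features(forecast_first)
f_last = yearly_fourier_features(forecast_last)

n_coef = len(f_first)
print(f"{n_coef} seasonal coefficients vs {len(first_days)} observations")

# Largest absolute difference between the two forecast rows' features
max_gap = max(abs(a - b) for a, b in zip(f_first, f_last))
print(f"largest feature difference between forecast dates: {max_gap:.3f}")

# Spacing in days between consecutive training dates for each variant
gaps_first = [(b - a).days for a, b in zip(first_days, first_days[1:])]
gaps_last = [(b - a).days for a, b in zip(last_days, last_days[1:])]
print("first-of-month gaps:", gaps_first)
print("end-of-month gaps:  ", gaps_last)
```

If these assumptions hold, the seasonal component has more free coefficients than there are data points, so the curve is only pinned down at the observed dates. Comparing the `yearly` column of `df_pred` and `df_pred_eom` from the reproduction above would show how much of the delta actually sits in the seasonal term rather than the trend.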