facebook / prophet

Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.
https://facebook.github.io/prophet
MIT License

Price elasticity #598

Closed gkourogiorgas closed 5 years ago

gkourogiorgas commented 6 years ago

I use prophet to forecast sales and I add the average daily price as a regressor. To improve the model I transformed sales and price to log. In marketing mix modelling, the solution of a regression of log(sales) on log(price) allows the coefficient of price to be interpreted as price elasticity. Is the same true with prophet? I assume that regressors can be any continuous variables, not necessarily dummies (which can be put in holidays anyway...). Thanks

bletham commented 6 years ago

Yes, the regressor can be continuous and that interpretation would be correct, although the regressor and outputs are both standardized, so you wouldn't be able to directly use the value of the fitted coefficient. You would want to get it from the forecast dataframe produced by predict (divide the column for the regressor by the regressor input to get the coefficient value).

gkourogiorgas commented 5 years ago

@bletham Thanks for your response! So, if I understand correctly, the coefficient of each regressor varies over time. So to get the coefficient for each historic and future observation I must divide the predict() output for each regressor by the input. If I have logged the input, then after the division I must exponentiate. Correct? I have logged y as well. Does this mean that the trend and seasonality are also in log space? What about holidays? Are they treated like regressors?

bletham commented 5 years ago

The coefficient does not vary over time. If the coefficient is beta and your regressor is x, then the column in the forecast dataframe will be beta * x. What I was trying to say (very unclearly) is that if you just pull the value of beta from the fitted model params (m.params), it will not be what you want, since both y and x are standardized prior to fitting. The output in the forecast dataframe is de-standardized, so you can back out the correct value of beta by taking that column and dividing by x (every entry in the resulting column should be the same value).

If you have taken a log transform, then the trend will be piecewise linear in the log space, and so after you exponentiate it would be an exponentiated piecewise linear function - I'm not sure there is much more I can say about it than that. Additive seasonality in the log space will correspond to multiplicative seasonality in the real space.

If the reason the log transform performed better was because of multiplicative seasonality (as opposed to log-normal observation noise) then you can get multiplicative seasonality without having to do a log transform as described here: https://facebook.github.io/prophet/docs/multiplicative_seasonality.html . If you set your extra regressor to be multiplicative, then in the components plot it will show up as the % and can be interpreted as the % change in y that is attributed to the extra regressor by the model. This may be the sort of thing you're looking for with price elasticity.
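A minimal sketch of that setup (the regressor name 'price' is just a placeholder, and df/future are the usual fit/predict dataframes):

from fbprophet import Prophet

# Multiplicative seasonality: seasonal effects scale with the trend
m = Prophet(seasonality_mode='multiplicative')
# A multiplicative extra regressor: its component shows up as a % change in yhat
m.add_regressor('price', mode='multiplicative')
m.fit(df)                     # df contains 'ds', 'y', and 'price'
forecast = m.predict(future)  # future must also contain 'price'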

gkourogiorgas commented 5 years ago

@bletham So in python I divided forecast['yhat'] / trainvalid['Avg RRP'] and I didn't get the same number throughout. They were close but not the same. I also ran the model with multiplicative seasonality in the real space along with multiplicative regressors (and without), but the MAPE was much bigger than the MAPE of the log-space model. So the beta of log(price) when sales are logged can be interpreted as elasticity. Does this change when I add a regressor in the real space alongside the log regressors? Or should all regressors be in the log space?

bletham commented 5 years ago

If the name of your added regressor is 'Avg RRP', then you would do forecast['Avg RRP'] / trainvalid['Avg RRP']. That should be a constant number.

This gives a reasonable explanation of why the regression coefficient is price elasticity in a log-linear model: https://support.sas.com/rnd/app/ets/examples/simpelast/index.htm Prophet isn't exactly this model, but you can imagine the a there holding all of the seasonal and other terms, which do not depend on P.

Should the regressor be logged or not? It depends on how you think y depends on the regressor. It is the difference between

log(y) = beta * log(x)

vs.

log(y) = beta * x

Which is more appropriate will depend on your data; perhaps try both and see which gives a better fit.
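If it helps, here is a rough way to compare the two specifications on a holdout set; the regressor name 'price' and the 90-day holdout are placeholders:

import numpy as np
from fbprophet import Prophet

def holdout_mape(df, log_regressor):
    d = df.copy()
    d['y'] = np.log(d['y'])                # log(y) on the left-hand side in both cases
    if log_regressor:
        d['price'] = np.log(d['price'])    # log(y) = ... + beta * log(x), else beta * x
    train, valid = d.iloc[:-90], d.iloc[-90:]
    m = Prophet()
    m.add_regressor('price')
    m.fit(train)
    fcst = m.predict(valid[['ds', 'price']])
    err = np.abs(fcst['yhat'].values - valid['y'].values)
    return 100 * np.mean(err / np.abs(valid['y'].values))

print(holdout_mape(df, log_regressor=True))   # log(y) ~ beta * log(x)
print(holdout_mape(df, log_regressor=False))  # log(y) ~ beta * x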

gkourogiorgas commented 5 years ago

@bletham I did the division you said and still didn't get a constant. Anyway, here is the code:

# -*- coding: utf-8 -*-
"""
Investigate whether prophet is appropriate for demand forecasting
"""
import pandas as pd
import numpy as np
from fbprophet import Prophet
#Get data
holidays = pd.read_csv('calendar 2015-2020.csv')
df_bkup=pd.read_csv('demand forecasting 20150101-20180626.csv')
df = pd.read_csv('demand forecasting 20150101-20180626.csv')
df.rename(columns={'Transaction_Date': 'ds', 'Volume Demand': 'y'}, inplace=True)
df['ds'] = pd.to_datetime(df['ds'], format="%d/%m/%Y")
df['Avg Discount'] = df['Avg Discount'].str.rstrip('%').astype('float') / 100.0
df['y']=np.log(df['y'])
df['Impressions']=np.log(df['Impressions'])
df['Avg RRP']=np.log(df['Avg RRP'])
#Train-Validation-Test set
train_perc=0.6
valid_perc=0.2
train=df.iloc[0:round(len(df)*train_perc),:]
valid=df.iloc[round(len(df)*train_perc):round(len(df)*(train_perc+valid_perc)),:]
test=df.iloc[round(len(df)*(train_perc+valid_perc)):(len(df)+1),:]
#Grid
import itertools
def expand_grid(data_dict):
  rows = itertools.product(*data_dict.values())
  return pd.DataFrame.from_records(rows, columns=data_dict.keys())

params=expand_grid({'changepoint_prior_scale' : [100,10,1,0.1,0.05, 0.5, 0.001],
                           'seasonality_prior_scale' : [100,10,1,0.1,0.05, 0.5, 0.001],
                           'holidays_prior_scale' : [100,10,1,0.1,0.05, 0.5, 0.001]                           
                           })
#Prophet
results=pd.DataFrame(columns=['MAPE'])
for i in range(0,(len(params))):
    m = Prophet(changepoint_prior_scale=params.loc[i,'changepoint_prior_scale'],
                holidays=holidays,
                holidays_prior_scale=params.loc[i,'holidays_prior_scale'],
                seasonality_prior_scale=params.loc[i,'seasonality_prior_scale'])
    m.add_regressor('Impressions', mode='additive')
    m.add_regressor('Avg RRP', mode='additive')
    m.add_regressor('Avg Discount', mode='additive')
    m.fit(train)
    future = m.make_future_dataframe(periods=len(valid))
    trainvalid=pd.concat([train,valid])
    future=pd.concat([future, trainvalid[['Impressions', 'Avg RRP', 'Avg Discount']]],axis=1)
    forecast = m.predict(future)
    results.loc[i,'MAPE']=100*np.mean(abs(forecast.loc[forecast['ds'].isin(valid['ds']),'yhat']-valid['y'])/valid['y'])
#Find best
params = pd.concat([params, results],axis=1)
best_params = params.loc[params['MAPE'] == min(results['MAPE']), ]
best_params  = best_params.reset_index(drop=True)

I read the article on price elasticity and I am on board. So the model in my code should be:

log(Volume Demand)(t) = trend(t) + weekly(t) + yearly(t) + b1*log(Impressions)(t) + b2*log(Avg RRP)(t) + b3*Avg Discount(t) + c1*hol1(t) + ... + cn*holn(t)

Assuming the model above is what is actually run in the code above, I can find these components in the forecast dataframe, with upper and lower values. Are the values in each column the de-standardized product of the standardized value and the respective coefficient in m.params? If so, the forecast dataframe also has other columns like additive terms, extra regressors additive, multiplicative terms and holidays, apart from the aforementioned components of the model. What are these? Thanks in advance for the help!

bletham commented 5 years ago

Ah, the reason it isn't constant is that by default non-binary regressors are standardized, so you'd need to either divide by a standardized trainvalid['Avg RRP'] (meaning subtract the mean and divide by the SD), or just turn off standardization for this extra regressor:

m.add_regressor('Avg RRP', standardize=False)

Then it will be constant.
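To make the two options concrete, a small sketch (using the column and variable names from the code above, and assuming Prophet standardizes the regressor with the training-set mean and standard deviation):

import numpy as np

# Option 1: the regressor was added with standardize=False,
# so the forecast column is beta * x and the ratio is constant.
beta = forecast['Avg RRP'].values / future['Avg RRP'].values
print(beta[0])

# Option 2: the regressor was standardized (the default),
# so divide by the standardized input instead.
mu = train['Avg RRP'].mean()
sigma = train['Avg RRP'].std()
x_std = (future['Avg RRP'].values - mu) / sigma
beta = forecast['Avg RRP'].values / x_std
print(beta[0])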

The components in the forecast dataframe need to be better described in the documentation.

- holidays: the sum of all effects from things specified as holidays.
- extra_regressors_additive / extra_regressors_multiplicative: the sum of all extra regressor effects that were specified as additive or multiplicative, respectively.
- additive_terms / multiplicative_terms: the sum of all additive or multiplicative effects. For instance, if all of the seasonalities, holidays, and regressors are set to additive, additive_terms is the total effect (holidays + seasonality + regressors). If some are set to additive and others multiplicative, then both can be non-zero.
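For an all-additive model like the one in the code above (weekly and yearly seasonality, holidays, additive regressors), the columns should add up roughly like this; a sketch, assuming those components were all fit:

import numpy as np

# additive_terms is the sum of all additive components
np.allclose(forecast['additive_terms'],
            forecast['weekly'] + forecast['yearly']
            + forecast['holidays'] + forecast['extra_regressors_additive'])

# and for a purely additive model, yhat is trend plus those effects
np.allclose(forecast['yhat'], forecast['trend'] + forecast['additive_terms'])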

gkourogiorgas commented 5 years ago

@bletham Thanks for your help so far! To make my life easier I ran the code with standardize=False. The MAPE for my test set went from 1.8% to 2.04%, which is something I can live with (BTW the predictive power of prophet is beyond this world! You have done a great job!). The beta of log(Avg RRP) is now 0.014829. This means the price elasticity is about 0.015, i.e. roughly a 0.015% change in demand for a 1% change in price, which I find very low. I noticed that the beta of the first regressor in the code (also logged) was higher, while Avg RRP is second. I swapped them and the beta of Avg RRP (now first) was about the same. Do you think this has to do with my data, or does prophet discount the regressors' effect to the benefit of trend and seasonality?

bletham commented 5 years ago

So the assumption in the log-linear demand model is that the other terms (a in that page) are independent of price. It seems that is probably not the case, which is causing your price effect to leak out into the other terms. For instance if there are seasonal changes in the price, then that would induce correlation between the seasonal components and the price component and the model could use seasonality to explain some of the price effect. Price effects could potentially also get mixed up with the trend. By default the trend is given much stronger regularization (smaller prior_scale) than the regressors so the model should prefer to use the regressor for any trend changes that it can explain, but if there isn't exactly a linear relationship between log(demand) and log(price), then I can imagine some of the price effect leaking into the trend.

You can probably get more insight into this by looking at the components plot, m.plot_components(fcst). You will see a panel in that plot for (log)Avg RRP. The y-axis values for that panel are the amount of the forecast yhat that is being explained by (log)Avg RRP. Under the log-linear demand model, log(y) should be a linear transformation of (log)Avg RRP, and you should be able to see how close that is to being correct. If they do look similar but the model just isn't using it, then maybe there is something that could be done by adjusting priors. But my guess would be that the log-linear demand model assumptions are not satisfied well enough for the price elasticity formula to work here. I'm not sure what else I can say about it than that, unfortunately :-/
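One way to do that check (a sketch with matplotlib, using the column names from the code above):

import matplotlib.pyplot as plt

# Components plot: one panel per trend / seasonality / holiday / extra regressor
fig = m.plot_components(forecast)

# Eyeball the log-linear assumption directly: both columns are already logged above
plt.figure()
plt.scatter(train['Avg RRP'], train['y'])
plt.xlabel('log(Avg RRP)')
plt.ylabel('log(Volume Demand)')
plt.show()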

gkourogiorgas commented 5 years ago

Hi @bletham ! Again, many thanks for your clarifications. There is something else I would like to ask you: how can I improve predictive accuracy? I use MAPE as my main metric because it is easier for management to understand. I ran prophet with all data in real space, log space, additive, multiplicative, and with all or just one regressor. The best MAPE I could get is 25%, with all data in the real space, an additive model and all available regressors without standardization. I compared this against CatBoost by Yandex, XGBoost and Random Forests. The trees gave me 18%, 19% and 20% respectively, using grid search to find the best parameters. However, I really like the explanatory power of prophet, which is something trees cannot provide as easily (if at all), along with all the additional information it provides. That is why I am asking for some advice on how to improve my results. Any ideas?

bletham commented 5 years ago

I think to improve the performance you'd want to see if there are features of the time series that are not being well captured by the model when you plot the forecast. There's an ongoing example of this in #616 that might be useful.
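Prophet's built-in diagnostics can also help when comparing configurations; a minimal sketch (the initial window, period and horizon are placeholders to adapt to the length of your history):

from fbprophet.diagnostics import cross_validation, performance_metrics

# Rolling-origin cross validation over the history
df_cv = cross_validation(m, initial='730 days', period='30 days', horizon='90 days')
df_p = performance_metrics(df_cv)
print(df_p[['horizon', 'mape']].head())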