facebookexperimental / Robyn

Robyn is an experimental, AI/ML-powered, open-source Marketing Mix Modeling (MMM) package from Meta Marketing Science. Our mission is to democratise modeling knowledge, inspire the industry through innovation, reduce human bias in the modeling process and build a strong open-source marketing science community.
https://facebookexperimental.github.io/Robyn/
MIT License

Decomposition vs. Prediction #85

Closed wpro-ds closed 3 years ago

wpro-ds commented 3 years ago

Hi Robyn team,

Thanks for the great package! I have been experimenting with it and seeing positive results. My question is somewhat related to #79, where you mentioned that Robyn is meant to be used as a decomposition tool rather than a prediction tool. I think it would be useful to have some predictive functionality in the model. My questions are the following:

  1. How do we ensure that the model is reliable (i.e. validate the model) and that we can trust its recommendations? In classic ML approaches, we answer this question based on prediction error on hold-out data. In the absence of this predictive functionality in Robyn, what approaches do you recommend? P.S. This is a critical issue for getting buy-in when requesting increased budgets :)

  2. In #79, you mentioned that it is controversial how best to provide the future dataframe for intercept/trend/season/other baselines. Could you shed some light on that? What are the issues?

Again, thanks for this wonderful package; I'm looking forward to future releases.

gufengzhou commented 3 years ago

Hi, thanks for trying out Robyn!

  1. For your information, we removed the time-series out-of-sample validation about a month ago. One important reason is that we want to build a new feature that lets MMM users refresh an initial model with new data, which directly conflicts with our previous OOS validation approach. As you know, Robyn uses ridge regression, an approach that prevents overfitting by design. To be precise, we do run a 100-fold lambda cross-validation for the ridge regression (see the glmnet sketch after this list). This is the major reason we're confident going without time-series OOS validation.
  2. For example, if you use Prophet for forecasting, you need to provide a future dataframe (see the Prophet sketch below). For some predictors (trend/season/weekday etc.) you can use Prophet's default predicted values, but for other predictors you have to make strong assumptions. For example, if you include competitor activity as a predictor, you first have to somehow predict the future competition itself. Weather is another example: it would need to be forecast, which is a research topic in itself. Covid is yet another, if you have it in the model; we all know it's not easy to predict Covid. That's why.
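
On point 1, here is a minimal sketch of what lambda cross-validation for ridge regression looks like with glmnet, the R package Robyn uses for its regression. This is not Robyn's internal code; `X`, `y` and all numbers are made up for illustration:

```r
# Minimal sketch of lambda cross-validation for ridge regression with glmnet.
# Not Robyn's internal code; the data and variable names are invented.
library(glmnet)

set.seed(42)
n <- 200; p <- 10
X <- matrix(rnorm(n * p), n, p)             # e.g. media + baseline predictors
y <- as.numeric(X %*% runif(p) + rnorm(n))  # e.g. weekly revenue

# alpha = 0 selects the ridge penalty; glmnet evaluates a path of
# (by default) 100 candidate lambda values and cross-validates over them.
cv_fit <- cv.glmnet(X, y, alpha = 0, nlambda = 100)

cv_fit$lambda.min               # lambda with the lowest cross-validated error
coef(cv_fit, s = "lambda.min")  # shrunken coefficients at that lambda
```

The penalty shrinks coefficients toward zero, which is what "prevents overfitting by design" refers to: the model is deliberately biased toward simpler solutions, and the strength of that bias is itself chosen by cross-validation.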
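On point 2, a sketch of the future-dataframe problem using Prophet's R API. The extra regressor `competitor_index` is hypothetical: Prophet can extend trend/season/weekday into the future on its own, but any external regressor has to be forecast by you first.

```r
# Sketch of why forecasting requires a "future dataframe" in Prophet.
# ds/y are Prophet's required column names; competitor_index is a
# hypothetical external predictor.
library(prophet)

history <- data.frame(
  ds = seq(as.Date("2020-01-01"), by = "week", length.out = 104),
  y  = rnorm(104, mean = 100)
)
history$competitor_index <- rnorm(104)  # hypothetical external predictor

m <- prophet()
m <- add_regressor(m, "competitor_index")
m <- fit.prophet(m, history)

# Prophet extends ds (and derives trend/season/weekday) automatically...
future <- make_future_dataframe(m, periods = 12, freq = "week")

# ...but the external regressor has no future values: you must supply them,
# i.e. you first need a forecast of the competitor before forecasting y.
future$competitor_index <- c(history$competitor_index, rnorm(12))  # strong assumption

forecast <- predict(m, future)
```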

To summarise, Robyn's recommendations ultimately depend on your model choice. If you ask how you can know whether you've selected the "right" decay and saturation for your media, the only way to know is experimental calibration. In the spirit of "All models are wrong, but some are useful", we believe only experiments can give you certainty. A model that is closer to the experiment is therefore "more correct" (a toy sketch of that comparison follows). Hope it makes sense.
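
Robyn supports calibrating against lift-study results; the sketch below is only a toy illustration of the comparison logic, not Robyn's calibration API, and every name and number in it is invented:

```r
# Toy illustration of "experimental calibration": compare the model's
# decomposed incremental effect for a channel against the lift measured
# in a controlled experiment over the same window. All numbers invented.

experiment_lift <- 400000  # incremental revenue measured by a lift test
model_decomp    <- 452000  # model's decomposed contribution, same channel/window

# A model whose decomposition is closer to the experiment is "more correct";
# a relative-error style distance can be used to rank candidate models.
calibration_error <- abs(model_decomp - experiment_lift) / experiment_lift
calibration_error  # 0.13, i.e. the model overstates the measured lift by 13%
```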

wpro-ds commented 3 years ago

Thanks for the responses! I appreciate it and look forward to the new features.