facebookexperimental / Robyn

Robyn is an experimental, AI/ML-powered and open sourced Marketing Mix Modeling (MMM) package from Meta Marketing Science. Our mission is to democratise modeling knowledge, inspire the industry through innovation, reduce human bias in the modeling process & build a strong open source marketing science community.
https://facebookexperimental.github.io/Robyn/
MIT License

Unable to get a decent R2 in test due to a new collection launch for the brand, which resulted in a step increase in daily sales #898

Closed CJ2407 closed 4 months ago

CJ2407 commented 5 months ago

Project Robyn

Describe issue

My brand launched a new collection in Mar 2023, which resulted in a step increase in average daily sales. I am using 972 days of data in this model (Feb 2021 - Sep 2023). To account for this change, I added a context variable called "BrandMoment", which I set to 1 from 3/4/2023 onwards. I also tried numerical values in this variable to give a gradual increase in the impact of this brand moment, as well as an interaction context variable: the product of the trend variable and the simple categorical brand moment variable. But with none of these approaches is the new trend captured in the prediction.
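
A minimal sketch (not the exact code from the attached CodeforGithub.txt) of how such a step-change dummy and trend interaction could be built before being declared in robyn_inputs() via context_vars; the data frame name `dt_input`, the column name `date`, and the launch date 2023-03-04 are assumptions for illustration:

```r
library(dplyr)

# dt_input: the daily modeling data frame with a "date" column (assumed names)
dt_input <- dt_input %>%
  mutate(
    # step dummy: 1 from the collection launch (assumed 2023-03-04) onwards, 0 before
    BrandMoment = as.integer(date >= as.Date("2023-03-04")),
    # optional interaction: lets the launch effect grow with time after launch
    BrandMoment_trend = BrandMoment * as.numeric(date - min(date))
  )

# then declare the new columns as context variables, e.g.
# robyn_inputs(dt_input = dt_input, ..., context_vars = c("BrandMoment", "BrandMoment_trend"))
```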

Could you please advise on how to proceed to get a better R2 and NRMSE on the test set? So far I have not tuned any hyperparameters for any channel; they are all set to the recommended values. The channel that was promoting this launch is also getting a coefficient of 0.

[two screenshots of the model output attached]

Attaching files to recreate this example on your side: Daily_Data_M2_v1 - github.csv, CodeforGithub.txt

CJ2407 commented 5 months ago

@laresbernardo @gufengzhou Sorry for tagging you guys specifically, but I could really use your inputs here! Thank you so much in advance :)

gufengzhou commented 5 months ago

I suggest you turn off ts_validation, because you're not really forecasting and the test stats are therefore not really necessary.

In general, it's very difficult to get a good test R2 because the step change lies completely outside of the training data. The model has no chance to "learn" the effect, so to speak.
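
For reference, a minimal sketch of disabling the train/validation/test split via the ts_validation argument of robyn_run(); the iteration and trial counts below are placeholders, keep whatever you currently use:

```r
OutputModels <- robyn_run(
  InputCollect = InputCollect,  # from robyn_inputs()
  iterations = 2000,            # placeholder, keep your current setting
  trials = 5,                   # placeholder, keep your current setting
  ts_validation = FALSE         # fit on the full series; no train/val/test split
)
```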

CJ2407 commented 5 months ago

Thanks @gufengzhou. Your input is very helpful. A few more things to clarify on the same note -

  1. Since we don't yet have forecasting capabilities within Robyn, should we always turn off ts_validation?
  2. If yes to the above, how do we know the model is not biased or overfitting?
  3. Is there an ETA on the forecasting capabilities?
  4. As you can see, holidays and promotions around that time have a huge impact on our KPI. Is there an easy way to bake that into the model? Currently I keep adding PROMO dummy flags to the model, which is quite a manual step.

gufengzhou commented 5 months ago

I'd say it's OK to disable ts_validation when you're not forecasting; MMM's main purpose is inference. Robyn already uses ridge regression to reduce overfitting, and the second objective function, decomp.rssd, also prevents the optimisation from overfitting to the training set.

For your promotions: if you spend extra during these periods, it should be reflected as media spend. You could also add them to the holiday data frame to include some of these effects in the holiday component if you don't want them flagged separately.
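
As an illustration, a rough sketch of extending the bundled holiday table with promo dates and passing it to robyn_inputs() as dt_holidays, keeping "holiday" in prophet_vars; the dates and country code below are placeholders:

```r
library(Robyn)
data("dt_prophet_holidays")  # bundled holiday table with columns ds, holiday, country, year

# placeholder promo dates; country must match prophet_country in robyn_inputs()
promo_rows <- data.frame(
  ds = as.Date(c("2023-03-04", "2023-03-05")),
  holiday = "brand_promo",
  country = "US",
  year = 2023
)

dt_custom_holidays <- rbind(dt_prophet_holidays, promo_rows)
# then: robyn_inputs(dt_holidays = dt_custom_holidays,
#                    prophet_vars = c("trend", "season", "holiday"), ...)
```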

gufengzhou commented 4 months ago

Please reopen if necessary.