facebookexperimental / Robyn

Robyn is an experimental, AI/ML-powered and open sourced Marketing Mix Modeling (MMM) package from Meta Marketing Science. Our mission is to democratise modeling knowledge, inspire the industry through innovation, reduce human bias in the modeling process & build a strong open source marketing science community.
https://facebookexperimental.github.io/Robyn/
MIT License

Why Don't One-Pager Results Show Impression Variables, and Why Are Model Results Identical for the Impression Version and the Spend Version? #466

Closed GrizzledLotus closed 1 year ago

GrizzledLotus commented 2 years ago

Project Robyn

Describe issue

I would expect that impression variables declared in paid_media_vars would be represented in the waterfall chart and the effect spend/share/ROI chart. Yet when I run the demo.R code as is, only the spend variables show up in those results. Further, when I run the demo.R script and change paid_media_vars to contain only the spend variables from paid_media_spends, 3 of the 5 model one-pagers are identical to the model one-pagers from the original demo file.

Why would this be? If the recommendation is to use impression and GRP variables, why don't they show up in the model and why are model results identical for models specified with impressions and only spend? Is this a bug or am I doing something wrong? I'm trying to persuade my company to use this and show the importance of using impression/GRP data, but I don't see a difference in the output between the two modeling approaches (impressions vs only spend).

When looking at the output in your Analyst's Guide (https://facebookexperimental.github.io/Robyn/docs/analysts-guide-to-MMM), it shows the two impression variables in the output graphs. I look forward to an explanation. Thanks.

Provide dummy data & model configuration

Original Demo File (only showing InputCollect and OutputModels)

InputCollect <- robyn_inputs(
  dt_input = dt_simulated_weekly
  ,dt_holidays = dt_prophet_holidays
  ,date_var = "DATE" # date format must be "2020-01-01"
  ,dep_var = "revenue" # there should be only one dependent variable
  ,dep_var_type = "revenue" # "revenue" (ROI) or "conversion" (CPA)
  ,prophet_vars = c("trend", "season", "holiday") # "trend","season", "weekday" & "holiday"
  ,prophet_country = "DE"# input one country. dt_prophet_holidays includes 59 countries by default
  ,context_vars = c("competitor_sales_B", "events") # e.g. competitors, discount, unemployment etc
  ,paid_media_spends = c("tv_S", "ooh_S", "print_S", "facebook_S", "search_S") # mandatory input
  ,paid_media_vars = c("tv_S", "ooh_S", "print_S", "facebook_I", "search_clicks_P") # mandatory.
  # paid_media_vars must have same order as paid_media_spends. Use media exposure metrics like
  # impressions, GRP etc. If not applicable, use spend instead.
  ,organic_vars = c("newsletter") # marketing activity without media spend
  ,factor_vars = c("events") # specify which variables in context_vars or organic_vars are factorial
  ,window_start = "2016-11-23"
  ,window_end = "2018-08-22"
  ,adstock = "geometric" # geometric, weibull_cdf or weibull_pdf.
)

OutputModels <- robyn_run(
  InputCollect = InputCollect # feed in all model specification
  #, cores = NULL # default
  #, add_penalty_factor = FALSE # Untested feature. Use with caution.
  , iterations = 2000 # recommended for the dummy dataset
  , trials = 5 # recommended for the dummy dataset
  , outputs = FALSE # outputs = FALSE disables direct model output
)

Modified Demo File with spend only variables

InputCollect_d2 <- robyn_inputs(
  dt_input = dt_simulated_weekly
  ,dt_holidays = dt_prophet_holidays
  ,date_var = "DATE" # date format must be "2020-01-01"
  ,dep_var = "revenue" # there should be only one dependent variable
  ,dep_var_type = "revenue" # "revenue" (ROI) or "conversion" (CPA)
  ,prophet_vars = c("trend", "season", "holiday") # "trend","season", "weekday" & "holiday"
  ,prophet_country = "DE"# input one country. dt_prophet_holidays includes 59 countries by default
  ,context_vars = c("competitor_sales_B", "events") # e.g. competitors, discount, unemployment etc
  ,paid_media_spends = c("tv_S", "ooh_S", "print_S", "facebook_S", "search_S") # mandatory input
  ,paid_media_vars = c("tv_S", "ooh_S", "print_S", "facebook_S", "search_S") # mandatory.
  # paid_media_vars must have same order as paid_media_spends. Use media exposure metrics like
  # impressions, GRP etc. If not applicable, use spend instead.
  ,organic_vars = c("newsletter") # marketing activity without media spend
  ,factor_vars = c("events") # specify which variables in context_vars or organic_vars are factorial
  ,window_start = "2016-11-23"
  ,window_end = "2018-08-22"
  ,adstock = "geometric" # geometric, weibull_cdf or weibull_pdf.
)

OutputModels_d2 <- robyn_run(
  InputCollect = InputCollect_d2 # feed in all model specification
  #, cores = NULL # default
  #, add_penalty_factor = FALSE # Untested feature. Use with caution.
  , iterations = 2000 # recommended for the dummy dataset
  , trials = 5 # recommended for the dummy dataset
  , outputs = FALSE # outputs = FALSE disables direct model output
)

Environment & Robyn version

> packageVersion("Robyn")
[1] ‘3.7.1’

gufengzhou commented 2 years ago

Hi Philip,

Thanks for reaching out! We're having capacity constraints during the summer breaks so please bear with us.

Regarding the impressions: yes, the latest versions fit on spend by default, while impressions are used only for indicative purposes. When the patterns of impressions and spend differ substantially, we recommend splitting the channel to obtain more variability and thus a better fit.
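To make the "pattern" comparison concrete, here is a minimal base-R sketch of one way to check whether spend and impressions for a channel move together. The data is simulated and the scenario (a CPM that doubles halfway through the year) is an assumption for illustration only; this is not how Robyn performs the check internally.

```r
# Hypothetical weekly data for one channel; values are simulated for illustration.
set.seed(1)
n <- 52
spend <- round(runif(n, 1000, 5000))

# Assumed scenario: cost per thousand impressions (CPM) doubles in the second half.
cpm <- c(rep(5, n / 2), rep(10, n / 2))
impressions <- spend / cpm * 1000

# Correlation between spend and impressions: values near 1 mean similar patterns.
pattern_cor <- cor(spend, impressions)

# Stability of the implied CPM: a large coefficient of variation suggests that
# spend and impressions follow different patterns over time.
implied_cpm <- spend / impressions * 1000
cpm_cv <- sd(implied_cpm) / mean(implied_cpm)

cat(sprintf("correlation = %.2f, CPM coefficient of variation = %.2f\n",
            pattern_cor, cpm_cv))
```

A low correlation or an unstable implied CPM would be a hint that the channel mixes different buying patterns and might be worth splitting, as suggested above.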

We used to fit on impressions when they were provided, but that behaviour is deprecated. This is the explanation from a FB post:

"The major reason lies in the saturation curve and budget allocation. When using spend, we get cost-response relationship directly via hill function, which can be directly used in budget allocator. When using imps, we need to do extra fitting between imps and spend to recover the cost-response relationship. Considering the quite large uncertainty in imps-spend translation, it could results in very unreliable budget recommendation. Even though we're aware of the advantages of using exposure metrics, at least for now it's not worth the trade-off. We do consider re-introduce the spend-exposure fitting as added information though, however not for modelling purposes."
https://m.alpha.facebook.com/groups/robynmmm/permalink/1234713643963433/?comment_id=1245438872890910&reply_comment_id=1245648736203257
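As a rough illustration of the cost-response point in the quote, the sketch below applies a generic Hill saturation curve directly to spend. The function `hill()`, the parameter values, and the coefficient `coef_channel` are hypothetical and chosen only for this example; they are not taken from Robyn's source or from any fitted model.

```r
# Generic Hill saturation curve (illustrative, not Robyn's exact implementation).
# alpha controls the shape; gamma places the inflexion point between min(x) and max(x).
hill <- function(x, alpha, gamma) {
  inflexion <- min(x) * (1 - gamma) + max(x) * gamma
  x^alpha / (x^alpha + inflexion^alpha)
}

# Hypothetical spend levels for one channel.
spend <- seq(0, 100000, by = 10000)

# Saturated share of the maximum response at each spend level.
response_share <- hill(spend, alpha = 2, gamma = 0.5)

# Scale by a hypothetical channel coefficient to express response in revenue units.
coef_channel <- 50000
response <- coef_channel * response_share

print(data.frame(spend = spend, response = round(response)))
```

Because the curve's input is already spend, each row maps a cost directly to a response, which is the form a budget allocator needs. If the input were impressions, an additional impressions-to-spend fit would be required before any budget could be read off the curve, and that extra step is where the uncertainty mentioned in the quote comes in.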

I hope this makes sense to you.

gufengzhou commented 1 year ago

Please reopen if this issue reoccurs.