facebookexperimental / Robyn

Robyn is an experimental, AI/ML-powered and open sourced Marketing Mix Modeling (MMM) package from Meta Marketing Science. Our mission is to democratise modeling knowledge, inspire the industry through innovation, reduce human bias in the modeling process & build a strong open source marketing science community.
https://facebookexperimental.github.io/Robyn/
MIT License

0% effect from organic variables #773

Closed: LMiddles closed this issue 4 months ago

LMiddles commented 1 year ago

Project Robyn

Describe issue

I have run the Robyn code for two clients. The first returned a small but positive decomp % for organic; the second returned 0%, which makes no sense for that client.

For client 2 we added an extra organic variable (emails, where previously we only had organic traffic to the site), and both the Email and Organic variables returned a 0% effect. The immediate vs. carryover response percentage was also blank for organic and email.

Provide reproducible example

InputCollect <- robyn_inputs(
  dt_input = dt_simulated_weekly,
  dt_holidays = dt_prophet_holidays,
  date_var = "Date", # date format must be "2020-01-01"
  dep_var = "revenue", # there should be only one dependent variable
  dep_var_type = "revenue", # "revenue" (ROI) or "conversion" (CPA)
  prophet_vars = c("trend", "season", "weekday", "holiday"), # "trend","season", "weekday" & "holiday"
  prophet_signs = c("default", "positive", "default", "default"),
  prophet_country = "UK", # input one country. dt_prophet_holidays includes 59 countries by default
  context_vars = c("Promo_Start", "Inflation", "Covid_R_level", "Average_temp", "rain"), # e.g. competitors, discount, unemployment etc
  paid_media_spends = c("Search_b_S", "Search_sh_S", "Search_g_S", "Social_p_S", "Social_r_S"), # mandatory input
  paid_media_vars = c("Search_b_C", "Search_sh_C", "Search_g_C", "Social_p_I", "Social_r_I"), # mandatory.
  # paid_media_vars must have same order as paid_media_spends. Use media exposure metrics like
  # impressions, GRP etc. If not applicable, use spend instead.
  organic_vars = c("Organic_I", "Email"), # marketing activity without media spend
  factor_vars = c("events"), # force variables in context_vars or organic_vars to be categorical
  window_start = "2020-10-28",
  window_end = "2023-03-31",
  adstock = "weibull_pdf" # geometric, weibull_cdf or weibull_pdf.
)

# weibull pdf
set_hyperBoundLocal <- list(
  Search_b_S_alphas = c(0.5, 3), Search_b_S_gammas = c(0.1, 1), Search_b_S_shapes = c(0.001, 10), Search_b_S_scales = c(0, 0.1),
  Search_g_S_alphas = c(0.5, 3), Search_g_S_gammas = c(0.1, 1), Search_g_S_shapes = c(0.001, 10), Search_g_S_scales = c(0, 0.1),
  Search_sh_S_alphas = c(0.5, 3), Search_sh_S_gammas = c(0.1, 1), Search_sh_S_shapes = c(0.001, 10), Search_sh_S_scales = c(0, 0.1),
  Social_p_S_alphas = c(0.5, 3), Social_p_S_gammas = c(0.1, 1), Social_p_S_shapes = c(0.001, 10), Social_p_S_scales = c(0, 0.1),
  Social_r_S_alphas = c(0.5, 3), Social_r_S_gammas = c(0.1, 1), Social_r_S_shapes = c(0.001, 10), Social_r_S_scales = c(0, 0.1),
  Organic_I_alphas = c(0.5, 3), Organic_I_gammas = c(0.1, 1), Organic_I_shapes = c(0.001, 10), Organic_I_scales = c(0, 0.1),
  Email_alphas = c(0.5, 3), Email_gammas = c(0.1, 1), Email_shapes = c(0.001, 10), Email_scales = c(0, 0.1),
  train_size = c(0.5, 0.8)
)

Environment & Robyn version

Make sure you're using the latest Robyn version before you post an issue.

AdimDrewnik commented 1 year ago

Robyn results seem to be very sensitive. Top models are often wildly different from each other. For example, try removing competitor spend from the demo data set: suddenly trend, which is almost nonexistent, gets the majority of the attribution. Try removing trend, and the intercept gets almost all of the attribution, which makes no sense because the resulting models have very high R2. I don't understand how a model can have an R2 above 90% while the intercept gets more than 80% of the attribution. What baffles me is that the demo data set does not even converge on the default settings, and even significantly increasing trials and iterations does not lead to convergence.

gufengzhou commented 1 year ago

Hi @AdimDrewnik, the % from R2 and the % from decomposition are two very different things. The first is variance explained; the latter is just share of total effect. A gap between the two is especially likely for variables like season that are centered around 0: season can explain a lot of variance if the dependent variable is very seasonal, yet its decomp (or attribution, in your words) is close to 0. This topic is called the "reference point" problem for these types of variables, and for categorical variables too. So yes, it's possible to have 90% Rsq but 80% intercept. I'm not saying it's ideal, though.
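To make the distinction concrete, here is a minimal base R sketch on simulated data. It is not Robyn's own decomposition code; the variable names and the simulated coefficients are assumptions chosen for illustration. A zero-centered seasonal term drives most of the variance, yet its summed contribution over the window is essentially zero, so the intercept ends up with most of the decomposition share.

```r
set.seed(42)
n <- 156                              # three full years of weekly data
week <- seq_len(n)
season <- sin(2 * pi * week / 52)     # zero-centered: sums to ~0 over full years
media <- runif(n, 0, 10)              # hypothetical media variable
y <- 1000 + 200 * season + 5 * media + rnorm(n, sd = 20)

fit <- lm(y ~ season + media)
r2 <- summary(fit)$r.squared          # variance explained

# Decomposition: coefficient times variable, summed over the modeling window
coefs <- coef(fit)
contrib <- c(
  intercept = unname(coefs["(Intercept)"]) * n,
  season    = unname(coefs["season"]) * sum(season),
  media     = unname(coefs["media"]) * sum(media)
)
share <- contrib / sum(contrib)       # share of total effect

print(round(r2, 3))
print(round(share, 3))
```

In this setup the seasonal term is what makes R2 high, while the share vector is dominated by the intercept, which is the pattern described above.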

Regarding the phenomenon you mentioned, that removing variables changes the decomposition: this is not unique to Robyn. With any regression model, when you remove or add a "significant" variable, the result will change, especially when multicollinearity is present, which is almost always the case for MMM.
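The same effect can be reproduced with plain `lm()` outside of Robyn. The sketch below uses simulated data with two correlated predictors (the names `x1` and `x2` and all coefficients are illustrative assumptions): dropping `x2` pushes its effect onto `x1`.

```r
set.seed(1)
n <- 200
x1 <- rnorm(n)
x2 <- 0.9 * x1 + sqrt(1 - 0.9^2) * rnorm(n)  # strongly correlated with x1
y <- 10 + 2 * x1 + 3 * x2 + rnorm(n)         # simulated true effects: 2 and 3

full <- lm(y ~ x1 + x2)
reduced <- lm(y ~ x1)                        # x2 removed from the model

b_full <- unname(coef(full)["x1"])
b_reduced <- unname(coef(reduced)["x1"])

# With x2 removed, x1 absorbs much of x2's effect through their correlation
print(c(full = b_full, reduced = b_reduced))
```

Neither fit is "wrong" as a regression; the reduced model simply attributes to `x1` whatever it shares with the omitted variable.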

Regarding convergence of the two objective functions: we usually recommend that at least the prediction error (NRMSE) converges, which is the case in the demo data. For the business error (DECOMP.RSSD), convergence is not necessary. You can and should still explore the result.

Lastly, @LMiddles, I suggest you consider removing "organic traffic", because we all know how messy "organic traffic", or the "direct/none" bucket in allocation, is, especially if you suspect that it cannibalises paid channels. Otherwise, I recommend setting add_penalty_factor = TRUE in robyn_run and rerunning, as well as exploring more results beyond those PNGs. You get all Pareto results in pareto_aggregated.csv.

AdimDrewnik commented 1 year ago

What does add_penalty_factor = TRUE do? I was unable to find any explanation on the Internet or in the documentation. Can you point me to some materials describing how the "share of total effect" in the waterfall graph is calculated?

gufengzhou commented 1 year ago

You can find brief documentation by running ?robyn_run. It uses the penalty.factor parameter from glmnet. Google that and you'll find more details.
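For readers unfamiliar with the idea: in glmnet, penalty.factor is a per-coefficient multiplier on lambda, and a factor of 0 means that coefficient is not shrunk at all. The base R sketch below illustrates the concept only. It uses a closed-form ridge solution rather than glmnet's coordinate descent, it skips the internal rescaling of the factors that glmnet performs, and the function `ridge_pf` and the data are hypothetical, not part of Robyn or glmnet.

```r
# Ridge regression with a separate penalty multiplier per coefficient.
# Hypothetical helper for illustration; not glmnet's or Robyn's implementation.
ridge_pf <- function(X, y, lambda, penalty_factor) {
  Xc <- scale(X, center = TRUE, scale = FALSE)   # center so no intercept is needed
  yc <- y - mean(y)
  P <- diag(lambda * penalty_factor, nrow = ncol(X))
  beta <- solve(crossprod(Xc) + P, crossprod(Xc, yc))
  setNames(drop(beta), colnames(X))
}

set.seed(7)
n <- 100
X <- cbind(paid = rnorm(n), organic = rnorm(n))  # simulated predictors
y <- 3 * X[, "paid"] + 1 * X[, "organic"] + rnorm(n)

b_none  <- ridge_pf(X, y, lambda = 50, penalty_factor = c(0, 0))   # no shrinkage
b_equal <- ridge_pf(X, y, lambda = 50, penalty_factor = c(1, 1))   # same penalty
b_heavy <- ridge_pf(X, y, lambda = 50, penalty_factor = c(1, 20))  # organic penalised more

print(round(rbind(b_none, b_equal, b_heavy), 3))
```

A larger factor shrinks that variable's coefficient harder relative to the others, so letting the factors vary per variable changes which variables the regularisation pushes toward zero.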