LMiddles commented 1 year ago

Project Robyn

Describe issue

I have run Robyn for two clients. The first returned a small but positive decomp % for organic; the second returned 0%, which makes no sense for that client.

For client 2 we added a second organic variable (emails, where previously we had only organic traffic to the site), and both the Email and Organic variables returned a 0% effect. The immediate vs. carryover response percentage was also blank for both organic and email.

Provide reproducible example

InputCollect <- robyn_inputs(
  dt_input = dt_simulated_weekly,
  dt_holidays = dt_prophet_holidays,
  date_var = "Date", # date format must be "2020-01-01"
  dep_var = "revenue", # there should be only one dependent variable
  dep_var_type = "revenue", # "revenue" (ROI) or "conversion" (CPA)
  prophet_vars = c("trend", "season", "weekday", "holiday"), # "trend","season", "weekday" & "holiday"
  prophet_signs = c("default", "positive", "default", "default"),
  prophet_country = "UK", # input one country. dt_prophet_holidays includes 59 countries by default
  context_vars = c("Promo_Start", "Inflation", "Covid_R_level", "Average_temp", "rain"), # e.g. competitors, discount, unemployment etc
  paid_media_spends = c("Search_b_S", "Search_sh_S", "Search_g_S", "Social_p_S", "Social_r_S"), # mandatory input
  paid_media_vars = c("Search_b_C", "Search_sh_C", "Search_g_C", "Social_p_I", "Social_r_I"), # mandatory.
  # paid_media_vars must have same order as paid_media_spends. Use media exposure metrics like
  # impressions, GRP etc. If not applicable, use spend instead.
  organic_vars = c("Organic_I", "Email"), # marketing activity without media spend
  factor_vars = c("events"), # force variables in context_vars or organic_vars to be categorical
  window_start = "2020-10-28",
  window_end = "2023-03-31",
  adstock = "weibull_pdf" # geometric, weibull_cdf or weibull_pdf.
)

# weibull pdf
set_hyperBoundLocal <- list(
  Search_b_S_alphas = c(0.5, 3),
  Search_b_S_gammas = c(0.1, 1),
  Search_b_S_shapes = c(0.001, 10),
  Search_b_S_scales = c(0, 0.1),
  Search_g_S_alphas = c(0.5, 3),
  Search_g_S_gammas = c(0.1, 1),
  Search_g_S_shapes = c(0.001, 10),
  Search_g_S_scales = c(0, 0.1),
  Search_sh_S_alphas = c(0.5, 3),
  Search_sh_S_gammas = c(0.1, 1),
  Search_sh_S_shapes = c(0.001, 10),
  Search_sh_S_scales = c(0, 0.1),
  Social_p_S_alphas = c(0.5, 3),
  Social_p_S_gammas = c(0.1, 1),
  Social_p_S_shapes = c(0.001, 10),
  Social_p_S_scales = c(0, 0.1),
  Social_r_S_alphas = c(0.5, 3),
  Social_r_S_gammas = c(0.1, 1),
  Social_r_S_shapes = c(0.001, 10),
  Social_r_S_scales = c(0, 0.1),
  Organic_I_alphas = c(0.5, 3),
  Organic_I_gammas = c(0.1, 1),
  Organic_I_shapes = c(0.001, 10),
  Organic_I_scales = c(0, 0.1),
  Email_alphas = c(0.5, 3),
  Email_gammas = c(0.1, 1),
  Email_shapes = c(0.001, 10),
  Email_scales = c(0, 0.1),
  train_size = c(0.5, 0.8)
)

Environment & Robyn version

Make sure you're using the latest Robyn version before you post an issue.

Robyn version ‘3.10.3’
R version 4.2.1 (2022-06-23 ucrt) "Funny-Looking Kid"

AdimDrewnik commented 1 year ago

Robyn results seem to be very sensitive. Top models are often wildly different from each other. For example, try removing competitor spend from the demo data set: suddenly trend, which is almost nonexistent, gets the majority of attribution. Try removing trend, and the intercept gets almost all attribution, which makes no sense because the resulting models have very high R2. I don't understand how a model can have above 90% R2 while the intercept gets more than 80% attribution. What baffles me is that the demo data set does not even converge on the default settings; even increasing trials and iterations significantly doesn't lead to convergence.

gufengzhou commented 1 year ago

Hi @AdimDrewnik, well, the % from R2 and the % from decomposition are two very different things. The first is variance explained; the latter is just share of total effect. The gap is especially likely for variables like season that are centered around 0: such a variable can explain a lot if the dependent variable is very seasonal, yet its decomp (or attribution, in your words) is close to 0. This topic is called "reference point"; it applies to these types of variables and to categorical variables, too. So yes, it's possible to have 90% Rsq but 80% intercept. I'm not saying it's ideal though.
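The distinction between variance explained and share of total effect can be reproduced with a minimal base-R sketch. This is not Robyn's internal code: the data are simulated, the coefficients (a baseline of 1000 and a seasonal amplitude of 200) are made up for illustration, and the decomposition is assumed to be the plain sum of coefficient times predictor over the modeling window.

```r
set.seed(42)

# Two years of weekly data with a seasonal predictor centered around 0
n      <- 104
week   <- seq_len(n)
season <- sin(2 * pi * week / 52)

# Simulated dependent variable: large baseline plus a strong seasonal swing
y <- 1000 + 200 * season + rnorm(n, sd = 20)

fit <- lm(y ~ season)

# Variance explained
r2 <- summary(fit)$r.squared

# Share of total effect: coefficient * predictor, summed over the window
contrib <- c(
  intercept = coef(fit)[["(Intercept)"]] * n,
  season    = coef(fit)[["season"]] * sum(season)
)
share <- contrib / sum(contrib)

print(round(r2, 3))
print(round(share, 3))
```

In this toy setup season drives almost all of the variance in `y`, so R2 is high, but its contributions cancel over full cycles, so nearly the whole decomposition lands on the intercept.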

Regarding the phenomenon you mentioned, that removing variables changes the decomposition: this is not unique to Robyn. With any regression model, when you remove or add another "significant" variable, the result will change, especially when multicollinearity is present, which is almost always the case for MMM.
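The effect of dropping a correlated variable can be seen in a small ordinary least squares sketch. The data are simulated and the variable names `x1` and `x2` are hypothetical stand-ins for two correlated drivers (for example, two media channels that move together); nothing here is specific to Robyn.

```r
set.seed(1)

n  <- 200
x1 <- rnorm(n)
# x2 is built to correlate with x1 at roughly 0.9
x2 <- 0.9 * x1 + sqrt(1 - 0.9^2) * rnorm(n)

# True model: both drivers have the same unit effect
y <- 1 * x1 + 1 * x2 + rnorm(n, sd = 0.5)

full    <- coef(lm(y ~ x1 + x2))  # both drivers in the model
reduced <- coef(lm(y ~ x1))       # x2 removed

print(round(full, 2))
print(round(reduced, 2))
```

With `x2` removed, `x1` absorbs most of the effect that belonged to `x2`, so its coefficient (and hence its attributed share) rises sharply even though the data did not change.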

Regarding convergence of the two objective functions: we usually recommend that at least the prediction error (NRMSE) has converged, which is the case for the demo data. For the business error (DECOMP.RSSD), convergence is not necessary. You can and should still explore the results.
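For readers unfamiliar with the two objectives, here is a rough sketch of how they can be computed. It assumes NRMSE is the root mean squared error normalised by the range of the dependent variable, and DECOMP.RSSD is the root sum of squared distances between each paid channel's effect share and spend share. The helper names `nrmse` and `decomp_rssd` and the input numbers are hypothetical, and Robyn's own implementation may differ in detail.

```r
# Prediction error: RMSE divided by the range of the observed values
nrmse <- function(y, y_hat) {
  sqrt(mean((y - y_hat)^2)) / (max(y) - min(y))
}

# Business error: distance between effect share and spend share
decomp_rssd <- function(effect_share, spend_share) {
  sqrt(sum((effect_share - spend_share)^2))
}

# Toy inputs, for illustration only
y     <- c(10, 20, 30)
y_hat <- c(12, 18, 30)

spend_share  <- c(0.5, 0.3, 0.2)
effect_share <- c(0.4, 0.4, 0.2)

print(nrmse(y, y_hat))
print(decomp_rssd(effect_share, spend_share))
```

Under these assumptions, a model can fit the data well (low NRMSE) while still attributing effect very differently from how budget was spent (high DECOMP.RSSD), which is why the two objectives need not converge together.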

Lastly, @LMiddles, I suggest you consider removing "organic traffic", because we all know how messy "organic traffic", or the "direct/none" bucket in allocation, is, especially if you suspect that it cannibalises paid channels. Otherwise, I recommend setting add_penalty_factor = TRUE in robyn_run and rerunning, as well as exploring more results beyond those PNGs. You get all Pareto results in pareto_aggregated.csv.
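One way to explore the Pareto results beyond the PNGs is to check how many candidate models give the organic variables a non-zero share. The sketch below is an assumption-laden illustration: the helper `summarise_decomp` is hypothetical, the column names `solID`, `rn` and `xDecompPerc` are assumed to match what `pareto_aggregated.csv` contains and should be verified against your own export, and the rows written to the temporary file are toy values, not real Robyn output.

```r
# Rerun with the penalty factor first (not executed here; other arguments as in your setup):
# OutputModels <- robyn_run(InputCollect = InputCollect, add_penalty_factor = TRUE)

# Hypothetical helper: per-variable summary of decomposition shares across models.
# Assumed columns: solID (model id), rn (variable name), xDecompPerc (effect share).
summarise_decomp <- function(path, vars) {
  agg    <- read.csv(path, stringsAsFactors = FALSE)
  sub    <- agg[agg$rn %in% vars, c("solID", "rn", "xDecompPerc")]
  by_var <- split(sub$xDecompPerc, sub$rn)
  data.frame(
    variable  = names(by_var),
    n_models  = vapply(by_var, length, integer(1)),
    n_nonzero = vapply(by_var, function(p) sum(p > 0), integer(1)),
    max_share = vapply(by_var, max, numeric(1)),
    row.names = NULL
  )
}

# Toy file standing in for pareto_aggregated.csv (made-up values)
toy <- data.frame(
  solID       = c("1_1_1", "1_1_1", "1_2_3", "1_2_3"),
  rn          = c("Organic_I", "Email", "Organic_I", "Email"),
  xDecompPerc = c(0, 0, 0.02, 0)
)
toy_path <- tempfile(fileext = ".csv")
write.csv(toy, toy_path, row.names = FALSE)

res <- summarise_decomp(toy_path, vars = c("Organic_I", "Email"))
print(res)
```

If a variable is zero in every Pareto model, that points to the variable itself (or its collinearity with other inputs) rather than to the choice of a single model.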