facebookexperimental / Robyn

Robyn is an experimental, AI/ML-powered and open sourced Marketing Mix Modeling (MMM) package from Meta Marketing Science. Our mission is to democratise modeling knowledge, inspire the industry through innovation, reduce human bias in the modeling process & build a strong open source marketing science community.
https://facebookexperimental.github.io/Robyn/
MIT License

Robyn error while running OutputModels <- code - Error in { : task 1 failed - "arguments imply differing number of rows: 19, 20" #619

Closed richa-makhija closed 1 year ago

richa-makhija commented 1 year ago

Project Robyn

Describe issue

My code takes in all the inputs correctly, but when I run the OutputModels <- robyn_run(...) step I get the error shown in the title.

InputCollect_us <- robyn_inputs(
  dt_input = data_orig_us,
  dt_holidays = dt_prophet_holidays,
  date_var = c("date"),
  dep_var = "cust_acquired",
  dep_var_type = "conversion",
  prophet_vars = c("trend", "season", "holiday"),
  prophet_country ="US",
  context_vars = c("if_payday","if_covid","if_holiday","fx","fee_avg","fee_promo_spend", "fx_promo_spend", "us_unemployment_rate", "us_inflation"), 
  paid_media_spends = c("paid_social_s", "paid_search_s", "brand_s", "uac_s", "upper_funnel_s", "houston_s", "all_other_s"),
  paid_media_vars = c("paid_social_i", "paid_search_i", "brand_i", "uac_i", "upper_funnel_s", "houston_s", "all_other_s"),
  ## Follow the same sequence in spends and vars. If using impressions, don't add spends again to vars; if there are no impressions, repeat the spend variable.
  organic_vars = c(),
  ## can be kept empty
  factor_vars = c("if_payday","if_covid","if_holiday"), 
  ## category or binary vars
  window_start = "2022-08-15",
  window_end = "2022-12-31",
  adstock = "weibull_cdf" 
  ##geometric is simpler
)
print(InputCollect_us)

################ Set Hyper Parameters ####################

plot_adstock(plot = FALSE)
plot_saturation(plot = FALSE)

##use same ranges
hyperparameters_us <- list(
  paid_social_s_alphas = c(0.5,3),
  paid_social_s_gammas = c(0.3, 1),
  paid_social_s_shapes = c(0.0001, 2),
  paid_social_s_scales = c(0, 0.1),

  paid_search_s_alphas = c(0.5, 3),
  paid_search_s_gammas = c(0.3, 1),
  paid_search_s_shapes = c(0.0001, 2),
  paid_search_s_scales = c(0, 0.1),

  brand_s_alphas = c(0.5, 3),
  brand_s_gammas = c(0.3, 1),
  brand_s_shapes = c(0.0001, 2),
  brand_s_scales = c(0, 0.1),

  uac_s_alphas = c(0.5, 3),
  uac_s_gammas = c(0.3, 1),
  uac_s_shapes = c(0.0001, 2),
  uac_s_scales = c(0, 0.1),

  upper_funnel_s_alphas = c(0.5, 3),
  upper_funnel_s_gammas = c(0.3, 1),
  upper_funnel_s_shapes = c(0.0001, 2),
  upper_funnel_s_scales = c(0, 0.1),

  houston_s_alphas = c(0.5, 3),
  houston_s_gammas = c(0.3, 1),
  houston_s_shapes = c(0.0001, 2),
  houston_s_scales = c(0, 0.1),

  all_other_s_alphas = c(0.5, 3),
  all_other_s_gammas = c(0.3, 1),
  all_other_s_shapes = c(0.0001, 2),
  all_other_s_scales = c(0, 0.1)
)
InputCollect_us <- robyn_inputs(InputCollect = InputCollect_us, hyperparameters = hyperparameters_us)
print(InputCollect_us)

OutputModels_us <- robyn_run(
  InputCollect = InputCollect_us, # feed in all model specification
  # cores = NULL, # default to max available
  # add_penalty_factor = FALSE, # Untested feature. Use with caution.
  lambda_control = lambda_min,
  iterations = 2000, # 2000 recommended for the dummy dataset with no calibration
  trials = 5, # 5 recommended for the dummy dataset
  outputs = FALSE # outputs = FALSE disables direct model output - robyn_outputs()
)

Error:

Input data has 365 days in total: 2022-01-01 to 2022-12-31
Initial model is built on rolling window of 139 day: 2022-08-15 to 2022-12-31
Fitting time series with all available data...
Using weibull_cdf adstocking with 30 hyperparameters (29 to iterate + 1 fixed) on 11 cores
>>> Starting 5 trials with 2000 iterations each using TwoPointsDE nevergrad algorithm...
  Running trial 1 of 5
  |                                                                                                                                                                                                  |   0%
Timing stopped at: 2.284 1.347 1.129
Error in { : 
  task 1 failed - "arguments imply differing number of rows: 19, 20"
In addition: Warning message:
In hyper_collector(InputCollect, hyper_in = InputCollect$hyperparameters,  :
  Provided train_size but ts_validation = FALSE. Time series validation inactive.

Provide reproducible example

Issues are often related to custom input data that is difficult to debug without. If necessary, please modify your data to mask real values and share a dataset that is able to reproduce the issue. Please also share your model configuration and exported JSON files if available.

Environment & Robyn version

R.version$version.string
[1] "R version 4.2.2 (2022-10-31)"
packageVersion("Robyn")
[1] ‘3.9.0’

richa-makhija commented 1 year ago

Some additional context: my data ranges from 1/1/22 to 12/31/22, but I want to run this model on 8/15 to 12/31 only. If I set window_start and window_end to 1/1 and 12/31, the model works.
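For context, the text inside the quotes ("arguments imply differing number of rows: 19, 20") is the message base R's data.frame() raises when it is asked to combine columns whose lengths cannot be recycled to a common number of rows. The sketch below only reproduces that message in isolation; the column names `a` and `b` are made up for illustration, and it makes no claim about which step inside robyn_run() triggers it.

```r
# Minimal, self-contained reproduction of the base R error message.
# `a` and `b` are hypothetical columns; they are not Robyn objects.
err_msg <- tryCatch(
  data.frame(a = 1:19, b = 1:20), # 19 vs 20 values cannot be recycled
  error = function(e) conditionMessage(e)
)
print(err_msg)
```

Seeing this message from robyn_run() therefore suggests that two intermediate objects ended up with lengths that differ by one, not that the input data frame itself is malformed.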

dmacoritto commented 1 year ago

@Amyhaoming, any updates on this issue? I am seeing the same error when I load a model built under version 3.7.0 and try to reproduce it in version 3.9.0.

The error message is the following:

Input data has 1369 days in total: 2019-01-01 to 2022-09-30
Initial model is built on rolling window of 290 day: 2021-12-15 to 2022-09-30
Warning: longer argument not a multiple of length of shorter
Attention for loop 1: immediate & carryover decomp don't sum up to total
Timing stopped at: 0.02 0 0.19
Error in { : 
  task 1 failed - "arguments imply differing number of rows: 25, 26"

And this is the code:

model_chfr_trans_saved <- Robyn::robyn_read(json_file = "MMM/Model/Robyn_202301091301_init/RobynModel-1_342_6.json")

InputCollect_chfr_trans_saved <- robyn_inputs(dt_input = input_mmm_chfr_transaction, 
                                              json_file = model_chfr_trans_saved)

dt_hyper_fixed_chfr_trans_saved <- model_chfr_trans_saved$ExportedModel$hyper_values
# select_model <- model_chfr_trans$ExportedModel$select_model

OutputCollect_chfr_trans_saved <- robyn_run(
  InputCollect = InputCollect_chfr_trans_saved,
  dt_hyper_fixed = dt_hyper_fixed_chfr_trans_saved,
  json_file = model_chfr_trans_saved,
  export = FALSE
)

This works perfectly in the 3.7.0 version

And the traceback

image

Not sure what is going on. I have spent some time investigating but couldn't find the cause.

Thanks for your help

gufengzhou commented 1 year ago

It looks like you're loading an old model that was built with an older version and then recreating it with the new version. This might be the reason, especially because from 3.9 onward there's a new hyperparameter, train_size, that didn't exist before. Can you please rerun your model using the latest package with the narrower ranges from your old model?

dmacoritto commented 1 year ago

@gufengzhou, yes, it works with the 3.7 version. Nonetheless, I am not sure this is due to the train_size hyperparameter, as it is created automatically if not detected (its value is set to 1). Do you mean that models are not backward-compatible across versions? Will the ones created in 3.9 be incompatible with higher versions?

sahbakn commented 1 year ago

@Amyhaoming Is there an update on this bug? I have tried different Robyn versions with the same dataset, and the error appears starting with version 3.9.0, which makes me think it relates to the addition of ts_validation in that version. However, setting ts_validation to FALSE or train_size to 1 does not fix the error. In my case, the error only shows up for certain modeling periods. For example, with 1 year of training data it works without issues, but not with 9 months or less. I am not trying to load an old model; these are all models built from scratch.

arturodz commented 1 year ago

Hi!

I can report that I am having the same issue. Interestingly, it is only happening with one of my clients. I tried different date ranges with no success in either version 3.9 or 3.10. I was able to run Robyn successfully on version 3.7.2 for this client.

sahbakn commented 1 year ago

@gufengzhou @laresbernardo Can you kindly take a look at this bug again? As I mentioned in my previous comment, I am not trying to load a model based on an older version. I am building a model from scratch, and I see this error from version 3.9.0 onward for some datasets. I can send you a sample dataset through our Meta marketing science partner for debugging if it helps.

laresbernardo commented 1 year ago

Hi @sahbakn, please do. That will help us debug and understand what's happening in your specific case. Thanks!

SeanRichterWalsh commented 1 year ago

I am getting the same error message when I try to pass a character vector to context_vars

However, if I uncomment and use the factor_vars argument, it works.

What am I misunderstanding here?

laresbernardo commented 1 year ago

@SeanRichterWalsh not sure that's actually the case here. What version are you on? Note that in the demo we set "events", which is a character column, as one of the variables in "context_vars". When doing that, Robyn automatically detects this and sets it as one of the "factor_vars" even though the user doesn't do so manually. That behavior is printed as a message so that you, the user, know about it:

Automatically set these variables as 'factor_vars': 'events'

Also keep in mind that this message won't show up if you add robyn_inputs(..., factor_vars = "events", ...). If you have a reproducible example and are using the latest dev version, please share it in another ticket, given that, with the information provided, I don't think these are related @SeanRichterWalsh
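The auto-detection described above can be pictured with a small base R sketch. This is not Robyn's implementation: `detect_factor_vars()` and the toy data frame are hypothetical names invented for illustration, and the only assumption carried over from the comment is that character (or factor) columns among the candidate variables get treated as categorical.

```r
# Hypothetical helper, NOT part of Robyn: returns the candidate variables
# whose columns are character or factor, i.e. the ones a tool would
# plausibly treat as categorical.
detect_factor_vars <- function(dt, candidate_vars) {
  is_categorical <- vapply(
    dt[candidate_vars],
    function(x) is.character(x) || is.factor(x),
    logical(1)
  )
  candidate_vars[is_categorical]
}

# Toy data: "events" is a character column, "competitor_sales" is numeric.
toy <- data.frame(
  events = c("na", "sale", "na"),
  competitor_sales = c(10.5, 11.2, 9.8),
  stringsAsFactors = FALSE
)

auto_factor_vars <- detect_factor_vars(toy, c("events", "competitor_sales"))
print(auto_factor_vars)
```

Running a check like this on your own context_vars before calling robyn_inputs() shows which columns would need to be listed in factor_vars if you prefer to set them explicitly.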

SeanRichterWalsh commented 1 year ago

Thanks @laresbernardo . Yes, I am using the latest dev version. I restarted my R session and tried again with my dataset and it seems to work fine now. My own data set's events variable is now being auto-forced to factor as expected and I don't have to explicitly use the factor_vars argument. Not sure what was up earlier. Thanks.

TrunckYagora commented 1 year ago

Hi,

we're running into the same issue with both version 3.9 and 3.10: task 1 failed - "arguments imply differing number of rows: 6, 7" Has there been an update on this issue yet?

laresbernardo commented 1 year ago

Hi @TrunckYagora we haven't been able to replicate this issue yet. Can you please provide a reproducible example that returns that error so it can help us debug?

SeanRichterWalsh commented 1 year ago

Just a final comment from me on this. I know my issue may not be related but I did get a similar error message when exploring different modelling windows. I believe what may have caused it is my events variable having zero variance (all "na"). I mistakenly shortened the window too much and lost the events I had coded in the variable.

arturodz commented 1 year ago

Interesting point. This might have happened to me as well. I will do some tests tomorrow to see if I can reproduce the error under those circumstances. Will circle back if so.

On May 15, 2023, at 10:54 AM, Sean Walsh wrote:

Just a final comment from me on this. I know my issue may not be related but I did get a similar error message when exploring different modelling windows. I believe what may have caused it is my events variable having zero variance (all "na"). I mistakenly shortened the window too much and lost the events I had coded in the variable.

laresbernardo commented 1 year ago

YES! That was it. We actually checked for no variance on raw input data before running robyn_engineering() and not afterwards. Now I've just fixed this issue by checking both and returning meaningful and helpful errors. Can any of you update and validate it's fixed on latest dev version? Thanks for the valuable hint @SeanRichterWalsh

sahbakn commented 1 year ago

@laresbernardo I looked into my data, and within the modeling time frame where I was getting the error I had one variable with 0 variance. After fixing that, I do not see the error anymore!
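For anyone on a version without the new check, a variable can vary over the full dataset yet be constant inside the modeling window, which is the situation discussed in this thread. The base R sketch below looks for such columns. `find_novar_cols()` and the toy data are hypothetical and are not Robyn functions; the real check in Robyn may differ in detail.

```r
# Hypothetical helper, NOT part of Robyn: lists columns that take a single
# value inside [window_start, window_end], even if they vary elsewhere.
find_novar_cols <- function(dt, date_var, window_start, window_end) {
  dates <- as.Date(dt[[date_var]])
  in_window <- dates >= as.Date(window_start) & dates <= as.Date(window_end)
  dt_window <- dt[in_window, setdiff(names(dt), date_var), drop = FALSE]
  is_constant <- vapply(
    dt_window,
    function(x) length(unique(x)) <= 1,
    logical(1)
  )
  names(dt_window)[is_constant]
}

# Toy data for 2022: "if_covid" is 1 for the first 100 days, then 0.
toy <- data.frame(
  date = seq(as.Date("2022-01-01"), as.Date("2022-12-31"), by = "day"),
  spend = seq_len(365),
  if_covid = c(rep(1, 100), rep(0, 265))
)

novar_full <- find_novar_cols(toy, "date", "2022-01-01", "2022-12-31")
novar_window <- find_novar_cols(toy, "date", "2022-08-15", "2022-12-31")
print(novar_full)   # no constant columns over the full year
print(novar_window) # "if_covid" is constant in the shorter window
```

Any column reported for the chosen window is a candidate to drop from context_vars, factor_vars or prophet_vars, or a reason to widen the window.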

SeanRichterWalsh commented 1 year ago

YES! That was it. We actually checked for no variance on raw input data before running robyn_engineering() and not afterwards. Now I've just fixed this issue by checking both and returning meaningful and helpful errors. Can any of you update and validate it's fixed on latest dev version? Thanks for the valuable hint @SeanRichterWalsh

Oh great! A silly mistake on my part but I am glad it has helped lead to a resolution here. I can confirm that the latest dev version gives a very informative message when a variable has zero variance. Thanks a lot.

image

laresbernardo commented 1 year ago

Great! Thanks for confirming. I'll check with @richa-makhija and @dmacoritto as well! This should have fixed the issue for everyone. Will close ticket after a week or confirmation.

TrunckYagora commented 1 year ago

Hi @laresbernardo,

thanks for the update. Interestingly enough, we're receiving the error even when we run the model without factor_vars. For privacy reasons I altered the original data, but the error can still be reproduced. You can find the data we're using to reproduce the error here: mockup_data.csv

package Version is: ‘3.10.3.9000’

The code is as follows:

data("dt_prophet_holidays")
head(dt_prophet_holidays)
selected_dt <- read.csv("mockup.csv")

InputCollect <- robyn_inputs(
  dt_input = selected_dt,
  dt_holidays = dt_prophet_holidays,
  date_var = "date", # date format must be "2020-01-01"
  dep_var = "Umsatz", # there should be only one dependent variable
  dep_var_type = "revenue", # "revenue" (ROI) or "conversion" (CPA)
  prophet_vars = c("trend", "season", "holiday"), # "trend","season", "weekday" & "holiday"
  prophet_country = "DE", # input one country. dt_prophet_holidays includes 59 countries by default
  paid_media_spends = c("cost_dv360", "cost_fb_insta", "cost_pinterest"), # mandatory input
  paid_media_vars = c("impression_dv360", "impression_fb_insta", "impression_pinterest"), # mandatory.
  # paid_media_vars must have same order as paid_media_spends. Use media exposure metrics like
  # impressions, GRP etc. If not applicable, use spend instead.
  # organic_vars = "events", # marketing activity without media spend
  # factor_vars = c("events"), # force variables in context_vars or organic_vars to be categorical
  # window_start = min(selected_dt$date),
  # window_end = max(selected_dt$date),
  adstock = "geometric" # geometric, weibull_cdf or weibull_pdf.
)
print(InputCollect)

hyper_names(adstock = InputCollect$adstock, all_media = InputCollect$all_media)
# plot_adstock(plot = TRUE)
# plot_saturation(plot = TRUE)
hyper_limits()

# Example hyperparameters ranges for Geometric adstock
hyperparameters <- list(
  cost_dv360_alphas = c(0.5, 3),
  cost_dv360_gammas = c(0.3, 1),
  cost_dv360_thetas = c(0, 0.3),
  cost_fb_insta_alphas = c(0.5, 3),
  cost_fb_insta_gammas = c(0.3, 1),
  cost_fb_insta_thetas = c(0.1, 0.4),
  cost_pinterest_alphas = c(0.5, 3),
  cost_pinterest_gammas = c(0.3, 1),
  cost_pinterest_thetas = c(0.3, 0.8),
  train_size = c(0.3, 0.8)
)

InputCollect <- robyn_inputs(InputCollect = InputCollect, hyperparameters = hyperparameters)
print(InputCollect)

OutputModels <- robyn_run(
  InputCollect = InputCollect, # feed in all model specification
  cores = NULL, # NULL defaults to (max available - 1)
  iterations = 2000, # 2000 recommended for the dummy dataset with no calibration
  trials = 5, # 5 recommended for the dummy dataset
  ts_validation = TRUE, # 3-way-split time series for NRMSE validation.
  add_penalty_factor = FALSE # Experimental feature. Use with caution.
)
print(OutputModels)

laresbernardo commented 1 year ago

@TrunckYagora thanks for reporting this and providing a reproducible example. The problem occurred when some variables were not being used and therefore weren't found when unselecting them. Can you please update to the latest dev version and check? With your dummy dataset you should now get this error:

> InputCollect <- robyn_inputs(InputCollect = InputCollect, hyperparameters = hyperparameters)
>> Running feature engineering...
NOTE: potential improvement on splitting channels for better exposure fitting. Threshold (Minimum R2) = 0.8 
  Check: InputCollect$plotNLSCollect outputs
  Weak relationship for: 'impression_dv360', 'impression_fb_insta', 'impression_pinterest' and their spend
Error in check_novar(dt_mod_model_window, InputCollect) : 
  There are 1 column(s) with no-variance: 'holiday'. 
Please, remove variable(s) to proceed...
Note that there's no variance when filtering the modeling window (2022-10-10:2022-12-03)

TrunckYagora commented 1 year ago

Hi @laresbernardo

awesome - thank you very much for your support. In the latest dev version I get the error message you mentioned and we can run Robyn as usual with the correct parameters.