Error in setcolorder(dt_hyppar, sort(names(dt_hyppar))) : x has some duplicated column name(s):

facebookexperimental / Robyn

Robyn is an experimental, AI/ML-powered and open sourced Marketing Mix Modeling (MMM) package from Meta Marketing Science. Our mission is to democratise modeling knowledge, inspire the industry through innovation, reduce human bias in the modeling process & build a strong open source marketing science community.

https://facebookexperimental.github.io/Robyn/

MIT License

Error in setcolorder(dt_hyppar, sort(names(dt_hyppar))) : x has some duplicated column name(s): #232

Closed michellegrushko-glossier closed 2 years ago

michellegrushko-glossier commented 2 years ago

Project Robyn

Describe issue

I am trying to run the budget allocation using the following:

AllocatorCollect <- robyn_allocator(
  InputCollect = InputCollect
  , OutputCollect = OutputCollect
  , select_model = select_model
  , scenario = "max_historical_response"
  , channel_constr_low = c(0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8)
  , channel_constr_up = c(1.2, 1.2, 1.2, 1.2, 1.2, 1.2, 1.2, 1.2, 1.2, 1.2, 1.2, 1.2, 1.2, 1.2)
)

And am seeing the error:

Running budget allocator for model ID 3_200_3 ...
[variable], [variable], [variable] are excluded in optimiser because their coeffients are 0
Error in setcolorder(dt_hyppar, sort(names(dt_hyppar))) :
  x has some duplicated column name(s): brand_sem_us_I_alphas,brand_sem_us_I_gammas,brand_sem_us_I_thetas. Please remove or rename the duplicate(s) and try again.

I tried looking into the inputs and am not seeing where these duplications are occurring. The names in the error message differ only in their "alphas", "gammas" and "thetas" suffixes, so none of them looks like a duplicate of another. I'm not sure if I am doing something wrong or if this is a bug.
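A quick way to rule out repeated names on the input side is to check the names of a list or table directly. The sketch below uses base R only; `find_duplicated_names` and the toy `hyps` list are hypothetical and exist purely for illustration, they are not part of Robyn.

```r
# Hypothetical helper (not a Robyn function): return every name that
# occurs more than once in a named list, data.frame or data.table.
find_duplicated_names <- function(x) {
  nms <- names(x)
  unique(nms[duplicated(nms)])
}

# Toy hyperparameter list with one deliberately repeated name.
hyps <- list(
  brand_sem_us_I_alphas = c(0.5, 3.0),
  brand_sem_us_I_gammas = c(0.3, 1.0),
  brand_sem_us_I_alphas = c(0.5, 3.0)
)

dups <- find_duplicated_names(hyps)
print(dups)
```

Running the same helper on `InputCollect$hyperparameters` should return an empty character vector when the inputs are clean. In that case the duplication would have to arise later, in the `dt_hyppar` object named in the error call.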

Provide dummy data & model configuration

Issues are often related to custom input data that is difficult to debug without. If necessary, please modify your data to mask real values and share a dataset that is able to reproduce the issue. Please also share your model configuration.

Environment & Robyn version

R version (R --version)
Please make sure you're using the latest Robyn version

michellegrushko-glossier commented 2 years ago

Thank you! What is the best way to include this commit? I tried to reinstall the package to update, but it looks like I still have the same error

gufengzhou commented 2 years ago

oh sorry I mixed it up! this commit is meant for this issue: https://github.com/facebookexperimental/Robyn/issues/228

I'll fix yours next :) sorry again

michellegrushko-glossier commented 2 years ago

No worries! Thank you so much!

gufengzhou commented 2 years ago

I can't reproduce the error. Are you getting this every time you use the allocator, no matter which model you select? I'd need your dataset (anonymised of course) and your demo.R script with model setup to debug.

michellegrushko-glossier commented 2 years ago

Yes I just tried it with a different model and it looks like I have the same error. I'm not really understanding where these duplications are happening

Running budget allocator for model ID 3_272_5 ...
x, x, x, x are excluded in optimiser because their coeffients are 0
Error in setcolorder(dt_hyppar, sort(names(dt_hyppar))) :
  x has some duplicated column name(s): brand_sem_us_I_alphas,brand_sem_us_I_gammas,brand_sem_us_I_thetas. Please remove or rename the duplicate(s) and try again.

InputCollect$hyperparameters

$Affiliate_Coupon_S_alphas [1] 0.5 3.0

$Affiliate_Coupon_S_gammas [1] 0.3 1.0

$Affiliate_Coupon_S_thetas [1] 0.001 0.400

$Affiliate_Loyalty_CB_S_alphas [1] 0.5 3.0

$Affiliate_Loyalty_CB_S_gammas [1] 0.3 1.0

$Affiliate_Loyalty_CB_S_thetas [1] 0.001 0.400

$Affiliate_Other_S_alphas [1] 0.5 3.0

$Affiliate_Other_S_gammas [1] 0.3 1.0

$Affiliate_Other_S_thetas [1] 0.001 0.400

$Affiliate_Payment_Offers_S_alphas [1] 0.5 3.0

$Affiliate_Payment_Offers_S_gammas [1] 0.3 1.0

$Affiliate_Payment_Offers_S_thetas [1] 0.001 0.400

$brand_sem_us_I_alphas [1] 0.5 3.0

$brand_sem_us_I_gammas [1] 0.3 1.0

$brand_sem_us_I_thetas [1] 0.001 0.400

$display_us_I_alphas [1] 0.5 3.0

$display_us_I_gammas [1] 0.3 1.0

$display_us_I_thetas [1] 0.1 0.4

$fb_ig_ampush_I_alphas [1] 0.5 3.0

$fb_ig_ampush_I_gammas [1] 0.3 1.0

$fb_ig_ampush_I_thetas [1] 0.1 0.4

$fb_ig_pmg_I_alphas [1] 0.5 3.0

$fb_ig_pmg_I_gammas [1] 0.3 1.0

$fb_ig_pmg_I_thetas [1] 0.1 0.4

$nonbrand_sem_us_I_alphas [1] 0.5 3.0

$nonbrand_sem_us_I_gammas [1] 0.3 1.0

$nonbrand_sem_us_I_thetas [1] 0.1 0.4

$pinterest_us_I_alphas [1] 0.5 3.0

$pinterest_us_I_gammas [1] 0.3 1.0

$pinterest_us_I_thetas [1] 0.3 0.8

$shopping_us_I_alphas [1] 0.5 3.0

$shopping_us_I_gammas [1] 0.3 1.0

$shopping_us_I_thetas [1] 0.1 0.4

$TikTok_brand_S_alphas [1] 0.5 3.0

$TikTok_brand_S_gammas [1] 0.3 1.0

$TikTok_brand_S_thetas [1] 0.3 0.8

$total_brand_without_yt_tt_S_alphas [1] 0.5 3.0

$total_brand_without_yt_tt_S_gammas [1] 0.3 1.0

$total_brand_without_yt_tt_S_thetas [1] 0.3 0.8

$Youtube_brand_S_alphas [1] 0.5 3.0

$Youtube_brand_S_gammas [1] 0.3 1.0

$Youtube_brand_S_thetas [1] 0.3 0.8

Do you need all the MMM data, or can I provide the input and output data with the aggregated info for the models and hyperparameters?
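One pattern visible in the list above, offered as an observation rather than a confirmed diagnosis: the channel named in the error, `brand_sem_us_I`, is contained in another channel name, `nonbrand_sem_us_I`. If the allocator selects hyperparameter columns by pattern matching on channel names (an assumption, not something this thread establishes), overlapping names like these could plausibly pick up the same column twice. A minimal base R sketch for spotting such overlaps follows; `find_nested_names` is a hypothetical helper, not a Robyn function, and the channel vector is a subset copied from the list above.

```r
# Hypothetical helper (not a Robyn function): for each channel name,
# list the other channel names that contain it as a literal substring.
find_nested_names <- function(channels) {
  hits <- lapply(channels, function(ch) {
    others <- setdiff(channels, ch)
    others[grepl(ch, others, fixed = TRUE)]
  })
  names(hits) <- channels
  # Keep only the channels that are nested inside at least one other name.
  hits[lengths(hits) > 0]
}

# Subset of the channel names shown in the hyperparameter list.
channels <- c("brand_sem_us_I", "display_us_I",
              "nonbrand_sem_us_I", "shopping_us_I")

nested <- find_nested_names(channels)
print(nested)
```

An empty result means no channel name is nested inside another. A non-empty result only flags names worth renaming as an experiment.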

gufengzhou commented 2 years ago

Hm, how come the warning says "x, x, x, x are excluded"? It's supposed to be var names. I assume you import the csv using the read.csv function? Can you try data.table::fread?

michellegrushko-glossier commented 2 years ago

Oh lol, that was just me removing the actual variable names. The real ones show up in the error.

gufengzhou commented 2 years ago

Haha alright. Then I need your dataset and the demo.R script for debugging unfortunately

michellegrushko-glossier commented 2 years ago

Okay! I'll work on renaming everything in the data, the input/output, and the hyperparameters and will send it over shortly. Thank you so much for looking into this!

michellegrushko-glossier commented 2 years ago

I'm having a hard time changing the names within OutputCollect. Would I be able to provide pareto_aggregated instead?

gufengzhou commented 2 years ago

You said every selected model throws this error right? Then I only need your input data and demo.R and will just run some iterations myself

michellegrushko-glossier commented 2 years ago

yes correct! sounds good, I just emailed them to you (but accidentally did not include a subject line)

Also, I'm unsure if this is related, but I'm now starting to see this error message when I pull up any Robyn function:

Error in fetch(key) : lazy-load database '/Library/Frameworks/R.framework/Versions/4.0/Resources/library/Robyn/help/Robyn.rdb' is corrupt

gufengzhou commented 2 years ago

Just tested it quickly by running 100x2 iters, and everything runs fine... What is your Robyn package version in sessionInfo()? I'm on Robyn_3.4.8. The error comes from setcolorder, which is a data.table function, so the data.table version matters too: my loaded version is data.table_1.14.2.

michellegrushko-glossier commented 2 years ago

My Robyn is 3.4.8 but it looks like data.table is 1.14.0. I will update and rerun!
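The two versions being compared here can also be read without attaching the packages. The sketch below is base R only; `report_versions` is a hypothetical helper name, and it returns `NA` for any package that is not installed.

```r
# Hypothetical helper: installed version of each package as a string,
# or NA when the package cannot be found in the library paths.
report_versions <- function(pkgs) {
  vapply(pkgs, function(p) {
    if (nzchar(system.file(package = p))) {
      as.character(utils::packageVersion(p))
    } else {
      NA_character_
    }
  }, character(1))
}

versions <- report_versions(c("Robyn", "data.table"))
print(versions)
```

This reports what is installed on disk. `sessionInfo()` remains the better check for what is actually loaded in the running session, which can differ until R is restarted.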

michellegrushko-glossier commented 2 years ago

I am getting the same error. Here is my session info:

R version 4.0.4 (2021-02-15)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] grid parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] Robyn_3.4.8 rstudioapi_0.13 reticulate_1.22 rPref_1.3
[5] minpack.lm_1.2-1 nloptr_1.2.2.3 PerformanceAnalytics_2.0.4 xts_0.12.1
[9] zoo_1.8-9 see_0.6.4 ggpubr_0.4.0 gridExtra_2.3
[13] rstan_2.21.2 ggplot2_3.3.5 prophet_1.0 rlang_0.4.12
[17] Rcpp_1.0.7 StanHeaders_2.21.0-7 car_3.0-10 carData_3.0-4
[21] glmnet_4.1-3 Matrix_1.3-2 doParallel_1.0.16 iterators_1.0.13
[25] foreach_1.5.1 lubridate_1.8.0 stringr_1.4.0

loaded via a namespace (and not attached):
[1] matrixStats_0.61.0 insight_0.14.2 doRNG_1.8.2 tools_4.0.4 backports_1.4.0 utf8_1.2.2
[7] R6_2.5.1 lazyeval_0.2.2 DBI_1.1.1 colorspace_2.0-2 withr_2.4.3 tidyselect_1.1.1
[13] prettyunits_1.1.1 processx_3.5.2 curl_4.3.2 compiler_4.0.4 cli_3.1.0 bayestestR_0.10.5
[19] scales_1.1.1 quadprog_1.5-8 ggridges_0.5.3 callr_3.7.0 digest_0.6.29 foreign_0.8-81
[25] rio_0.5.26 pkgconfig_2.0.3 readxl_1.3.1 shape_1.4.6 generics_0.1.1 jsonlite_1.7.2
[31] dplyr_1.0.7 zip_2.1.1 inline_0.3.19 magrittr_2.0.1 loo_2.4.1 patchwork_1.1.1
[37] parameters_0.14.0 munsell_0.5.0 fansi_0.5.0 abind_1.4-5 lifecycle_1.0.1 stringi_1.7.6
[43] pkgbuild_1.2.1 plyr_1.8.6 forcats_0.5.1 crayon_1.4.2 lattice_0.20-41 haven_2.3.1
[49] splines_4.0.4 hms_1.0.0 ps_1.6.0 pillar_1.6.4 igraph_1.2.9 ggsignif_0.6.1
[55] rngtools_1.5.2 effectsize_0.4.5 codetools_0.2-18 stats4_4.0.4 glue_1.5.1 V8_3.6.0
[61] data.table_1.14.2 RcppParallel_5.1.4 png_0.1-7 vctrs_0.3.8 cellranger_1.1.0 gtable_0.3.0
[67] purrr_0.3.4 tidyr_1.1.4 assertthat_0.2.1 datawizard_0.1.0 openxlsx_4.2.3 broom_0.7.5
[73] rstatix_0.7.0 survival_3.2-7 tibble_3.1.6 ellipsis_0.3.2

michellegrushko-glossier commented 2 years ago

Also it might be easier if we could schedule a call to chat through it?

michellegrushko-glossier commented 2 years ago

I tried running it on an older script and am getting the same error. Not sure if this is helpful, but it seems like it's something to do with setcolorder and how it references the column names: https://stackoverflow.com/questions/23874978/unhelpful-error-in-data-table-merge-of-identical-schema-tables

paul-sims-bpm commented 2 years ago

@michellegrushko-glossier - were you ever able to resolve this? I am also running into a data.table error, though a slightly different one.

Running budget allocator for model ID 3_269_6 ...
channel_1, channel_2, channel_3, channel_4, channel_5, channel_6 are excluded in optimiser because their coeffients are 0
Error in [.data.table(dt_hyppar, , .SD, .SDcols = na.omit(str_extract(names(dt_hyppar), :
  Some items of .SDcols are not column names: [channel7g_alphas,channel7g_gammas,channel7g_thetas]

For some reason, the hyperparameter names for channel7 have gotten a "g" added to them - I'm not sure if it's a data.table issue or what.

I'm not sure if you managed to find the dt_hyppar dataframe (I could not find it), but none of the hyperparameters I've set or that are in the robyn object appear to follow the naming data.table is saying. Would appreciate hearing your solution and if you managed to find the dt_hyppar dataframe.

michellegrushko-glossier commented 2 years ago

hey @paul-sims-bpm! unfortunately I was not able to fix this error. something you can try first is making sure your Robyn and data.table libraries are up to date (mine were not but that didn't solve the error for me - still worth a shot!)

I ultimately got around this error by renaming all of my channels, taking the hyperparameters from the desired model, and setting very narrow bounds around them to rerun the model normally. E.g. if you have channel_A_thetas = 0.5, you can set the bounds to c(0.49, 0.51) in hyperparameters, and so on for the others, then run it normally again from the beginning. You should get very similar results to before, and can then rerun the budget allocation.
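A minimal sketch of the narrow-bounds step, assuming the selected model's hyperparameter values have already been copied into a named list. `narrow_bounds`, the `width` argument and the example values are hypothetical and only illustrate the idea; they are not part of Robyn.

```r
# Hypothetical helper: turn each point value into c(lower, upper)
# bounds that sit `width` either side of it.
narrow_bounds <- function(values, width = 0.01) {
  lapply(values, function(v) c(v - width, v + width))
}

# Illustrative values only; in practice these would be read off the
# chosen model, using the renamed channel names.
selected <- list(
  channel_A_alphas = 1.2,
  channel_A_gammas = 0.6,
  channel_A_thetas = 0.5
)

hyperparameters <- narrow_bounds(selected)
print(hyperparameters$channel_A_thetas)
```

The resulting list has the same names as the input, so it can be passed wherever the original bounds were supplied. Values close to the edge of a hyperparameter's valid range may need a smaller `width` so the bounds stay valid.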

laresbernardo commented 2 years ago

Closing this ticket now. Let us know if you need further help