facebookexperimental / Robyn

Robyn is an experimental, AI/ML-powered and open sourced Marketing Mix Modeling (MMM) package from Meta Marketing Science. Our mission is to democratise modeling knowledge, inspire the industry through innovation, reduce human bias in the modeling process & build a strong open source marketing science community.
https://facebookexperimental.github.io/Robyn/
MIT License

Concerns about Trustworthiness [Reproducibility] #610

Closed · bart-vanvlerken closed this issue 1 year ago

bart-vanvlerken commented 1 year ago

Project Robyn

Describe issue

I would like to use Robyn to help clients with MMM. Before I do so, however, I need to know whether I can trust its output. I ran the demo data twice back-to-back using the same seed, but I got different results. If Robyn generates two different models from the same data and the same seed, what guarantee do I have that the output is trustworthy and that the selected best model is not just a random one?

I really like Robyn and how convenient it is to do MMM, but I need someone to address my concerns before I use it to help clients because I don't want to give false insights. I hope someone can help me out!
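
Below is a minimal sketch of the kind of back-to-back check described above, assuming `InputCollect` has been built from the simulated demo data exactly as in Robyn's demo.R; the iteration, trial and core counts are illustrative, and the `allSolutions` field name reflects the Robyn 3.9.x output object and may differ by version.

```r
library(Robyn)

# Two runs with identical inputs, identical seed and identical core count.
run_a <- robyn_run(
  InputCollect = InputCollect, # same inputs object, built as in demo.R
  iterations   = 2000,
  trials       = 5,
  seed         = 123,
  cores        = 4             # a different core count can change results
)
run_b <- robyn_run(
  InputCollect = InputCollect,
  iterations   = 2000,
  trials       = 5,
  seed         = 123,
  cores        = 4
)

# Collect the Pareto-front solutions of both runs without exporting files.
out_a <- robyn_outputs(InputCollect, run_a, pareto_fronts = 1, export = FALSE)
out_b <- robyn_outputs(InputCollect, run_b, pareto_fronts = 1, export = FALSE)

# With parallel aggregation the selected model IDs may differ even though
# the seed is identical, which is the behaviour reported in this issue.
identical(sort(out_a$allSolutions), sort(out_b$allSolutions))
```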

Environment & Robyn version

packageVersion("Robyn") [1] ‘3.9.0’ sessionInfo() R version 4.2.2 (2022-10-31 ucrt) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 22621)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8 LC_CTYPE=English_United States.utf8
[3] LC_MONETARY=English_United States.utf8 LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] doRNG_1.8.6 rngtools_1.5.2 foreach_1.5.2 forcats_0.5.2 stringr_1.5.0
[6] dplyr_1.0.10 purrr_1.0.1 readr_2.1.3 tidyr_1.2.1 tibble_3.1.8
[11] ggplot2_3.4.0 tidyverse_1.3.2 Robyn_3.9.0 reticulate_1.27

loaded via a namespace (and not attached):
[1] googledrive_2.0.0 colorspace_2.0-3 ellipsis_0.3.2 ggridges_0.5.4
[5] rprojroot_2.0.3 fs_1.5.2 rstudioapi_0.14 farver_2.1.1
[9] rstan_2.21.8 fansi_1.0.3 lubridate_1.9.0 xml2_1.3.3
[13] codetools_0.2-18 splines_4.2.2 doParallel_1.0.17 knitr_1.41
[17] jsonlite_1.8.4 nloptr_2.0.3 pROC_1.18.0 broom_1.0.2
[21] dbplyr_2.3.0 png_0.1-8 compiler_4.2.2 httr_1.4.4
[25] backports_1.4.1 assertthat_0.2.1 Matrix_1.5-1 lazyeval_0.2.2
[29] gargle_1.2.1 cli_3.6.0 rPref_1.3 prettyunits_1.1.1
[33] tools_4.2.2 igraph_1.3.5 gtable_0.3.1 glue_1.6.2
[37] rappdirs_0.3.3 Rcpp_1.0.9 prophet_1.0 cellranger_1.1.0
[41] h2o_3.38.0.1 vctrs_0.5.1 iterators_1.0.14 xfun_0.36
[45] ps_1.7.2 openxlsx_4.2.5.1 rvest_1.0.3 timechange_0.2.0
[49] lifecycle_1.0.3 googlesheets4_1.0.1 scales_1.2.1 hms_1.1.2
[53] parallel_4.2.2 inline_0.3.19 RColorBrewer_1.1-3 rpart.plot_3.1.1
[57] yaml_2.3.6 gridExtra_2.3 loo_2.5.1 StanHeaders_2.21.0-7
[61] rpart_4.1.19 stringi_1.7.12 pkgbuild_1.4.0 zip_2.2.2
[65] shape_1.4.6 rlang_1.0.6 pkgconfig_2.0.3 bitops_1.0-7
[69] matrixStats_0.63.0 lattice_0.20-45 patchwork_1.1.2 labeling_0.4.2
[73] tidyselect_1.2.0 processx_3.8.0 here_1.0.1 plyr_1.8.8
[77] magrittr_2.0.3 R6_2.5.1 generics_0.1.3 DBI_1.1.3
[81] pillar_1.8.1 haven_2.5.1 withr_2.5.0 survival_3.4-0
[85] RCurl_1.98-1.9 modelr_0.1.10 crayon_1.5.2 utf8_1.2.2
[89] tzdb_0.3.0 lares_5.1.4 grid_4.2.2 readxl_1.4.1
[93] minpack.lm_1.2-2 callr_3.7.3 reprex_2.0.2 digest_0.6.31
[97] extraDistr_1.9.1 RcppParallel_5.1.6 stats4_4.2.2 munsell_0.5.0
[101] glmnet_4.1-6

laresbernardo commented 1 year ago

Hi @bart-vanvlerken, thanks for the question. As we mentioned a couple of days ago to another user:

We've worked on reproducibility by enabling the seed parameter, and it "usually" delivers the same results. The seed is used across the whole process to set reproducible randomness (including in the Nevergrad Python section, to pick the first trial seeds). But because the results are computed in parallel and gathered as each iteration finishes, you may still get different results. Also, with a different number of cores you will definitely get different results for the same inputs and the same seed. We are open to suggestions to fix this, but as far as I know there's not much we can do here to "guarantee" reproducibility. Saving the results (JSON/RDS) is also a way to reproduce any model on other computers or later on.
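
For example, here is a hedged sketch of that save-and-reproduce idea, assuming `InputCollect`/`OutputCollect` come from a finished run and a model has been selected; the solution ID, export directory and JSON file name are illustrative, and the argument names reflect Robyn 3.9.x:

```r
library(Robyn)

select_model <- "1_92_12"  # illustrative ID from the Pareto-front solutions

# Export the selected model and its inputs to JSON for later reuse.
robyn_write(
  InputCollect  = InputCollect,
  OutputCollect = OutputCollect,
  select_model  = select_model,
  export        = TRUE,
  dir           = "./robyn_export"
)

# Later, or on another machine, rebuild the same model from the JSON
# (the file name follows Robyn's RobynModel-<ID>.json convention).
RobynRecreated <- robyn_recreate(
  json_file   = "./robyn_export/RobynModel-1_92_12.json",
  dt_input    = dt_simulated_weekly,
  dt_holidays = dt_prophet_holidays,
  quiet       = FALSE
)
```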

Also, I'd like to add that, regardless of the initial randomness and reproducibility, the multi-objective optimization will always provide the best mathematical solutions it can find and will try to converge to minimum errors. The only issue is that you won't always be able to get exactly the same results (keeping in mind all of the above reasons and options).
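
As a hedged illustration of that convergence point, one way to compare two non-identical runs is to check whether they converge to similar error levels rather than expecting identical model IDs. This assumes the `run_a`/`run_b` objects from the sketch earlier in this thread, and the `convergence` and `resultHypParam` field names reflect Robyn 3.9.x output objects and may change between versions.

```r
# Convergence messages computed by robyn_run() for each run.
run_a$convergence$conv_msg
run_b$convergence$conv_msg

# Error distributions of the Pareto-front candidates from both runs;
# similar NRMSE / DECOMP.RSSD ranges suggest comparable solution quality
# even when the individual model IDs differ.
out_a <- robyn_outputs(InputCollect, run_a, pareto_fronts = 1, export = FALSE)
out_b <- robyn_outputs(InputCollect, run_b, pareto_fronts = 1, export = FALSE)

summary(out_a$resultHypParam$nrmse)
summary(out_b$resultHypParam$nrmse)
summary(out_a$resultHypParam$decomp.rssd)
summary(out_b$resultHypParam$decomp.rssd)
```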