facebookincubator / GeoLift

GeoLift is an end-to-end geo-experimental methodology based on Synthetic Control Methods, used to measure the true incremental effect (Lift) of ad campaigns.
https://facebookincubator.github.io/GeoLift/
MIT License

Error in summary.connection(connection) : invalid connection #36

Closed: PollyStar closed this issue 2 years ago

PollyStar commented 2 years ago

Hi, I've been trying to run the GeoLift example code, but I'm hitting a parallel-computing issue with the following call:

resultsSearch <- GeoLiftPower.search(data = GeoTestData_PreTest,
                                     treatment_periods = c(15),
                                     N = c(2, 3, 4),
                                     horizon = 50,
                                     Y_id = "Y",
                                     location_id = "location",
                                     time_id = "time",
                                     top_results = 20,
                                     alpha = 0.1,
                                     type = "pValue",
                                     fixed_effects = TRUE,
                                     ProgressBar = TRUE)

This results in the following error:

Setting up cluster.
Importing functions into cluster.
Deterministic setup with 2 locations in treatment.
Error in summary.connection(connection) : invalid connection

I have tried it on several different machines but get the same error each time. Any ideas or solutions?

NicolasMatrices-v2 commented 2 years ago

Hi @PollyStar, thanks for raising this issue. Could you share your sessionInfo() as well so we can take a look?

In the meantime, could you run without parallelization by setting parallel = FALSE and check whether that works (albeit more slowly)?

Thanks!
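
For reference, a sequential run might look like the sketch below. It reuses the arguments from the original report and adds only parallel = FALSE. The search_args list and the guard around the call are illustrative, not part of GeoLift, and the sketch assumes the installed GeoLift version exports GeoLiftPower.search() with a parallel argument and that GeoTestData_PreTest has already been created.

```r
# Arguments copied from the original report; only `parallel = FALSE` is new.
# `search_args` is a hypothetical helper name used for illustration.
search_args <- list(
  treatment_periods = c(15),
  N = c(2, 3, 4),
  horizon = 50,
  Y_id = "Y",
  location_id = "location",
  time_id = "time",
  top_results = 20,
  alpha = 0.1,
  type = "pValue",
  fixed_effects = TRUE,
  ProgressBar = TRUE,
  parallel = FALSE  # run sequentially instead of on a cluster
)

# Only attempt the call when GeoLift and the pre-test data are available.
if (requireNamespace("GeoLift", quietly = TRUE) && exists("GeoTestData_PreTest")) {
  resultsSearch <- do.call(
    GeoLift::GeoLiftPower.search,
    c(list(data = GeoTestData_PreTest), search_args)
  )
}
```

This is slower than the parallel run, but it helps confirm whether the error comes from the cluster setup rather than from the data or the model.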

PollyStar commented 2 years ago

Hey, cheers, thanks. Setting parallel = FALSE works!

Session info:

R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] Hmisc_4.6-0 ggplot2_3.3.5 Formula_1.2-4 survival_3.2-13 lattice_0.20-45
[6] forecast_8.16 nlme_3.1-155 olsrr_0.5.3 corrr_0.4.3 car_3.0-12
[11] carData_3.0-5 writexl_1.4.0 xlsx_0.6.5 readxl_1.3.1 tidyr_1.1.4
[16] MarketMatching_1.2.0 dplyr_1.0.7 gsynth_1.2.1 augsynth_0.2.0 doParallel_1.0.16
[21] iterators_1.0.13 foreach_1.5.1 GeoLift_2.2.2

loaded via a namespace (and not attached):
[1] backports_1.4.1 plyr_1.8.6 splines_4.1.2 listenv_0.8.0 usethis_2.1.5
[6] CausalImpact_1.2.7 digest_0.6.29 htmltools_0.5.2 fansi_0.5.0 astsa_1.14
[11] magrittr_2.0.1 checkmate_2.0.0 memoise_2.0.1 cluster_2.1.2 remotes_2.4.2
[16] globals_0.14.0 xts_0.12.1 sandwich_3.0-1 tseries_0.10-49 prettyunits_1.1.1
[21] jpeg_0.1-9 colorspace_2.0-2 ggrepel_0.9.1 xfun_0.29 callr_3.7.0
[26] crayon_1.4.2 zoo_1.8-9 glue_1.6.0 gtable_0.3.0 pkgbuild_1.3.1
[31] quantmod_0.4.18 abind_1.4-5 scales_1.1.1 mvtnorm_1.1-3 rngtools_1.5.2
[36] Rcpp_1.0.8 lfe_2.8-7.1 dtw_1.22-3 xtable_1.8-4 progress_1.2.2
[41] htmlTable_2.4.0 foreign_0.8-81 proxy_0.4-26 htmlwidgets_1.5.4 RColorBrewer_1.1-2
[46] ellipsis_0.3.2 BoomSpikeSlab_1.2.4 pkgconfig_2.0.3 rJava_1.0-6 farver_2.1.0
[51] nnet_7.3-16 utf8_1.2.2 tidyselect_1.1.1 labeling_0.4.2 rlang_0.4.12
[56] reshape2_1.4.4 munsell_0.5.0 cellranger_1.1.0 tools_4.1.2 cachem_1.0.6
[61] osqp_0.6.0.5 cli_3.1.0 generics_0.1.2 devtools_2.4.3 stringr_1.4.0
[66] fastmap_1.1.0 goftest_1.2-3 knitr_1.37 processx_3.5.2 fs_1.5.2
[71] purrr_0.3.4 future_1.23.0 doRNG_1.8.2 brio_1.1.3 compiler_4.1.2
[76] rstudioapi_0.13 curl_4.3.2 png_0.1-7 testthat_3.1.2 tibble_3.1.6
[81] stringi_1.7.6 ps_1.6.0 desc_1.4.0 Matrix_1.3-4 LowRankQP_1.0.4
[86] urca_1.3-0 vctrs_0.3.8 pillar_1.7.0 lifecycle_1.0.1 lmtest_0.9-39
[91] data.table_1.14.2 bsts_0.9.7 R6_2.5.1 latticeExtra_0.6-29 directlabels_2021.1.13
[96] panelView_1.1.5 gridExtra_2.3 parallelly_1.30.0 sessioninfo_1.2.2 Boom_0.9.7
[101] codetools_0.2-18 MASS_7.3-54 assertthat_0.2.1 pkgload_1.2.4 xlsxjars_0.6.1
[106] rprojroot_2.0.2 withr_2.4.3 nortest_1.0-4 fracdiff_1.5-1 hms_1.1.1
[111] quadprog_1.5-8 grid_4.1.2 rpart_4.1-15 timeDate_3043.102 TTR_0.24.3
[116] base64enc_0.1-3

yulisong commented 2 years ago

Hi @NicolasMatrices-v2, I ran into the same issue when running the GeoLiftMarketSelection() function in the demo. I've looked into it, and here is what I found:

  1. The parallel backend is first registered in build_cluster() by calling registerDoParallel().
  2. The parallel backend is then registered again, and a parallel computation is run with foreach() in MarketMatching::best_matches().
  3. After that, stopImplicitCluster() in MarketMatching::best_matches() shuts down the workers.
  4. Next, without registering the parallel backend again, run_simulations() calls foreach() directly to run the parallel computation.
  5. Finally, the workers are shut down by calling stopCluster().
When I registered the parallel backend again in step 4, the code worked on my computer. I assume the issue is caused by not registering the parallel backend again at that point.

I'm not familiar with parallel computing in R, so please correct me if my guess is wrong.
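
The sequence above can be sketched in isolation, without GeoLift or MarketMatching. This is a simplified illustration, not the packages' actual code: it stops an explicit cluster with stopCluster() rather than going through a second registerDoParallel() and stopImplicitCluster(), and it assumes the parallel, foreach, and doParallel packages are installed. The variable names stale and fixed are made up for the example.

```r
library(parallel)
library(foreach)
library(doParallel)

# Step 1: create a cluster and register it as the foreach backend.
cl <- makeCluster(2)
registerDoParallel(cl)

# Steps 2-3 (simplified): the workers get shut down elsewhere, while the
# registered backend still refers to the stopped cluster.
stopCluster(cl)

# Step 4: %dopar% is used without re-registering. This is expected to fail,
# so the error message is captured instead of aborting the script.
stale <- tryCatch(
  foreach(i = 1:3, .combine = c) %dopar% { i^2 },
  error = function(e) conditionMessage(e)
)
print(stale)

# Workaround described above: register a fresh backend before foreach().
cl <- makeCluster(2)
registerDoParallel(cl)
fixed <- foreach(i = 1:3, .combine = c) %dopar% { i^2 }

# Step 5: shut the workers down once the parallel work is done.
stopCluster(cl)
print(fixed)
```

The point of the sketch is that foreach() keeps using whatever backend was registered last, so any code that stops the workers needs to be followed by a new registration before the next %dopar% call.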

NicolasMatrices-v2 commented 2 years ago

Hey all, @PollyStar, @yulisong!

Thanks, Yuli, for taking a look under the hood. Your contributions are always appreciated :)

We have set parallelization to FALSE for the MarketMatching process. We've checked it on a Windows machine and the parallelization works. Could you reinstall, set parallel = TRUE, and let us know if it works? Thank you.

yulisong commented 2 years ago

Hi @NicolasMatrices-v2, it works now. Thank you for your quick update!