Closed lukmaz closed 1 year ago
Hi @lukmaz You're mostly right in everything you mentioned. The reason `cores = 1` runs correctly is that we turn off parallel computing in that scenario. `cores = NULL` will use all available cores minus 1. So your JupyterLab config is probably having issues with parallel computing as it is.
> Is it possible to limit the number of cores used in `robyn_refresh`, similarly as it is possible in `robyn_run`?
Yes. You can limit the cores (or turn off parallel computing) by setting `OutputCollect$cores <- 1` before passing it to `robyn_refresh()`.
I don't see how `OutputCollect` is being passed to `robyn_refresh()`:
```r
robyn_refresh <- function(json_file = NULL,
                          robyn_object = NULL,
                          dt_input = NULL,
                          dt_holidays = Robyn::dt_prophet_holidays,
                          refresh_steps = 4,
                          refresh_mode = "manual",
                          refresh_iters = 1000,
                          refresh_trials = 3,
                          plot_folder = NULL,
                          plot_pareto = TRUE,
                          version_prompt = FALSE,
                          export = TRUE,
                          calibration_input = NULL,
                          ...) {
```
I modified the `OutputCollect` variable that is used throughout the script, but it doesn't seem to influence the `robyn_refresh()` function, since it still runs on 3 cores after the modification.
Did you actually try passing `cores = 1` within `robyn_refresh(...)`? Those `...` are passed to `robyn_run()` internally. That's actually the most straightforward way.
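For example, something like this (a sketch only; the JSON path and refresh settings below are hypothetical placeholders, but `cores` is a real `robyn_run()` argument):

```r
library(Robyn)

# Extra arguments to robyn_refresh() are forwarded via ... to robyn_run(),
# so cores is picked up there. The json_file path below is a placeholder.
RobynRefresh <- robyn_refresh(
  json_file = "./Robyn_init/RobynModel-1_100_6.json",
  dt_input = dt_simulated_weekly,
  dt_holidays = dt_prophet_holidays,
  refresh_steps = 4,
  refresh_iters = 1000,
  refresh_trials = 1,
  cores = 1  # forwarded to robyn_run(); disables parallel workers
)
```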
I've been using Robyn on Vertex AI for ~1 year now. I believe the issue has to do with the `%dorng%` calls messing up the parallel computing in a JupyterLab environment. The only workaround I've found that works since Robyn 3.6 is the following:

- Change any instances of `%dorng%` in `model.R` and `plots.R` to `%do%`
- Add `importFrom(foreach, "%do%")` to the NAMESPACE
- Re-compile and install Robyn with the modifications

I'm sure this isn't the best workaround for reproducibility, because you're getting rid of the ability to assign a seed, but it's worked for me!
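The file edits above can be scripted; a minimal sketch (the helper name `patch_dorng` is my own, not part of Robyn):

```r
# Rewrite every %dorng% operator in a source file to plain %do%.
patch_dorng <- function(path) {
  txt <- readLines(path)
  writeLines(gsub("%dorng%", "%do%", txt, fixed = TRUE), path)
}

# In a local checkout of the Robyn sources you would then run, before
# adding importFrom(foreach, "%do%") to NAMESPACE and re-installing:
# patch_dorng("R/model.R")
# patch_dorng("R/plots.R")
```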
Actually, we've changed A LOT since version 3.6: https://github.com/facebookexperimental/Robyn/releases

If you guys are up to it, we are open to enabling cloud instances and these kinds of solutions (automatically or via a new parameter). If foreach's `%do%` works exactly like the current version and also enables you to run on Vertex, we can migrate. If you're willing to develop a solution, test it, and build a PR, we're open to implementing it.
There's also this external post on Medium that can help you guys set up Vertex AI: Marketing Mix Modelling with Robyn on Vertex AI by Lukasz Olejniczak [Customer Engineer at Google Cloud (Smart Analytics & ML)]
@laresbernardo , you are right: passing `cores = 1` directly to `robyn_refresh()` works and resolves the memory issues on Vertex AI, thanks!
I know the Medium post on running Robyn on Vertex AI. Unfortunately it's slightly outdated and does not work out of the box: Vertex AI seems not to accept custom Docker images built on top of the R image. Actually, I talked with Lukasz Olejniczak, and he has not been running Robyn on Vertex AI since the publication of the article and is not aware of the issues with the current versions of Robyn and Vertex AI.
Trying to run the demo code on Vertex, I did the following: installed on Vertex using the instructions from that Medium post; then the demo code was consistently failing for me too at the `OutputCollect <- robyn_outputs` ..etc step. The solution was commenting out `Sys.setenv(R_FUTURE_FORK_ENABLE = "true")` and `options(future.fork.enable = TRUE)`, then, as suggested, setting `cores = 1` at the `robyn_run()` step. Thanks for starting this thread!
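Concretely, the change amounts to something like this in `demo.R` (a sketch; the iteration and trial counts are just illustrative values, adjust to your setup):

```r
# Keep the forked-futures switches from demo.R commented out on Vertex AI,
# since forked parallelism is what appears to crash the JupyterLab kernel:
# Sys.setenv(R_FUTURE_FORK_ENABLE = "true")
# options(future.fork.enable = TRUE)

# ...and run single-core:
OutputModels <- robyn_run(
  InputCollect = InputCollect,  # from the earlier robyn_inputs() step
  iterations = 2000,
  trials = 5,
  cores = 1  # avoid spawning parallel workers entirely
)
```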
Hi @lukmaz , I'm curious whether you could use multiple cores before, or if it's an issue just lately? How is the speed with 1 core compared to before? Also, as Bernardo mentioned, one of the recent improvements is an 88% object size reduction. Could that have solved your memory-to-core-ratio issue?
I didn't run Robyn on Vertex AI before; I first tried in ~February this year and had memory issues from the beginning. I didn't notice any change after the recent 88% object size reduction. I didn't measure it precisely, but I noticed that training runs much slower on 1 core, so the parallelism probably works fine (if it works at all and does not crash).
Project Robyn
Describe issue
I am trying to run the demo script in JupyterLab on an n1-standard-4 machine in Vertex AI (4 vCPUs, 15 GB RAM). The script crashes at the `robyn_run` stage, causing a kernel restart.

I suspect it crashes because the default number of threads exceeds the memory limit (for `cores = NULL` it runs `4 - 1 = 3` threads). I noticed that when I change to `cores = 1` in the `robyn_run` arguments, it does not crash in `robyn_run`. So the cause looks to be a low memory-to-cores ratio on the machine.

The problem that I cannot work around is that the script crashes in a similar way at the `robyn_refresh` stage, possibly in the plotting code, since the last message logged before the crash is: `Plotting 4 selected models on 3 cores...`. For this issue, I also checked a higher-memory Vertex AI machine (n2-highmem-16, 16 vCPUs, 128 GB RAM) and the problem still persists. I didn't find an option to reduce the number of cores used in `robyn_refresh`.

Is it possible to limit the number of cores used in `robyn_refresh`, similarly as it is possible in `robyn_run`?

Provide reproducible example

Run `demo.R` on an n1-standard-4 or n2-highmem-16 machine in Vertex AI.

Environment & Robyn version
Make sure you're using the latest Robyn version before you post an issue.
`packageVersion("Robyn")`: 3.10.3.9000

`sessionInfo()` (or `R.version$version.string`):

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.3.5.so
locale: [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
[4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
[7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
attached base packages: [1] stats graphics grDevices utils datasets methods base
other attached packages: [1] doRNG_1.8.6 rngtools_1.5.2 foreach_1.5.2 Robyn_3.10.3.9000
loaded via a namespace (and not attached): [1] nlme_3.1-162 bitops_1.0-7 matrixStats_0.63.0
[4] lubridate_1.9.2 doParallel_1.0.17 RColorBrewer_1.1-3
[7] httr_1.4.5 rprojroot_2.0.3 rstan_2.21.8
[10] repr_1.1.6 tools_4.2.3 utf8_1.2.3
[13] R6_2.5.1 rpart_4.1.19 mgcv_1.8-42
[16] colorspace_2.1-0 withr_2.5.0 tidyselect_1.2.0
[19] gridExtra_2.3 prettyunits_1.1.1 processx_3.8.1
[22] compiler_4.2.3 textshaping_0.3.6 glmnet_4.1-7
[25] cli_3.6.1 rvest_1.0.3 xml2_1.3.3
[28] labeling_0.4.2 scales_1.2.1 ggridges_0.5.4
[31] callr_3.7.3 rappdirs_0.3.3 systemfonts_1.0.4
[34] pbdZMQ_0.3-9 stringr_1.5.0 digest_0.6.31
[37] StanHeaders_2.21.0-7 extraDistr_1.9.1 base64enc_0.1-3
[40] pkgconfig_2.0.3 htmltools_0.5.5 fastmap_1.1.1
[43] rlang_1.1.0 shape_1.4.6 prophet_1.0
[46] generics_0.1.3 farver_2.1.1 jsonlite_1.8.4
[49] dplyr_1.1.2 zip_2.3.0 inline_0.3.19
[52] RCurl_1.98-1.12 magrittr_2.0.3 loo_2.6.0
[55] patchwork_1.1.2 Matrix_1.5-3 Rcpp_1.0.10
[58] IRkernel_1.3.2 munsell_0.5.0 fansi_1.0.4
[61] reticulate_1.28 lifecycle_1.0.3 stringi_1.7.12
[64] pROC_1.18.0 yaml_2.3.7 pkgbuild_1.4.0
[67] plyr_1.8.8 grid_4.2.3 parallel_4.2.3
[70] crayon_1.5.2 lattice_0.20-45 IRdisplay_1.1
[73] splines_4.2.3 lares_5.2.1 ps_1.7.5
[76] pillar_1.9.0 uuid_1.1-0 codetools_0.2-19
[79] stats4_4.2.3 glue_1.6.2 evaluate_0.20
[82] rpart.plot_3.1.1 RcppParallel_5.1.7 png_0.1-8
[85] vctrs_0.6.2 nloptr_2.0.3 gtable_0.3.3
[88] purrr_1.0.1 tidyr_1.3.0 ggplot2_3.4.2
[91] openxlsx_4.2.5.2 h2o_3.40.0.1 ragg_1.2.5
[94] survival_3.5-3 minpack.lm_1.2-3 tibble_3.2.1
[97] iterators_1.0.14 timechange_0.2.0 here_1.0.1