facebookexperimental / Robyn

Robyn is an experimental, AI/ML-powered and open sourced Marketing Mix Modeling (MMM) package from Meta Marketing Science. Our mission is to democratise modeling knowledge, inspire the industry through innovation, reduce human bias in the modeling process & build a strong open source marketing science community.
https://facebookexperimental.github.io/Robyn/
MIT License

Scheduled cores Error ", all values of jobs will be affected" #576

Closed MustafaCelen closed 1 year ago

MustafaCelen commented 1 year ago

In mclapply(argsList, FUN, mc.preschedule = preschedule, mc.set.seed = set.seed, : scheduled cores 1, 2, 3, 4 did not deliver results, all values of the jobs will be affected

I got the above warning at the end of running the robyn_outputs() function. It returns all the one-pager reports etc.; however, the message saying that values will be affected is worrying, and I am not sure whether it refers to the actual values on my plots and whether they are really affected.

I can't share my data, but my code so far is below. I tried assigning only one core in the robyn_outputs() function, but it didn't work: it still tries to use all the cores. Is there a way to specify the number of cores for robyn_outputs()?

colnames(df_weekly)[25] = "OtelEndeks"
InputCollect <- robyn_inputs(
  dt_input = df_weekly,
  #dt_holidays = dt_prophet_holidays,
  date_var = "index", # date format must be "2020-01-01"
  dep_var = "sales_amount", # there should be only one dependent variable
  dep_var_type = "conversion", # "revenue" (ROI) or "conversion" (CPA)
  prophet_vars = c("season","trend"), # "trend","season", "weekday" & "holiday"
  prophet_country = "DE", # input one country. dt_prophet_holidays includes 59 countries by default
  context_vars = c("OtelEndeks"), # e.g. competitors, discount, unemployment etc
  paid_media_spends = c("tv_cost", "sm_cost", "gads_video_cost", "gads_display_cost", "gads_search_cost","dv360_video_cost","dv360_display_cost","cm360_video_cost","cm360_display_cost"), # mandatory input
  paid_media_vars = c("tv_impression", "sm_impression", "gads_video_impression", "gads_display_impression", "gads_search_impression","dv360_video_impression","dv360_display_impression","cm360_video_impression","cm360_display_impression"), 
  adstock = "weibull_pdf" # geometric, weibull_cdf or weibull_pdf.
)
print(InputCollect)

plot_adstock(plot = FALSE)
plot_saturation(plot = FALSE)

hyperparameters <- list(
  gads_search_cost_alphas = c(0.5, 3),
  gads_search_cost_gammas = c(0.3, 1),
  gads_search_cost_shapes = c(0.0001, 10),
  gads_search_cost_scales = c(0, 0.1),
  sm_cost_alphas = c(0.5, 3),
  sm_cost_gammas = c(0.3, 1),
  sm_cost_shapes = c(0.0001, 10),
  sm_cost_scales = c(0, 0.1),
  tv_cost_alphas = c(0.5, 3),
  tv_cost_gammas = c(0.3, 1),
  tv_cost_shapes = c(0.0001, 10),
  tv_cost_scales = c(0, 0.1),
  #total_video_cost_alphas = c(0.5, 3),
  #total_video_cost_gammas = c(0.3, 1),
  #total_video_cost_shapes = c(0.0001, 10),
  #total_video_cost_scales = c(0, 0.1),
  #total_display_cost_alphas = c(0.5, 3),
  #total_display_cost_gammas = c(0.3, 1),
  #total_display_cost_shapes = c(0.0001, 10),
  #total_display_cost_scales = c(0, 0.1),
  gads_display_cost_alphas = c(0.5, 3),
  gads_display_cost_gammas = c(0.3, 1),
  gads_display_cost_shapes = c(0.0001, 10),
  gads_display_cost_scales = c(0, 0.1),
  gads_video_cost_alphas = c(0.5, 3),
  gads_video_cost_gammas = c(0.3, 1),
  gads_video_cost_shapes = c(0.0001, 10),
  gads_video_cost_scales = c(0, 0.1),
  dv360_display_cost_alphas = c(0.5, 3),
  dv360_display_cost_gammas = c(0.3, 1),
  dv360_display_cost_shapes = c(0.0001, 10),
  dv360_display_cost_scales = c(0, 0.1),
  dv360_video_cost_alphas = c(0.5, 3),
  dv360_video_cost_gammas = c(0.3, 1),
  dv360_video_cost_shapes = c(0.0001, 10),
  dv360_video_cost_scales = c(0, 0.1),
  cm360_display_cost_alphas = c(0.5, 3),
  cm360_display_cost_gammas = c(0.3, 1),
  cm360_display_cost_shapes = c(0.0001, 10),
  cm360_display_cost_scales = c(0, 0.1),
  cm360_video_cost_alphas = c(0.5, 3),
  cm360_video_cost_gammas = c(0.3, 1),
  cm360_video_cost_shapes = c(0.0001, 10),
  cm360_video_cost_scales = c(0, 0.1),
  train_size = c(0.5, 0.8)

)

OutputModels <- robyn_run(
  InputCollect = InputCollect, # feed in all model specification
  cores = 12, # NULL defaults to max available - 1
  ts_validation = TRUE,
  iterations = 6000, # 2000 recommended for the dummy dataset with no calibration
  trials = 8, # 5 recommended for the dummy dataset
  #add_penalty_factor = FALSE, # Experimental feature. Use with caution.
  outputs = FALSE # outputs = FALSE disables direct model output - robyn_outputs()
)

print(OutputModels)

OutputModels$convergence$moo_distrb_plot
OutputModels$convergence$moo_cloud_plot

OutputCollect <- robyn_outputs(
  InputCollect, OutputModels,
  csv_out = "pareto", # "pareto", "all", or NULL (for none)
  clusters = TRUE, # Set to TRUE to cluster similar models by ROAS. See ?robyn_clusters
  plot_pareto = TRUE, # Set to FALSE to deactivate plotting and saving model one-pagers
  plot_folder = robyn_object, # path for plots export
  ts_validation = TRUE
  #export = TRUE # this will create files locally
)
print(OutputCollect)

MustafaCelen commented 1 year ago

library(foreach)
doFuture::registerDoFuture()
library(doRNG)

With these lines of code, the warning no longer appears.

laresbernardo commented 1 year ago

Hi @MustafaCelen thanks for sharing.

"(...) is there a way to specify number of cores during robyn_outputs()?"

Yes. You could change the cores value in OutputModels manually: OutputModels$cores <- X.
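The suggestion above could be sketched as follows. This is only an illustration: the `cores` element of `OutputModels` is taken from the reply above, the cap of 4 is an arbitrary example value, and the stand-in list exists only so the snippet runs outside a Robyn session.

```r
# Stand-in so the snippet runs on its own (assumption); in a real session
# OutputModels comes from robyn_run() and this line does nothing.
if (!exists("OutputModels")) OutputModels <- list(cores = 12)

# Number of cores the machine reports (detectCores() can return NA).
available <- parallel::detectCores()
if (is.na(available)) available <- 1L

# Example cap (assumption): use at most 4 cores and leave one core free.
max_cores <- 4L
OutputModels$cores <- max(1L, min(max_cores, available - 1L))

print(OutputModels$cores)
```

Setting the value before calling robyn_outputs() keeps the change in one place, instead of relying on an argument that robyn_outputs() may not expose.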

Note, though, that we use doParallel's registerDoParallel rather than doFuture's registerDoFuture, which is triggered here.

MustafaCelen commented 1 year ago

I am still getting the error when I try to generate many one-pager reports. I don't use clusters, and I try to get all of the Pareto-front results (usually between 100 and 200). I think it is better for me to analyse the output CSV and decide which ones to create; then I'll just use the robyn_onepagers function to create them. But I think there is a problem with memory usage: even with 144 GB of RAM and a 72-core CPU, the outputs take very long.
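The workflow described above (inspect the results first, then build one-pagers only for chosen models) might look roughly like this. It is a sketch under assumptions: `pick_top_models` is a hypothetical helper, the column names `solID`, `nrmse` and `decomp.rssd` are assumed to match `OutputCollect$resultHypParam`, and `robyn_onepagers()` is assumed to accept `select_model` and `export`; check `?robyn_onepagers` for the installed version.

```r
# Hypothetical helper: rank models by NRMSE, then DECOMP.RSSD, keep the first n IDs.
pick_top_models <- function(results, n = 5) {
  ranked <- results[order(results$nrmse, results$decomp.rssd), ]
  head(unique(as.character(ranked$solID)), n)
}

# Small made-up table, only to show how the helper behaves.
demo_results <- data.frame(
  solID = c("1_1_1", "1_2_1", "2_1_1"),
  nrmse = c(0.30, 0.10, 0.20),
  decomp.rssd = c(0.10, 0.20, 0.30)
)
demo_ids <- pick_top_models(demo_results, n = 2)
print(demo_ids)

# In a real Robyn session (objects from the code earlier in this thread):
if (requireNamespace("Robyn", quietly = TRUE) &&
    exists("InputCollect") && exists("OutputCollect")) {
  chosen <- pick_top_models(OutputCollect$resultHypParam, n = 5)
  for (id in chosen) {
    # One one-pager per selected model, written to disk.
    Robyn::robyn_onepagers(InputCollect, OutputCollect,
                           select_model = id, export = TRUE)
  }
}
```

For this to save time, robyn_outputs() would need to be run with plot_pareto = FALSE first, so that no one-pagers are plotted until the models have been chosen.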

gufengzhou commented 1 year ago

Are you saying you're outputting 100-200 pareto_fronts? That's too many. If you turn off clustering, try starting with a few fronts, like 3 or so. If you have lots of iterations, 3 fronts will already give you quite a few candidates, and these are the best ones.
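As a rough sketch of this advice, the call from the first post could be adapted as below. The value 3 comes from the reply above; the other arguments are copied from the original post, and the guard is only there so the snippet does nothing outside a session where those objects exist.

```r
n_fronts <- 3 # example value from the reply above

# Runs only inside a Robyn session where the objects from this thread exist.
needed <- c("InputCollect", "OutputModels", "robyn_object")
if (requireNamespace("Robyn", quietly = TRUE) && all(sapply(needed, exists))) {
  OutputCollect <- Robyn::robyn_outputs(
    InputCollect, OutputModels,
    pareto_fronts = n_fronts, # fixed, small number of fronts
    csv_out = "pareto",
    clusters = FALSE, # clustering turned off, as in the poster's setup
    plot_pareto = TRUE,
    plot_folder = robyn_object
  )
  print(OutputCollect)
}
```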

MustafaCelen commented 1 year ago

No, but I don't cluster my Pareto-front results. To clarify, you can see how I use it below. I am running 10k iterations with 10 trials; I have 150 weeks of data with 8 media and 2-3 context variables. Do you think I should increase my iterations or trials, or is this enough?

OutputCollect <- robyn_outputs(
  InputCollect, OutputModels,
  pareto_fronts = 1, # automatically pick how many pareto-fronts to fill min_candidates
  min_candidates = 100, # top pareto models for clustering. Default to 100
  calibration_constraint = 0.1, # range c(0.01, 0.1) & default at 0.1
  csv_out = "pareto", # "pareto", "all", or NULL (for none)
  clusters = FALSE, # Set to TRUE to cluster similar models by ROAS. See ?robyn_clusters
  plot_pareto = TRUE, # Set to FALSE to deactivate plotting and saving model one-pagers
  plot_folder = robyn_object, # path for plots export
  ts_validation = TRUE,
  export = TRUE # this will create files locally
)

gufengzhou commented 1 year ago

Old ticket. Please reopen if necessary.