facebookexperimental / Robyn

Robyn is an experimental, AI/ML-powered and open sourced Marketing Mix Modeling (MMM) package from Meta Marketing Science. Our mission is to democratise modeling knowledge, inspire the industry through innovation, reduce human bias in the modeling process & build a strong open source marketing science community.
https://facebookexperimental.github.io/Robyn/
MIT License

Error Refreshing Previous Model - ValueError: zero-size array to reduction operation minimum which has no identity #376

Closed lbarta closed 2 years ago

lbarta commented 2 years ago

Project Robyn

Describe issue

I re-created a model that was built under a previous version of Robyn, using the same data set and fixed hyperparameters, and saved it into a Robyn object. When I try to run that model through robyn_refresh(), I receive the following error:

>>> Initial model loaded

Refreshing model 1 in AUTO mode. 6 more to go...
Input data has 172 weeks in total: 2018-12-31 to 2022-04-11
Refresh model is built on rolling window of 141 week: 2019-01-28 to 2021-10-04
Rolling window moving forward: 4 week
Using geometric adstocking with 37 hyperparameters (0 to iterate + 37 fixed) on 1 core (Windows fallback)
Starting 3 trials with 1000 iterations each using TwoPointsDE nevergrad algorithm...
  Running trial 1 of 3
Error in py_call_impl(callable, dots$args, dots$keywords) : ValueError: zero-size array to reduction operation minimum which has no identity

Traceback:
7. stop(structure(list(message = "ValueError: zero-size array to reduction operation minimum which has no identity", call = py_call_impl(callable, dots$args, dots$keywords), cppstack = NULL), class = c("Rcpp::exception", "C++Error", "error", "condition")))
6. amin at <__array_function__ internals>#180
5. ng$p$Array(shape = my_tuple, lower = 0, upper = 1)
4. robyn_mmm(InputCollect = InputCollect, hyper_collect = hyper_collect, iterations = iterations, cores = cores, nevergrad_algo = nevergrad_algo, intercept_sign = intercept_sign, add_penalty_factor = add_penalty_factor, refresh = refresh, seed = seed + ngt, quiet = quiet)
3. robyn_train(InputCollect, hyper_collect = hyps, cores, iterations, trials, intercept_sign, nevergrad_algo, dt_hyper_fixed, add_penalty_factor, refresh, seed, quiet)
2. robyn_run(InputCollect = InputCollectRF, plot_folder = objectPath, plot_folder_sub = plot_folder_sub, calibration_constraint = listOutputPrev[["calibration_constraint"]], add_penalty_factor = listOutputPrev[["add_penalty_factor"]], iterations = refresh_iters, trials = refresh_trials, pareto_fronts = 3, ...
1. robyn_refresh(robyn_object = "OSBModel_BASE.RDS", dt_input = dt_input, dt_holidays = dt_holidays, refresh_steps = 4, refresh_mode = "AUTO", refresh_iters = 1000, refresh_trials = 3, clusters = TRUE)

It appears to be an issue with the hyperparameters being passed to nevergrad.
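The log above reports "0 to iterate + 37 fixed": every hyperparameter arrived as a single fixed value and none as a search range, which would plausibly leave nevergrad with an empty search space (NumPy raises this exact ValueError when asked for the minimum of an empty array). The sketch below is a hypothetical pre-flight check, not a Robyn function; it assumes the convention that a length-2 vector is a c(lower, upper) range and a length-1 value is fixed, and simply counts each kind.

```r
# Hypothetical pre-flight check (not part of Robyn).
# Assumption: a length-2 vector is a c(lower, upper) search range and a
# length-1 value is a fixed hyperparameter.
count_hyper_split <- function(hyperparameters) {
  lens <- vapply(hyperparameters, length, integer(1))
  c(to_iterate = sum(lens == 2L), fixed = sum(lens == 1L))
}

# Toy input: one range and two fixed values
hyps <- list(tv_S_alphas = c(0.5, 3), tv_S_gammas = 0.7, ooh_S_alphas = 1.2)
counts <- count_hyper_split(hyps)
print(counts)

# The failing refresh runs look like this case: everything fixed, nothing to iterate
all_fixed <- list(tv_S_alphas = 1.7, tv_S_gammas = 0.7)
if (count_hyper_split(all_fixed)[["to_iterate"]] == 0L) {
  message("No hyperparameter has a search range; the optimizer would get an empty search space.")
}
```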

Provide dummy data & model configuration

The original data set used for the model rebuild and the refresh data set are included, along with the hyperparameters and the code:

osb_input_original_scaled.csv osb_input_refresh_scaled.csv pareto_hyperparameters.csv robyn_refresh_error_code.txt

Environment & Robyn version

R version 4.1.2, Robyn version 3.6.2, all other packages up to date.

kyletgoldberg commented 2 years ago

Can you try loading your robyn_object with readRDS() before running robyn_refresh()? It would look like this:

robyn_object <- readRDS("C:/Users/lbarta/RStudio/RData/v2/OSBModel_BASE.RDS")

and then the robyn_refresh() call would look like this:

Robyn <- robyn_refresh(
  robyn_object = robyn_object
  , dt_input = dt_input
  , dt_holidays = dt_holidays
  , refresh_steps = 4
  , refresh_mode = "AUTO"
  , refresh_iters = 1000 
  , refresh_trials = 3
  , clusters = TRUE
)
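One detail worth checking in the readRDS() path: inside an R string literal a backslash starts an escape sequence, so a Windows path written as "C:\Users..." fails to parse because "\U" is read as the start of a Unicode escape. A minimal sketch of three equivalent spellings of the same path (the path is simply the one quoted in this thread):

```r
# Forward slashes work on Windows and need no escaping
p_forward <- "C:/Users/lbarta/RStudio/RData/v2/OSBModel_BASE.RDS"
# Doubled backslashes: each "\\" in the literal is one backslash in the string
p_escaped <- "C:\\Users\\lbarta\\RStudio\\RData\\v2\\OSBModel_BASE.RDS"
# file.path() joins components with "/"
p_joined <- file.path("C:", "Users", "lbarta", "RStudio", "RData", "v2", "OSBModel_BASE.RDS")

identical(p_forward, p_joined)                                  # TRUE
identical(gsub("\\", "/", p_escaped, fixed = TRUE), p_forward)  # TRUE
```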
kyletgoldberg commented 2 years ago

Hi - were you able to resolve this issue?

sziolko commented 2 years ago

@kyletgoldberg I see the same issue.

When running with the readRDS function as you describe:

> robyn_object <- readRDS("RobynCDF.RDS")
> Robyn <- robyn_refresh(
  robyn_object = robyn_object
  , dt_input = data
  , dt_holidays = dt_prophet_holidays
  , refresh_steps = 7
  , refresh_mode = "auto"
  , refresh_iters = 2000 
  , refresh_trials = 5
  , plot_pareto = FALSE
  , clusters = FALSE
)

There is an immediate error, "Error in file.exists(robyn_object) : invalid 'file' argument", because robyn_refresh() expects the file path of the object rather than an already loaded robyn_object.

When loading as described in the demo.R file https://github.com/facebookexperimental/Robyn/blob/main/demo/demo.R#L531-L549 (note that there is an error there: the select_model argument needs to be provided to the export function), I see the following error in both Robyn 3.6 and 3.7:

7: stop(structure(list(message = "ValueError: zero-size array to reduction operation minimum which has no identity\n",
       call = py_call_impl(callable, dots$args, dots$keywords),
       cppstack = structure(list(file = "", line = -1L, stack = c("/home/ubuntu/R/x86_64-pc-linux-gnu-library/4.1/reticulate/libs/reticulate.so(Rcpp::exception::exception(char const*, bool)+0x9a) [0x7fe4e19f232a]",
       "/home/ubuntu/R/x86_64-pc-linux-gnu-library/4.1/reticulate/libs/reticulate.so(Rcpp::stop(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x29) [0x7fe4e19e1f3e]",
       "/home/ubuntu/R/x86_64-pc-linux-gnu-library/4.1/reticulate/libs/reticulate.so(+0x1ea4f) [0x7fe4e19e5a4f]",
       "/home/ubuntu/R/x86_64-pc-linux-gnu-library/4.1/reticulate/libs/reticulate.so(_reticulate_py_call_impl+0xca) [0x7fe4e19ed1fa]",
       "/usr/lib/R/lib/libR.so(+0xf7b50) [0x7fe4e6108b50]", "/usr/lib/R/lib/libR.so(+0x1388be) [0x7fe4e61498be]",
       "/usr/lib/R/lib/libR.so(Rf_eval+0x180) [0x7fe4e615f8f0]",
       "/usr/lib/R/lib/libR.so(+0x15048f) [0x7fe4e616148f]", "/usr/lib/R/lib/libR.so(Rf_applyClosure+0x1a5) [0x7fe4e61622d5]",
       "/usr/lib/R/lib/libR.so(+0x13ea30) [0x7fe4e614fa30]", "/usr/lib/R/lib/libR.so(Rf_eval+0x180) [0x7fe4e615f8f0]",
       "/usr/lib/R/lib/libR.so(+0x15048f) [0x7fe4e616148f]", "/usr/lib/R/lib/libR.so(Rf_applyClosure+0x1a5) [0x7fe4e61622d5]",
       "/usr/lib/R/lib/libR.so(+0x13ea30) [0x7fe4e614fa30]", "/usr/lib/R/lib/libR.so(Rf_eval+0x180) [0x7fe4e615f8f0]",
       "/usr/lib/R/lib/libR.so(+0x15048f) [0x7fe4e616148f]", "/usr/lib/R/lib/libR.so(Rf_applyClosure+0x1a5) [0x7fe4e61622d5]",
       "/usr/lib/R/lib/libR.so(+0x13ea30) [0x7fe4e614fa30]", "/usr/lib/R/lib/libR.so(Rf_eval+0x180) [0x7fe4e615f8f0]",
       "/usr/lib/R/lib/libR.so(+0x15048f) [0x7fe4e616148f]", "/usr/lib/R/lib/libR.so(Rf_applyClosure+0x1a5) [0x7fe4e61622d5]",
       "/usr/lib/R/lib/libR.so(+0x13ea30) [0x7fe4e614fa30]", "/usr/lib/R/lib/libR.so(Rf_eval+0x180) [0x7fe4e615f8f0]",
       "/usr/lib/R/lib/libR.so(+0x15048f) [0x7fe4e616148f]", "/usr/lib/R/lib/libR.so(Rf_applyClosure+0x1a5) [0x7fe4e61622d5]",
       "/usr/lib/R/lib/libR.so(+0x13ea30) [0x7fe4e614fa30]", "/usr/lib/R/lib/libR.so(Rf_eval+0x180) [0x7fe4e615f8f0]",
       "/usr/lib/R/lib/libR.so(+0x15048f) [0x7fe4e616148f]", "/usr/lib/R/lib/libR.so(Rf_applyClosure+0x1a5) [0x7fe4e61622d5]",
       "/usr/lib/R/lib/libR.so(Rf_eval+0x2ac) [0x7fe4e615fa1c]",
       "/usr/lib/R/lib/libR.so(+0x152fa2) [0x7fe4e6163fa2]", "/usr/lib/R/lib/libR.so(Rf_eval+0x580) [0x7fe4e615fcf0]",
       "/usr/lib/R/lib/libR.so(Rf_ReplIteration+0x20a) [0x7fe4e61940ea]",
       "/usr/lib/R/lib/libR.so(+0x183480) [0x7fe4e6194480]", "/usr/lib/R/lib/libR.so(run_Rmainloop+0x50) [0x7fe4e6194540]",
       "/usr/lib/R/bin/exec/R(main+0x1f) [0x558c3ab290af]", "/usr/lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7fe4e5e12d90]",
       "/usr/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x7fe4e5e12e40]",
       "/usr/lib/R/bin/exec/R(_start+0x25) [0x558c3ab290e5]")), class = "Rcpp_stack_trace")), class = c("Rcpp::exception",
   "C++Error", "error", "condition")))
6: py_call_impl(callable, dots$args, dots$keywords)
5: ng$p$Array(shape = my_tuple, lower = 0, upper = 1)
4: robyn_mmm(InputCollect = InputCollect, hyper_collect = hyper_collect,
       iterations = iterations, cores = cores, nevergrad_algo = nevergrad_algo,
       intercept_sign = intercept_sign, add_penalty_factor = add_penalty_factor,
       refresh = refresh, seed = seed + ngt, quiet = quiet)
3: robyn_train(InputCollect, hyper_collect = hyps, cores, iterations,
       trials, intercept_sign, nevergrad_algo, dt_hyper_fixed, add_penalty_factor,
       refresh, seed, quiet)
2: robyn_run(InputCollect = InputCollectRF, plot_folder = objectPath,
       plot_folder_sub = plot_folder_sub, calibration_constraint = listOutputPrev[["calibration_constraint"]],
       add_penalty_factor = listOutputPrev[["add_penalty_factor"]],
       iterations = refresh_iters, trials = refresh_trials, pareto_fronts = 3,
       refresh = TRUE, plot_pareto = plot_pareto, ...)
1: robyn_refresh(robyn_object = robyn_object, dt_input = data, dt_holidays = dt_prophet_holidays,
       refresh_steps = 7, refresh_mode = "auto", refresh_iters = 2000,
       refresh_trials = 5, plot_pareto = FALSE, clusters = FALSE)
nie-moviestarplanet commented 2 years ago

I'm having the same issue. Running on version 3.7.0.

>>> Loaded Model: Initial model
>>> Building refresh model #1 in manual mode
>>> New bounds freedom: 1.92%
Input data has 577 days in total: 2021-01-01 to 2022-07-31
Refresh model is built on rolling window of 208 day: 2022-01-05 to 2022-07-31
Rolling window moving forward: 4 days
Using geometric adstocking with 19 hyperparameters (0 to iterate + 19 fixed) on 1 core (Windows fallback)
>>> Starting 1 trials with 100 iterations each using TwoPointsDE nevergrad algorithm...
  Running trial 1 of 1
 Error in py_call_impl(callable, dots$args, dots$keywords) : 
ValueError: zero-size array to reduction operation minimum which has no identity

The robyn_object being used is the result of loading an old model:

dt_hyper_fixed <- read.csv("pareto_hyperparameters.csv")
select_model <- "3_221_3"
dt_hyper_fixed <- dt_hyper_fixed[dt_hyper_fixed$solID == select_model, ]

#View(dt_hyper_fixed)

OutputCollectFixed <- robyn_run(
  # InputCollect must be provided by robyn_inputs with same dataset and parameters as before
  InputCollect = InputCollect,
  plot_folder = robyn_object,
  dt_hyper_fixed = dt_hyper_fixed
)

# Save Robyn object for further refresh
robyn_save(
  robyn_object = robyn_object,
  select_model = select_model,
  InputCollect = InputCollect,
  OutputCollect = OutputCollectFixed
)
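The snippet above filters pareto_hyperparameters.csv by solID. If that ID is not in the file (for example, because it came from a different run), the filter silently returns zero rows. A small guard can rule that out; the helper name select_fixed_hyper() and the toy data frame are hypothetical, and this check does not address the missing-bounds problem diagnosed later in the thread.

```r
# Hypothetical helper (not part of Robyn): pick exactly one model row by solID
select_fixed_hyper <- function(dt_hyper, sol_id) {
  rows <- dt_hyper[dt_hyper$solID == sol_id, , drop = FALSE]
  if (nrow(rows) != 1L) {
    stop(sprintf("Expected 1 row for solID '%s', found %d", sol_id, nrow(rows)))
  }
  rows
}

# Toy stand-in for read.csv("pareto_hyperparameters.csv")
dt_hyper <- data.frame(
  solID = c("3_221_3", "1_29_11"),
  tv_S_alphas = c(1.2, 2.4)
)
picked <- select_fixed_hyper(dt_hyper, "3_221_3")
print(picked)

# An ID from another run fails loudly instead of yielding an empty data frame
res <- tryCatch(
  select_fixed_hyper(dt_hyper, "9_999_9"),
  error = function(e) conditionMessage(e)
)
print(res)
```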
laresbernardo commented 2 years ago

> Error in file.exists(robyn_object) : invalid 'file' argument as it is looking for the file path for the object, not an already loaded robyn_object.

Try passing robyn_object = "RobynCDF.RDS" instead of the object. In 3.7.0, both should work.
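The error quoted above comes from calling file.exists() on a list, which R rejects with "invalid 'file' argument". As an illustration only (a hypothetical helper, not Robyn's internal code), accepting either a path or a loaded object can look like this:

```r
# Hypothetical helper: accept either an .RDS path or an already loaded object
load_robyn_object <- function(robyn_object) {
  if (is.character(robyn_object) && length(robyn_object) == 1L) {
    if (!file.exists(robyn_object)) stop("File not found: ", robyn_object)
    return(readRDS(robyn_object))
  }
  if (is.list(robyn_object)) return(robyn_object)
  stop("robyn_object must be an .RDS file path or a loaded list")
}

# Usage with a placeholder object written to a temporary file
tmp <- tempfile(fileext = ".RDS")
saveRDS(list(model = "placeholder"), tmp)
from_path <- load_robyn_object(tmp)
from_obj <- load_robyn_object(readRDS(tmp))
identical(from_path, from_obj)  # TRUE
```

Checking the type first means file.exists() is only ever called on a single string.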

> (...) there is an error in that the select_model argument needs to be provided to the export function)

If you're on 3.7.0 (as intended in the demo.R file), you don't need to specify select_model: it's redundant because the RDS file already contains that information, so the ID is picked up automatically when re-generating a model.

Can either of you share a reproducible example so we can debug this issue? If the RDS file is too large, feel free to upload it to any file-sharing service and post the link here as a comment. Thanks!

sziolko commented 2 years ago

@laresbernardo I will try to make the time to test it with the demo data. But the process that is failing for me is essentially:

  1. Run demo.R.
  2. Close R.
  3. Decide that you want a different model from that run.
  4. Open R and run up through step 4 of the demo script (so that you have the data, InputCollect, etc. in your R session).
  5. Run:

    dt_hyper_fixed <- read.csv("pareto_hyperparameters.csv")
    select_model <- "3_221_3"
    dt_hyper_fixed <- dt_hyper_fixed[dt_hyper_fixed$solID == select_model, ]

    OutputCollectFixed <- robyn_run(
      # InputCollect must be provided by robyn_inputs with same dataset and parameters as before
      InputCollect = InputCollect,
      plot_folder = robyn_object,
      dt_hyper_fixed = dt_hyper_fixed
    )

    # Save Robyn object for further refresh
    robyn_save(
      robyn_object = robyn_object,
      select_model = select_model,
      InputCollect = InputCollect,
      OutputCollect = OutputCollectFixed
    )

    Robyn <- robyn_refresh(
      robyn_object = robyn_object
      , dt_input = data
      , dt_holidays = dt_prophet_holidays
      , refresh_steps = 7
      , refresh_mode = "auto"
      , refresh_iters = 2000
      , refresh_trials = 5
      , plot_pareto = FALSE
      , clusters = FALSE
    )

laresbernardo commented 2 years ago

Can you please share your pareto_hyperparameters.csv file? You're using the same demo.R InputCollect, right?

sziolko commented 2 years ago

@laresbernardo this replicates the error on my machine (ubuntu) using the demo data.

library('doParallel')
library('remotes')
remotes::install_github("facebookexperimental/Robyn/R")
library('reticulate')
use_virtualenv("r-reticulate", required = TRUE)
library('Robyn')

data("dt_simulated_weekly")
head(dt_simulated_weekly)

## Check holidays from Prophet
# 59 countries included. If your country is not included, please manually add it.
# Tip: any events can be added to this table, e.g. school breaks, events, etc.
data("dt_prophet_holidays")
head(dt_prophet_holidays)

## Set robyn_object. It must have extension .RDS. The object name can be different than Robyn:
robyn_object <- "DemoMyRobyn.RDS"

################################################################
#### Step 2a: For first time user: Model specification in 4 steps

#### 2a-1: First, specify input variables

## -------------------------------- NOTE v3.6.0 CHANGE !!! ---------------------------------- ##
## All sign controls are now automatically provided: "positive" for media & organic variables
## and "default" for all others. User can still customise signs if necessary. Documentation
## is available in ?robyn_inputs
## ------------------------------------------------------------------------------------------ ##
InputCollect <- robyn_inputs(
  dt_input = dt_simulated_weekly,
  dt_holidays = dt_prophet_holidays,
  date_var = "DATE", # date format must be "2020-01-01"
  dep_var = "revenue", # there should be only one dependent variable
  dep_var_type = "revenue", # "revenue" (ROI) or "conversion" (CPA)
  prophet_vars = c("trend", "season", "holiday"), # "trend","season", "weekday" & "holiday"
  prophet_country = "DE", # input one country. dt_prophet_holidays includes 59 countries by default
  context_vars = c("competitor_sales_B", "events"), # e.g. competitors, discount, unemployment etc
  paid_media_spends = c("tv_S", "ooh_S", "print_S", "facebook_S", "search_S"), # mandatory input
  paid_media_vars = c("tv_S", "ooh_S", "print_S", "facebook_I", "search_clicks_P"), # mandatory.
  # paid_media_vars must have same order as paid_media_spends. Use media exposure metrics like
  # impressions, GRP etc. If not applicable, use spend instead.
  organic_vars = c("newsletter"), # marketing activity without media spend
  factor_vars = c("events"), # specify which variables in context_vars or organic_vars are factorial
  window_start = "2016-11-23",
  window_end = "2018-08-22",
  adstock = "weibull_cdf" # geometric, weibull_cdf or weibull_pdf.
)
print(InputCollect)

#### 2a-2: Second, define and add hyperparameters

## -------------------------------- NOTE v3.6.0 CHANGE !!! ---------------------------------- ##
## Default media variable for modelling has changed from paid_media_vars to paid_media_spends.
## Hyperparameter names need to be based on paid_media_spends names. Run:
hyper_names(adstock = InputCollect$adstock, all_media = InputCollect$all_media)
## to see correct hyperparameter names. Check GitHub homepage for background of change.
## calibration_input must also use spend names.
## ------------------------------------------------------------------------------------------ ##

# Run hyper_limits() to check maximum upper and lower bounds by range
# Example hyperparameter ranges for Weibull CDF adstock (alphas, gammas, shapes, scales)
hyperparameters <- list(
  facebook_S_alphas = c(0.5, 3),
  facebook_S_gammas = c(0.3, 1),
  facebook_S_shapes = c(0.0001, 2),
  facebook_S_scales = c(0, 0.1),

  print_S_alphas = c(0.5, 3),
  print_S_gammas = c(0.3, 1),
  print_S_shapes = c(0.0001, 2),
  print_S_scales = c(0, 0.1),

  tv_S_alphas = c(0.5, 3),
  tv_S_gammas = c(0.3, 1),
  tv_S_shapes = c(0.0001, 2),
  tv_S_scales = c(0, 0.1),

  search_S_alphas = c(0.5, 3),
  search_S_gammas = c(0.3, 1),
  search_S_shapes = c(0.0001, 2),
  search_S_scales = c(0, 0.1),

  ooh_S_alphas = c(0.5, 3),
  ooh_S_gammas = c(0.3, 1),
  ooh_S_shapes = c(0.0001, 2),
  ooh_S_scales = c(0, 0.1),

  newsletter_alphas = c(0.5, 3),
  newsletter_gammas = c(0.3, 1),
  newsletter_shapes = c(0.0001, 2),
  newsletter_scales = c(0, 0.1)

)

#### 2a-3: Third, add hyperparameters into robyn_inputs()

InputCollect <- robyn_inputs(InputCollect = InputCollect, hyperparameters = hyperparameters)
print(InputCollect)

################################################################
#### Step 3: Build initial model

## Run all trials and iterations. Use ?robyn_run to check parameter definition
OutputModels <- robyn_run(
  InputCollect = InputCollect, # feed in all model specification
  # cores = NULL, # default to max available
  # add_penalty_factor = FALSE, # Untested feature. Use with caution.
  iterations = 2000, # recommended for the dummy dataset
  trials = 5, # recommended for the dummy dataset
  outputs = FALSE # outputs = FALSE disables direct model output - robyn_outputs()
)
print(OutputModels)

## Calculate Pareto optimality, cluster and export results and plots. See ?robyn_outputs
OutputCollect <- robyn_outputs(
  InputCollect, OutputModels,
  pareto_fronts = 3,
  # calibration_constraint = 0.1, # range c(0.01, 0.1) & default at 0.1
  csv_out = "pareto", # "pareto" or "all"
  clusters = FALSE, # Set to TRUE to cluster similar models by ROAS. See ?robyn_clusters
  plot_pareto = FALSE, # Set to FALSE to deactivate plotting and saving model one-pagers
  plot_folder = robyn_object # path for plots export
)
print(OutputCollect)

################################################################
#### Step 4: Select and save the initial model

## Compare all model one-pagers and select one that mostly reflects your business reality
## for this demo pick the first model
models <- unique(names(OutputCollect$allPareto$plotDataCollect))
select_model <- models[1]

#select_model <- "1_29_11" # select one from above
ExportedModel <- robyn_save(
  robyn_object = robyn_object, # model object location and name
  select_model = select_model, # selected model ID
  InputCollect = InputCollect,
  OutputCollect = OutputCollect
)
print(ExportedModel)
# plot(ExportedModel)

###########################################################################
#### pretend that time has passed, but you want to refresh from
#### a different model. So you re-run everything through 2a-3
#### but need to pull the outputs from the pareto_hyperparameters.csv file
#### because R crashed before writing out the RDS file, or you 
#### overwrote the RDS file with another run etc (it isn't available)
###########################################################################
#### get old model results
a_different_model <- models[2]
# Get old hyperparameters and select model
dt_hyper_fixed <- read.csv("./2022-08-01 23.42 init/pareto_hyperparameters.csv")
select_model <- a_different_model
dt_hyper_fixed <- dt_hyper_fixed[dt_hyper_fixed$solID == select_model, ]

OutputCollectFixed <- robyn_run(
  # InputCollect must be provided by robyn_inputs with same dataset and parameters as before
  InputCollect = InputCollect,
  plot_folder = robyn_object,
  dt_hyper_fixed = dt_hyper_fixed
)

# Save Robyn object for further refresh
ExportedModel <- robyn_save(
  robyn_object = robyn_object,
  InputCollect = InputCollect,
  OutputCollect = OutputCollectFixed
)

Robyn <- robyn_refresh(
  robyn_object = robyn_object,
  dt_input = dt_simulated_weekly,
  dt_holidays = dt_prophet_holidays,
  refresh_steps = 4,
  refresh_mode = "manual",
  refresh_iters = 100, # 1k is estimation. Use refresh_mode = "manual" to try out.
  refresh_trials = 1
)
laresbernardo commented 2 years ago

I was able to reproduce the issue. Thanks for your example, @sziolko. I will get back to you once I fix it.

laresbernardo commented 2 years ago

What's happening here is that we don't have the hyperparameter bounds for each of those fixed hyperparameters. Let me think of a way to deal with these cases, and I will update this ticket. CC: @gufengzhou
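To make that diagnosis concrete: a row from pareto_hyperparameters.csv holds only point values, while a refresh needs a range around each of them. The hypothetical sketch below (not the fix that shipped in Robyn) pairs each fixed value with the range it was originally sampled from, taken from the hyperparameters list passed to robyn_inputs(), and flags values for which no range is available. It assumes every fixed value is a scalar.

```r
# Hypothetical sketch: look up the original c(lower, upper) range for each fixed value.
# Assumption: fixed_values is a named list of scalars.
attach_bounds <- function(fixed_values, hyperparameters) {
  nms <- names(fixed_values)
  # Apply f (min or max) to the original range, or NA if the name has no range
  bound <- function(n, f) {
    if (n %in% names(hyperparameters)) f(hyperparameters[[n]]) else NA_real_
  }
  data.frame(
    name = nms,
    value = unname(unlist(fixed_values)),
    lower = unname(vapply(nms, bound, numeric(1), f = min)),
    upper = unname(vapply(nms, bound, numeric(1), f = max)),
    stringsAsFactors = FALSE
  )
}

hyperparameters <- list(tv_S_alphas = c(0.5, 3), tv_S_gammas = c(0.3, 1))
fixed <- list(tv_S_alphas = 1.7, tv_S_gammas = 0.6, newsletter_alphas = 2.1)
out <- attach_bounds(fixed, hyperparameters)
out$has_bounds <- !is.na(out$lower)
print(out)
```

Rows with has_bounds equal to FALSE are the cases the thread describes: a fixed value with nothing to search around.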

laresbernardo commented 2 years ago

@sziolko with the new method of exporting Robyn models to recreate them, this issue is fixed. Can you please update to the latest dev version and retry? You can use the demo.R file to see how it works, especially the last section.

gufengzhou commented 2 years ago

Please reopen if this issue reoccurs.

ChrisHaferl commented 1 year ago

Getting the same issue, opened a new one here https://github.com/facebookexperimental/Robyn/issues/761