facebookexperimental / Robyn

Robyn is an experimental, AI/ML-powered and open sourced Marketing Mix Modeling (MMM) package from Meta Marketing Science. Our mission is to democratise modeling knowledge, inspire the industry through innovation, reduce human bias in the modeling process & build a strong open source marketing science community.
https://facebookexperimental.github.io/Robyn/
MIT License

Demo python notebook crashes when using calibration. #915

Closed pvizan-artefact closed 5 months ago

pvizan-artefact commented 5 months ago

Project Robyn

Describe issue

I cannot get the demo notebook to complete execution when using calibration. For context, I am able to run the notebook top to bottom and generate all the outputs (omitting only the lines of code meant for operating systems other than mine, and changing the variable select_model, as well as other references to model IDs, to the first solID in OutputCollect['clusters']['models']).

However, if I uncomment the cells under step "2-4: Fourth (optional), model calibration / add experimental input" and rerun this previously working notebook, execution stops after the following code:

outputsArgs = {
    "pareto_fronts": "auto",  # automatically pick how many pareto-fronts to fill min_candidates (100)
#    "min_candidates": 100,  # top pareto models for clustering. Default to 100
#    "calibration_constraint": 0.1,  # range [0.01, 0.1] & default at 0.1
    "csv_out": "pareto",  # "pareto", "all", or NULL (for none)
    "clusters": True,  # set to True to cluster similar models by ROAS
    "export": create_files,  # this will create files locally
    "plot_folder": robyn_directory,  # path for plots exports and files creation
    "plot_pareto": create_files  # set to False to deactivate plotting and saving model one-pagers
}

# Build the payload for the robyn_outputs()
payload = {
    'InputCollect' : json.dumps(InputCollect),
    'OutputModels' : json.dumps(OutputModels),
    'jsonOutputsArgs' : json.dumps(outputsArgs)
}
# Get response
OutputCollect = robyn_api('robyn_outputs',payload=payload)

At this point (after the API request), the R subprocess started at the top of the notebook raises the following error:

<simpleError in 1:pareto_fronts: argument of length 0>
Warning in check_calibconstr(calibration_constraint, OutputModels$iterations,  :
  Input 'calibration_constraint' set for top 10% calibrated models. 100 models left for pareto-optimal selection. Minimum suggested: 500
>>> Running Pareto calculations for 1000 models on auto fronts...

The code does work if pareto_fronts is changed to an explicit value such as 18, but I am not sure what a good replacement for "auto" would be. I would also like to know whether this issue happens to other people as well.
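As a stopgap, pinning pareto_fronts to an explicit integer avoids the crash. This is a sketch based on the snippet above; InputCollect, OutputModels, create_files, robyn_directory, and robyn_api are the demo notebook's own names, and the choice of 3 fronts is an assumption, not a Robyn recommendation:

```python
import json

# Same arguments as above, but with pareto_fronts pinned to an explicit
# integer instead of "auto", sidestepping the length-0 error.
outputsArgs = {
    "pareto_fronts": 3,  # explicit front count instead of "auto"
    "csv_out": "pareto",
    "clusters": True,
    "export": create_files,
    "plot_folder": robyn_directory,
    "plot_pareto": create_files
}

payload = {
    "InputCollect": json.dumps(InputCollect),
    "OutputModels": json.dumps(OutputModels),
    "jsonOutputsArgs": json.dumps(outputsArgs)
}
OutputCollect = robyn_api("robyn_outputs", payload=payload)
```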

Information about device

Software:

    System Software Overview:

      System Version: macOS 13.3 (22E252)
      Kernel Version: Darwin 22.4.0
      Boot Volume: Macintosh HD
      Boot Mode: Normal
      Computer Name: ****
      User Name: ****
      Secure Virtual Memory: Enabled
      System Integrity Protection: Enabled
      Time since boot: 19 days, 21 hours, 7 minutes

Hardware:

    Hardware Overview:

      Model Name: MacBook Pro
      Model Identifier: Mac14,7
      Model Number: MNEH3N/A
      Chip: Apple M2
      Total Number of Cores: 8 (4 performance and 4 efficiency)
      Memory: 8 GB
      System Firmware Version: 8422.100.650
      OS Loader Version: 8422.100.650
      Serial Number (system): ****
      Hardware UUID: ****
      Provisioning UDID: ****
      Activation Lock Status: Disabled

Provide reproducible example

You can reproduce the error by applying the changes described above (uncommenting the calibration cells in step 2-4) to the demo notebook.

Environment & Robyn version

Robyn version: 3.10.5.9012
R version: 4.3.2 (2023-10-31) -- "Eye Holes"

yu-ya-tanaka commented 5 months ago

Hi @pvizan-artefact, this error is caused by the small number of models created by robyn_run. In step 3, can you increase the number of iterations or trials (for example, iterations=2000, trials=5) and try again to see whether the error goes away?

pvizan-artefact commented 5 months ago

@yu-ya-tanaka the suggested change does indeed fix the error (changing iterations from 200 to 2000). The only doubt I still have is why decreasing the number of iterations causes this error. How does it change the value of pareto_fronts and make the code crash? Is there a rule of thumb for the number of iterations needed for pareto_fronts = "auto" to work? Thanks a lot!

yu-ya-tanaka commented 5 months ago

@pvizan-artefact If pareto_fronts="auto", 100 models are selected. In your case, I think you got the error through the following flow:

  1. 1000 models were created (200 iterations and 5 trials) with calibration.
  2. 100 models were selected from those 1000 by the default calibration_constraint (0.1).
  3. Robyn then tried to select 100 models with pareto_fronts = "auto" (100) from those 100 models and hit the error.
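The arithmetic in those steps can be checked directly. This is a minimal sketch; the defaults for calibration_constraint and min_candidates are taken from the comments in the demo notebook, and the interpretation of the error is a guess from the log:

```python
# Numbers from the failing run in this thread.
iterations = 200
trials = 5
calibration_constraint = 0.1  # default: keep the top 10% of calibrated models
min_candidates = 100          # target that pareto_fronts="auto" tries to fill

total_models = iterations * trials                            # step 1
calibrated_pool = int(total_models * calibration_constraint)  # step 2

# Step 3: "auto" keeps adding pareto fronts until at least min_candidates
# models are collected. With the surviving pool no larger than the target,
# the front-counting logic apparently ends up empty, which matches the
# "argument of length 0" in 1:pareto_fronts.
print(total_models, calibrated_pool, calibrated_pool <= min_candidates)
```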

Regarding iteration count, 2000 is recommended for the dummy dataset with no calibration. We recommend running at least 2000 iterations per trial and 10 trials to build an initial model with calibration. But this is a general guide; you need to decide which iterations and trials are best for your input data by checking convergence, model performance, and fit with business sense.
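That guidance can be turned into a rough pre-flight check. This is a hypothetical helper, not part of Robyn; the 0.1 and 100 defaults come from the notebook comments, and the 500-model minimum comes from the warning in the log above:

```python
def calibrated_pool_size(iterations, trials, calibration_constraint=0.1):
    """Estimate how many models survive the calibration constraint."""
    return int(iterations * trials * calibration_constraint)

def safe_for_auto(iterations, trials, min_candidates=100, suggested_min=500):
    """Return True when the surviving pool comfortably exceeds both the
    min_candidates target of pareto_fronts='auto' and the minimum that
    Robyn's warning suggests."""
    pool = calibrated_pool_size(iterations, trials)
    return pool > min_candidates and pool >= suggested_min

print(safe_for_auto(200, 5))    # the failing setup: pool of only 100
print(safe_for_auto(2000, 10))  # the recommended setup: pool of 2000
```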