facebookexperimental / Robyn

Robyn is an experimental, AI/ML-powered, open source Marketing Mix Modeling (MMM) package from Meta Marketing Science. Our mission is to democratise modeling knowledge, inspire the industry through innovation, reduce human bias in the modeling process, and build a strong open source marketing science community.
https://facebookexperimental.github.io/Robyn/
MIT License

Docker image #173

Closed: xlim7 closed this issue 2 years ago

xlim7 commented 3 years ago

Contributing to FB NextGen MMM R script

Issue

Would it be possible to provide public Docker images for different OSes that have all the Robyn dependencies installed, like rstan and reticulate? I've been struggling to install Robyn on an r-minimal image that uses Alpine Linux.

gufengzhou commented 3 years ago

Good suggestion! We'll need to discuss this internally and let you know.

DzimitryM commented 3 years ago

From my experience, r-minimal and Alpine are hardly compatible with R packages that rely on Linux system libraries. Alpine contains only the minimum of libraries needed for base R. I would start from a full Linux/Ubuntu image rather than an optimized one.

romanumero commented 3 years ago

I've been working on an Ubuntu Docker image, but the image fails to find nevergrad. I've tried both the virtualenv and conda installation approaches described in the Robyn documentation. Both return the error below when running the demo.R application. Anyone else experiencing this?

Start running 5 trials with 2000 iterations per trial each with TwoPointsDE nevergrad algorithm...
 Running trial nr. 1

Error in robyn_mmm(hyper_collect = InputCollect$hyperparameters, InputCollect = InputCollect, :
  You must have nevergrad python library installed.
Calls: robyn_run -> robyn_mmm
Execution halted

gufengzhou commented 3 years ago

We've seen multiple cases where nevergrad won't install because the Python path is not specified, even after use_python(). We've added a line to the demo (row 40) to optionally force the Python path: Sys.setenv(RETICULATE_PYTHON = "~/Library/r-miniconda/envs/r-reticulate/bin/python3.9"). Could you please try?
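A minimal sketch of that ordering, using the demo's miniconda path as an example (adjust it to wherever your interpreter lives); the environment variable needs to be set before reticulate binds to an interpreter:

Sys.setenv(RETICULATE_PYTHON = "~/Library/r-miniconda/envs/r-reticulate/bin/python3.9") # example path
library(reticulate)              # load reticulate only after the env var is set
py_config()                      # should report the forced interpreter
py_module_available("nevergrad") # TRUE once nevergrad is importable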

romanumero commented 3 years ago

I've tried your suggestion as well as use_python(), with no luck. Nevergrad appears to be installed, as shown below. It seems the virtual environment is not activated.

### conda_create("r-reticulate") #performed in a previous step

Sys.setenv(RETICULATE_PYTHON = "~/miniconda3/envs/r-reticulate/bin/python3.10")
use_condaenv("r-reticulate")
conda_install("r-reticulate", "nevergrad", pip=TRUE)
Requirement already satisfied: nevergrad in /root/miniconda3/envs/r-reticulate/lib/python3.10/site-packages (0.4.3.post8)
Requirement already satisfied: typing-extensions>=3.6.6 in /root/miniconda3/envs/r-reticulate/lib/python3.10/site-packages (from nevergrad) (3.10.0.2)
Requirement already satisfied: numpy>=1.15.0 in /root/miniconda3/envs/r-reticulate/lib/python3.10/site-packages (from nevergrad) (1.21.2)
Requirement already satisfied: bayesian-optimization>=1.2.0 in /root/miniconda3/envs/r-reticulate/lib/python3.10/site-packages (from nevergrad) (1.2.0)
Requirement already satisfied: cma>=2.6.0 in /root/miniconda3/envs/r-reticulate/lib/python3.10/site-packages (from nevergrad) (3.1.0)
Requirement already satisfied: scipy>=0.14.0 in /root/miniconda3/envs/r-reticulate/lib/python3.10/site-packages (from bayesian-optimization>=1.2.0->nevergrad) (1.6.1)
Requirement already satisfied: scikit-learn>=0.18.0 in /root/miniconda3/envs/r-reticulate/lib/python3.10/site-packages (from bayesian-optimization>=1.2.0->nevergrad) (1.0)
Requirement already satisfied: threadpoolctl>=2.0.0 in /root/miniconda3/envs/r-reticulate/lib/python3.10/site-packages (from scikit-learn>=0.18.0->bayesian-optimization>=1.2.0->nevergrad) (3.0.0)
Requirement already satisfied: joblib>=0.11 in /root/miniconda3/envs/r-reticulate/lib/python3.10/site-packages (from scikit-learn>=0.18.0->bayesian-optimization>=1.2.0->nevergrad) (1.1.0)
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
[1] "nevergrad"
'window_start' is adapted to the closest date contained in input data: 2016-11-21
'window_end' is adapted to the closest date contained in input data: 2018-08-20
'hyperparameters' are not provided yet. To include them, run robyn_inputs(InputCollect = InputCollect, hyperparameters = ...)
 [1] "facebook_I_alphas"      "facebook_I_gammas"      "facebook_I_thetas"     
 [4] "newsletter_alphas"      "newsletter_gammas"      "newsletter_thetas"     
 [7] "ooh_S_alphas"           "ooh_S_gammas"           "ooh_S_thetas"          
[10] "print_S_alphas"         "print_S_gammas"         "print_S_thetas"        
[13] "search_clicks_P_alphas" "search_clicks_P_gammas" "search_clicks_P_thetas"
[16] "tv_S_alphas"            "tv_S_gammas"            "tv_S_thetas"           
Using robyn object location: /out
Input data has 208 weeks in total: 2015-11-23 to 2019-11-11
Initial model is built on rolling window of 92 weeks: 2016-11-21 to 2018-08-20
Using geometric adstocking with 18 hyperparameters & 10-fold ridge x-validation on 6 cores
>>> Start running 5 trials with 2000 iterations per trial each with TwoPointsDE nevergrad algorithm...
 Running trial nr. 1 

Error in robyn_mmm(hyper_collect = InputCollect$hyperparameters, InputCollect = InputCollect,  : 
  You must have nevergrad python library installed.
Calls: robyn_run -> robyn_mmm
Execution halted
laresbernardo commented 3 years ago

Hi! Let's try to debug this step by step and see where the error occurs. I think the problem might come from setting the environment and not resetting the R session.

Can you please share your reticulate::py_config() output? Note that if it prints "NOTE: Python version was forced by RETICULATE_PYTHON", you may be forcing your Python version; check with Sys.getenv("RETICULATE_PYTHON"). You can change it to enforce a version, or suppress it with Sys.setenv("RETICULATE_PYTHON" = "") to delete that option.

library(reticulate) # Load reticulate package
virtualenv_create("r-reticulate") # Should print: virtualenv: r-reticulate
virtualenv_exists("r-reticulate") # Should print TRUE. If not, reset R session and try again
# Reset your R session with .rs.restartR() or manually
library(reticulate) # Load reticulate package in your new session
virtualenv_exists("r-reticulate") # Should print TRUE
use_virtualenv("r-reticulate", required = TRUE)
system("pip --version") # Check if pip is enabled in your environment
py_install("nevergrad", pip = TRUE)
# Should print something like:
# Using virtual environment '/Users/bernardolares/.virtualenvs/r-reticulate' ...
# Requirement already satisfied: nevergrad in /Users/bernardolares/.virtualenvs/r-reticulate/lib/python3.8/site-packages (0.4.3.post8)
# Requirement already satisfied: numpy>=1.15.0 in /Users/bernardolares/.virtualenvs/r-reticulate/lib/python3.8/site-packages (from nevergrad) (1.21.2)
...

You can check if it's enabled correctly with py_module_available("nevergrad"). We should also now be able to import nevergrad manually with ng <- import("nevergrad", delay_load = FALSE).
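Putting those checks together, a small verification sketch (assuming the same r-reticulate virtualenv as above) that fails fast if the module still isn't visible:

library(reticulate)
use_virtualenv("r-reticulate", required = TRUE)
stopifnot(py_module_available("nevergrad")) # errors here if the module is missing
ng <- import("nevergrad", delay_load = FALSE)
ng$`__version__` # e.g. "0.4.3.post8", matching the pip output above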

romanumero commented 3 years ago

Thank you for your feedback. I am forcing the Python version with Sys.setenv(RETICULATE_PYTHON = "~/miniconda3/envs/r-reticulate/bin/python3.10"); R is unable to find the Python executable without this step. Executing reticulate::py_config() returns the output below.

python:         /root/miniconda3/envs/r-reticulate/bin/python3.10
libpython:      /root/miniconda3/envs/r-reticulate/lib/libpython3.10.so
pythonhome:     /root/miniconda3/envs/r-reticulate:/root/miniconda3/envs/r-reticulate
version:        3.10.0 | packaged by conda-forge | (default, Oct  9 2021, 18:14:51) [GCC 9.4.0]
numpy:          /root/miniconda3/envs/r-reticulate/lib/python3.10/site-packages/numpy
numpy_version:  1.21.2

NOTE: Python version was forced by RETICULATE_PYTHON
romanumero commented 3 years ago

I resolved the issue by installing the following libraries: apt -y install python3 python3-dev python3-pip python3-venv. Previously I had only installed python3-venv. This can be closed. Thanks for your help.
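As a quick diagnostic for anyone hitting the same thing, a sketch (assuming reticulate's default virtualenv root) that checks whether the virtualenv actually has a usable interpreter; when python3-dev/python3-pip/python3-venv are missing at the system level, an environment can exist on disk without one, which would match the "nevergrad not installed" symptom:

library(reticulate)
venv_py <- file.path(virtualenv_root(), "r-reticulate", "bin", "python")
file.exists(venv_py)          # FALSE suggests a broken or incomplete virtualenv
system2(venv_py, "--version") # prints the Python version if the env is healthy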

kennychen83 commented 3 years ago

(Quoting @laresbernardo's step-by-step debugging instructions above.)

Hi, following your steps to debug, I was still not able to enable nevergrad:

[Screenshot: Screen Shot 2021-10-13 at 2 22 30 PM]

romanumero commented 3 years ago

I can share my Dockerfile if it helps. I'll just need to clean it up a bit.

gufengzhou commented 3 years ago

Hey that'd definitely speed things up. Thanks!

romanumero commented 3 years ago

This could be cleaned up a bit, but I was able to get it working. It takes a while to build the base image, so I split it into two parts.

Dockerfile (my-robyn-image)

FROM ubuntu:focal

ENV TZ=UTC

RUN mkdir /app
COPY install_packages.R /app/install_packages.R

RUN apt update
RUN apt -y install dirmngr gnupg apt-transport-https ca-certificates software-properties-common
RUN apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9
RUN add-apt-repository 'deb https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/'
RUN apt update
RUN apt -y install libgl1-mesa-glx libegl1-mesa libxrandr2 libxrandr2 libxss1 libxcursor1 libxcomposite1 libasound2 libxi6 libxtst6 wget python3 python3-dev python3-pip python3-venv libcurl4-openssl-dev libv8-dev r-base
RUN Rscript /app/install_packages.R

install_packages.R

# Let the V8 R package download a static libv8 during installation
Sys.setenv(DOWNLOAD_STATIC_LIBV8 = 1)

install.packages("remotes")    # needed to install Robyn from GitHub
install.packages("reticulate") # R-to-Python bridge used for nevergrad

remotes::install_github("facebookexperimental/Robyn/R")

library(reticulate)
Sys.setenv(RETICULATE_PYTHON = "/usr/bin/python3.8") # system Python on Ubuntu focal

virtualenv_create("r-reticulate")
virtualenv_exists("r-reticulate") # should print TRUE

app/Dockerfile

FROM my-robyn-image

ENV TEST_ENV=""

RUN mkdir -p /out

COPY app.R /app/app.R

CMD Rscript /app/app.R

app/app.R

# Point reticulate at the virtualenv created in the base image
Sys.setenv(RETICULATE_PYTHON = "~/.virtualenvs/r-reticulate/bin/python3.8")

library(reticulate)
virtualenv_exists("r-reticulate") # should print TRUE
use_virtualenv("r-reticulate", required = TRUE)
py_install("nevergrad", pip = TRUE) # install nevergrad into the virtualenv at runtime

Sys.setenv(RETICULATE_PYTHON = "~/.virtualenvs/r-reticulate/bin/python3.8")

ng <- import("nevergrad", delay_load = FALSE)

library(Robyn) 
#set.seed(123)

## force multicore when using RStudio
Sys.setenv(R_FUTURE_FORK_ENABLE="true")
options(future.fork.enable = TRUE)

## Check simulated dataset or load your own dataset
data("dt_simulated_weekly")

## Check holidays from Prophet
# 59 countries included. If your country is not included, please manually add it.
# Tip: any events can be added to this table, e.g. school breaks or other events
data("dt_prophet_holidays")

robyn_object <- "/out/robyn.RDS"

InputCollect <- robyn_inputs(
  dt_input = dt_simulated_weekly
  ,dt_holidays = dt_prophet_holidays

  ### set variables

  ,date_var = "DATE" # date format must be "2020-01-01"
  ,dep_var = "revenue" # there should be only one dependent variable
  ,dep_var_type = "revenue" # "revenue" or "conversion"

  ,prophet_vars = c("trend", "season", "holiday") # "trend", "season", "weekday" and "holiday"
  # are provided and case-sensitive. Recommended to at least keep trend & holiday
  ,prophet_signs = c("default", "default", "default") # c("default", "positive", "negative").
  # Recommended as default. Must be same length as prophet_vars
  ,prophet_country = "DE" # only one country allowed at a time. Includes national holidays
  # for 59 countries; the list can be found in our GitHub guide

  ,context_vars = c("competitor_sales_B", "events") # typically competitors, price &
  # promotion, temperature, unemployment rate, etc.
  ,context_signs = c("default", "default") # c("default", "positive", "negative"),
  # control the signs of coefficients for baseline variables

  ,paid_media_vars = c("tv_S", "ooh_S", "print_S", "facebook_I", "search_clicks_P")
  # c("tv_S", "ooh_S", "print_S", "facebook_I", "facebook_S", "search_clicks_P", "search_S")
  # we recommend using media exposure metrics like impressions, GRPs etc. for the model.
  # If not applicable, use spend instead
  ,paid_media_signs = c("positive", "positive", "positive", "positive", "positive")
  # c("default", "positive", "negative"). Must have same length as paid_media_vars.
  # Controls the signs of coefficients for media variables
  ,paid_media_spends = c("tv_S", "ooh_S", "print_S", "facebook_S", "search_S")
  # spends must have same order and same length as paid_media_vars

  ,organic_vars = c("newsletter")
  ,organic_signs = c("positive") # must have same length as organic_vars

  ,factor_vars = c("events") # specify which variables in context_vars and
  # organic_vars are factors

  ### set model parameters

  ## set cores for parallel computing
  ,cores = 6 # I am using 6 of the 8 cores on my local machine. Use future::availableCores() to find out how many you have

  ## set rolling window start
  ,window_start = "2016-11-23"
  ,window_end = "2018-08-22"

  ## set model core features
  ,adstock = "geometric" # geometric or weibull. weibull is more flexible, yet has one more
  # parameter and thus takes longer
  ,iterations = 2000  # number of allowed iterations per trial. 2000 is recommended

  ,nevergrad_algo = "TwoPointsDE" # recommended algorithm for Nevergrad, the gradient-free
  # optimisation library https://facebookresearch.github.io/nevergrad/index.html
  ,trials = 5 # number of trials. 5 is recommended without calibration,
  # 10 with calibration.

  # Time estimation: with geometric adstock, 2000 iterations * 5 trials
  # and 6 cores, it takes less than 1 hour. Weibull takes at least twice as much time.
)

#### 2a-2: Second, define and add hyperparameters

## Guide to setup hyperparameters

## 1. get correct hyperparameter names:
# All variables in paid_media_vars or organic_vars require hyperparameters and will be
# transformed by adstock & saturation.
# The difference between paid_media_vars and organic_vars is that paid_media_vars has spend,
# which needs to be specified separately in paid_media_spends.
# Run hyper_names() to get the correct hyperparameter names. All names in hyperparameters must
# equal names from hyper_names(), case-sensitive.

## 2. get guidance for setting hyperparameter bounds:
# For geometric adstock, use theta, alpha & gamma. For weibull adstock,
# use shape, scale, alpha, gamma.
# Theta: In geometric adstock, theta is the decay rate. Guideline for usual media genres:
# TV c(0.3, 0.8), OOH/Print/Radio c(0.1, 0.4), digital c(0, 0.3)
# Shape: In weibull adstock, shape controls the decay shape. Recommended c(0.0001, 2).
# The larger, the more S-shape. The smaller, the more L-shape. Channel-type specific
# values still to be investigated
# Scale: In weibull adstock, scale controls the decay inflection point. Very conservative
# recommended bounds c(0, 0.1), because scale can increase the adstocking half-life greatly.
# Channel-type specific values still to be investigated
# Alpha: In s-curve transformation with the hill function, alpha controls the shape between
# exponential and s-shape. Recommended c(0.5, 3). The larger the alpha, the more S-shape.
# The smaller, the more C-shape
# Gamma: In s-curve transformation with the hill function, gamma controls the inflection point.
# Recommended bounds c(0.3, 1). The larger the gamma, the later the inflection point
# in the response curve

# helper plots: set plot to TRUE for transformation examples
plot_adstock(FALSE) # adstock transformation example plot,
# helping you understand geometric/theta and weibull/shape/scale transformation
plot_saturation(FALSE) # s-curve transformation example plot,
# helping you understand hill/alpha/gamma transformation

## 3. set each hyperparameter's bounds. They either contain two values, e.g. c(0, 0.5),
# or only one value (in which case you've "fixed" that hyperparameter), e.g.:
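# An illustrative sketch of the two options (hypothetical values):
# hyperparameters <- list(
#   tv_S_thetas = 0.5         # single value: theta is fixed at 0.5
#   ,tv_S_alphas = c(0.5, 3)  # two values: bounds for nevergrad to optimise within
# )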

# Run ?hyper_names to check parameter definition
hyper_names(adstock = InputCollect$adstock, all_media = InputCollect$all_media)

hyperparameters <- list(
  facebook_I_alphas = c(0.5, 3) # example bounds for alpha
  ,facebook_I_gammas = c(0.3, 1) # example bounds for gamma
  ,facebook_I_thetas = c(0, 0.3) # example bounds for theta
  #,facebook_I_shapes = c(0.0001, 2) # example bounds for shape
  #,facebook_I_scales = c(0, 0.1) # example bounds for scale

  ,print_S_alphas = c(0.5, 3)
  ,print_S_gammas = c(0.3, 1)
  ,print_S_thetas = c(0.1, 0.4)
  #,print_S_shapes = c(0.0001, 2)
  #,print_S_scales = c(0, 0.1)

  ,tv_S_alphas = c(0.5, 3)
  ,tv_S_gammas = c(0.3, 1)
  ,tv_S_thetas = c(0.3, 0.8)
  #,tv_S_shapes = c(0.0001, 2)
  #,tv_S_scales= c(0, 0.1)

  ,search_clicks_P_alphas = c(0.5, 3)
  ,search_clicks_P_gammas = c(0.3, 1)
  ,search_clicks_P_thetas = c(0, 0.3)
  #,search_clicks_P_shapes = c(0.0001, 2)
  #,search_clicks_P_scales = c(0, 0.1)

  ,ooh_S_alphas = c(0.5, 3)
  ,ooh_S_gammas = c(0.3, 1)
  ,ooh_S_thetas = c(0.1, 0.4)
  #,ooh_S_shapes = c(0.0001, 2)
  #,ooh_S_scales = c(0, 0.1)

  ,newsletter_alphas = c(0.5, 3)
  ,newsletter_gammas = c(0.3, 1)
  ,newsletter_thetas = c(0.1, 0.4)
  #,newsletter_shapes = c(0.0001, 2)
  #,newsletter_scales = c(0, 0.1)
)

#### 2a-3: Third, add hyperparameters into robyn_inputs()

InputCollect <- robyn_inputs(InputCollect = InputCollect, hyperparameters = hyperparameters)

#### 2a-4: Fourth (optional), model calibration / add experimental input

## Guide for calibration source

# 1. We strongly recommend using experimental and causal results that are considered
# ground truth to calibrate MMM. Usual experiment types are people-based (e.g. Facebook
# conversion lift) and geo-based (e.g. Facebook GeoLift).
# 2. Currently, Robyn only accepts point-estimates as calibration input. For example, if
# a $10k spend is tested against a holdout for channel A, then input the incremental
# return as a point-estimate as in the example below.
# 3. The point-estimate always has to match the spend in the variable. For example, if
# channel A usually has $100k weekly spend and the experimental holdout is 70%, input the
# point-estimate for the $30k, not the $70k.

# dt_calibration <- data.frame(
#   channel = c("facebook_I",  "tv_S", "facebook_I")
#   # channel name must be in paid_media_vars
#   , liftStartDate = as.Date(c("2018-05-01", "2017-11-27", "2018-07-01"))
#   # liftStartDate must be within input data range
#   , liftEndDate = as.Date(c("2018-06-10", "2017-12-03", "2018-07-20"))
#   # liftEndDate must be within input data range
#   , liftAbs = c(400000, 300000, 200000) # Provided value must be
#   # tested on same campaign level in model and same metric as dep_var_type
# )
#
# InputCollect <- robyn_inputs(InputCollect = InputCollect
#                              , calibration_input = dt_calibration)

################################################################
#### Step 2b: For known model specification, setup in one single step

## Specify hyperparameters as in 2a-2 and optionally calibration as in 2a-4 and provide them directly in robyn_inputs()

# InputCollect <- robyn_inputs(
#   dt_input = dt_simulated_weekly
#   ,dt_holidays = dt_prophet_holidays
#   ,date_var = "DATE"
#   ,dep_var = "revenue"
#   ,dep_var_type = "revenue"
#   ,prophet_vars = c("trend", "season", "holiday")
#   ,prophet_signs = c("default","default", "default")
#   ,prophet_country = "DE"
#   ,context_vars = c("competitor_sales_B", "events")
#   ,context_signs = c("default", "default")
#   ,paid_media_vars = c("tv_S", "ooh_S",   "print_S", "facebook_I", "search_clicks_P")
#   ,paid_media_signs = c("positive", "positive", "positive", "positive", "positive")
#   ,paid_media_spends = c("tv_S", "ooh_S", "print_S", "facebook_S", "search_S")
#   ,organic_vars = c("newsletter")
#   ,organic_signs = c("positive")
#   ,factor_vars = c("events")
#   ,cores = 6
#   ,window_start = "2016-11-23"
#   ,window_end = "2018-08-22"
#   ,adstock = "geometric"
#   ,iterations = 2000
#   ,trials = 5
#   ,hyperparameters = hyperparameters # as in 2a-2 above
#   #,calibration_input = dt_calibration # as in 2a-4 above
# )

################################################################
#### Step 3: Build initial model

# Run ?robyn_run to check parameter definition
OutputCollect <- robyn_run(
  InputCollect = InputCollect # feed in all model specification
  , plot_folder = robyn_object # plots will be saved in the same folder as robyn_object
  , pareto_fronts = 3
  , plot_pareto = TRUE
  )

## Besides the one-pager plots, there are 4 CSV outputs saved in the folder for further usage:
# pareto_hyperparameters.csv: hyperparameters per Pareto output model
# pareto_aggregated.csv: aggregated decomposition per independent variable of all Pareto outputs
# pareto_media_transform_matrix.csv: all media transformation vectors
# pareto_alldecomp_matrix.csv: all decomposition vectors of independent variables
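# An illustrative way to inspect one of these CSVs (the path is an example;
# robyn_run() writes into a timestamped subfolder of the plot folder):
# pareto_hypers <- data.table::fread("/out/2021-07-29 00.56 init/pareto_hyperparameters.csv")
# head(pareto_hypers)         # one row of hyperparameters per Pareto model
# unique(pareto_hypers$solID) # candidate model IDs for select_model below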

################################################################
#### Step 4: Select and save the initial model

## Compare all model one-pagers in the plot folder and select the one that best represents
## your business reality

OutputCollect$allSolutions # get all model IDs in result
select_model <- "2_8_1" # select one from above
robyn_save(robyn_object = robyn_object # model object location and name
           , select_model = select_model # selected model ID
           , InputCollect = InputCollect # all model input
           , OutputCollect = OutputCollect # all model output
)

################################################################
#### Step 5: Get budget allocation based on the selected model above

## The budget allocator result requires further validation. Please use it with caution.
## Don't interpret the budget allocation result if the selected model doesn't meet business expectations

# Check media summary for selected model
OutputCollect$xDecompAgg[solID == select_model & !is.na(mean_spend)
                         , .(rn, coef, mean_spend, mean_response, roi_mean
                             , total_spend, total_response = xDecompAgg, roi_total, solID)]

# Run ?robyn_allocator to check parameter definition
# Run the "max_historical_response" scenario: "What's the revenue lift potential with the
# same historical spend level and what is the spend mix?"
AllocatorCollect <- robyn_allocator(
  InputCollect = InputCollect
  , OutputCollect = OutputCollect
  , select_model = select_model
  , scenario = "max_historical_response"
  , channel_constr_low = c(0.7, 0.7, 0.7, 0.7, 0.7)
  , channel_constr_up = c(1.2, 1.5, 1.5, 1.5, 1.5)
)

# View allocator result. Last column "optmResponseUnitTotalLift" is the total response lift.
AllocatorCollect$dt_optimOut

# Run the "max_response_expected_spend" scenario: "What's the maximum response for a given
# total spend based on historical saturation and what is the spend mix?" "optmSpendShareUnit"
# is the optimum spend share.
AllocatorCollect <- robyn_allocator(
  InputCollect = InputCollect
  , OutputCollect = OutputCollect
  , select_model = select_model
  , scenario = "max_response_expected_spend"
  , channel_constr_low = c(0.7, 0.7, 0.7, 0.7, 0.7)
  , channel_constr_up = c(1.2, 1.5, 1.5, 1.5, 1.5)
  , expected_spend = 1000000 # Total spend to be simulated
  , expected_spend_days = 7 # Duration of expected_spend in days
)

# View allocator result. Column "optmResponseUnitTotal" is the maximum unit (weekly with
# simulated dataset) response. "optmSpendShareUnit" is the optimum spend share.
AllocatorCollect$dt_optimOut

## QA optimal response
# select_media <- "search_clicks_P"
# optimal_spend <- AllocatorCollect$dt_optimOut[channels== select_media, optmSpendUnit]
# optimal_response_allocator <- AllocatorCollect$dt_optimOut[channels== select_media
#                                                            , optmResponseUnit]
# optimal_response <- robyn_response(robyn_object = robyn_object
#                                    , select_build = 0
#                                    , paid_media_var = select_media
#                                    , spend = optimal_spend)
# round(optimal_response_allocator) == round(optimal_response)
# optimal_response_allocator; optimal_response

################################################################
#### Step 6: Model refresh based on selected model and saved Robyn.RDS object - Alpha

## NOTE: you must run robyn_save() to select and save an initial model before refreshing below
## The robyn_refresh() function is suitable for updating within "reasonable periods".
## In two situations it is better to rebuild the model:
## 1. most data is new. If the initial model has 100 weeks and 80 weeks of new data are
## added in the refresh, it might be better to rebuild the model
## 2. new variables are added

# Run ?robyn_refresh to check parameter definition
Robyn <- robyn_refresh(
  robyn_object = robyn_object
  , dt_input = dt_simulated_weekly
  , dt_holidays = dt_prophet_holidays
  , refresh_steps = 13
  , refresh_mode = "auto"
  , refresh_iters = 1000 # iterations for refresh. 600 is a rough estimate; we're still
  # figuring out the ideal number.
  , refresh_trials = 3
)

## Besides plots, there are 4 CSV outputs saved in the folder for further usage:
# report_hyperparameters.csv: hyperparameters of all selected models for reporting
# report_aggregated.csv: aggregated decomposition per independent variable
# report_media_transform_matrix.csv: all media transformation vectors
# report_alldecomp_matrix.csv: all decomposition vectors of independent variables

################################################################
#### Step 7: Get budget allocation recommendation based on selected refresh runs

# Run ?robyn_allocator to check parameter definition
AllocatorCollect <- robyn_allocator(
  robyn_object = robyn_object
  , select_build = 3 # Use third refresh model
  , scenario = "max_response_expected_spend"
  , channel_constr_low = c(0.7, 0.7, 0.7, 0.7, 0.7)
  , channel_constr_up = c(1.2, 1.5, 1.5, 1.5, 1.5)
  , expected_spend = 2000000 # Total spend to be simulated
  , expected_spend_days = 14 # Duration of expected_spend in days
)

AllocatorCollect$dt_optimOut

################################################################
#### Step 8: get marginal returns

## Example of how to get the marginal ROI of the next $1000 from the $80k spend level for the search channel

# Run ?robyn_response to check parameter definition

# Get response for 80k
Spend1 <- 80000
Response1 <- robyn_response(
  robyn_object = robyn_object
  #, select_build = 1 # 2 means the second refresh model. 0 means the initial model
  , paid_media_var = "search_clicks_P"
  , spend = Spend1)
Response1/Spend1 # ROI for search 80k

# Get response for 81k
Spend2 <- Spend1+1000
Response2 <- robyn_response(
  robyn_object = robyn_object
  #, select_build = 1
  , paid_media_var = "search_clicks_P"
  , spend = Spend2)
Response2/Spend2 # ROI for search 81k

# Marginal ROI of next 1000$ from 80k spend level for search
(Response2-Response1)/(Spend2-Spend1)

################################################################
#### Optional: get old model results

# Get old hyperparameters and select model
dt_hyper_fixed <- data.table::fread("~/Desktop/2021-07-29 00.56 init/pareto_hyperparameters.csv")
select_model <- "1_24_5"
dt_hyper_fixed <- dt_hyper_fixed[solID == select_model]

OutputCollectFixed <- robyn_run(
  # InputCollect must be provided by robyn_inputs with same dataset and parameters as before
  InputCollect = InputCollect
  , plot_folder = robyn_object
  , dt_hyper_fixed = dt_hyper_fixed)

# Save Robyn object for further refresh
robyn_save(robyn_object = robyn_object
           , select_model = select_model
           , InputCollect = InputCollect
           , OutputCollect = OutputCollectFixed)
robyn_object <- "/out/robyn.RDS"
Schumzy commented 2 years ago

Hey @romanumero, I was just starting out on this exact project. Did you save the image after it worked, so the nevergrad installation persists? Are you able to share the Docker image? Also, I can't tell whether you're using base R or RStudio?

NikolayLutsyak commented 2 years ago

@romanumero thank you for your Dockerfiles and guide! While building the first Dockerfile I ran into the following errors:

libxml-2.0 was not found:

* installing *source* package 'xml2' ...
** package 'xml2' successfully unpacked and MD5 sums checked
** using staged installation
Package libxml-2.0 was not found in the pkg-config search path.
Perhaps you should add the directory containing `libxml-2.0.pc'
to the PKG_CONFIG_PATH environment variable
No package 'libxml-2.0' found
Package libxml-2.0 was not found in the pkg-config search path.
Perhaps you should add the directory containing `libxml-2.0.pc'
to the PKG_CONFIG_PATH environment variable
No package 'libxml-2.0' found
Using PKG_CFLAGS=
Using PKG_LIBS=-lxml2
------------------------- ANTICONF ERROR ---------------------------
Configuration failed because libxml-2.0 was not found. Try installing:
 * deb: libxml2-dev (Debian, Ubuntu, etc)
 * rpm: libxml2-devel (Fedora, CentOS, RHEL)
 * csw: libxml2_dev (Solaris)
If libxml-2.0 is already installed, check that 'pkg-config' is in your
PATH and PKG_CONFIG_PATH contains a libxml-2.0.pc file. If pkg-config
is unavailable you can set INCLUDE_DIR and LIB_DIR manually via:
R CMD INSTALL --configure-vars='INCLUDE_DIR=... LIB_DIR=...'
--------------------------------------------------------------------
ERROR: configuration failed for package 'xml2'
* removing '/usr/local/lib/R/site-library/xml2'

Because of xml2, rvest can't be installed:

ERROR: dependency 'xml2' is not available for package 'rvest'
* removing '/usr/local/lib/R/site-library/rvest'

Because of rvest, lares can't be installed:

ERROR: dependency 'rvest' is not available for package 'lares'
* removing '/usr/local/lib/R/site-library/lares'

Without cmake, nloptr can't be installed:

* installing *source* package 'nloptr' ...
** package 'nloptr' successfully unpacked and MD5 sums checked
** using staged installation
checking whether the C++ compiler works... yes
checking for C++ compiler default output file name... a.out
checking for suffix of executables... 
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C++ compiler... yes
checking whether g++ -std=gnu++14 accepts -g... yes
checking how to run the C++ preprocessor... g++ -std=gnu++14 -E
checking whether we are using the GNU C++ compiler... (cached) yes
checking whether g++ -std=gnu++14 accepts -g... (cached) yes
checking for pkg-config... /usr/bin/pkg-config
checking if pkg-config knows NLopt... no
using NLopt via local cmake build on x86_64 

------------------ CMAKE NOT FOUND --------------------

CMake was not found on the PATH. Please install CMake:

 - yum install cmake          (Fedora/CentOS; inside a terminal)
 - apt install cmake          (Debian/Ubuntu; inside a terminal).
 - pacman -S cmake            (Arch Linux; inside a terminal).
 - brew install cmake         (MacOS; inside a terminal with Homebrew)
 - port install cmake         (MacOS; inside a terminal with MacPorts)

Alternatively install CMake from: <https://cmake.org/>

-------------------------------------------------------

configure: creating ./config.status
config.status: creating src/Makevars
** libs
gcc -I"/usr/share/R/include" -DNDEBUG -I../inst/include  -I'/usr/local/lib/R/site-library/testthat/include'    -fpic  -g -O2 -fdebug-prefix-map=/build/r-base-lENDSu/r-base-4.1.3=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c init_nloptr.c -o init_nloptr.o
gcc -I"/usr/share/R/include" -DNDEBUG -I../inst/include  -I'/usr/local/lib/R/site-library/testthat/include'    -fpic  -g -O2 -fdebug-prefix-map=/build/r-base-lENDSu/r-base-4.1.3=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c nloptr.c -o nloptr.o
g++ -std=gnu++11 -I"/usr/share/R/include" -DNDEBUG -I../inst/include  -I'/usr/local/lib/R/site-library/testthat/include'    -fpic  -g -O2 -fdebug-prefix-map=/build/r-base-lENDSu/r-base-4.1.3=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c test-C-API.cpp -o test-C-API.o
g++ -std=gnu++11 -I"/usr/share/R/include" -DNDEBUG -I../inst/include  -I'/usr/local/lib/R/site-library/testthat/include'    -fpic  -g -O2 -fdebug-prefix-map=/build/r-base-lENDSu/r-base-4.1.3=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c test-runner.cpp -o test-runner.o
g++ -std=gnu++11 -shared -L/usr/lib/R/lib -Wl,-Bsymbolic-functions -Wl,-z,relro -o nloptr.so init_nloptr.o nloptr.o test-C-API.o test-runner.o -llapack -lblas -lgfortran -lm -lquadmath -Lnlopt/lib -lnlopt -L/usr/lib/R/lib -lR
/usr/bin/ld: cannot find -lnlopt
collect2: error: ld returned 1 exit status
make: *** [/usr/share/R/share/make/shlib.mk:10: nloptr.so] Error 1
ERROR: compilation failed for package 'nloptr'
* removing '/usr/local/lib/R/site-library/nloptr'

Finally, Robyn can't be installed:

Installing package into '/usr/local/lib/R/site-library'
(as 'lib' is unspecified)
ERROR: dependencies 'lares', 'nloptr' are not available for package 'Robyn'
* removing '/usr/local/lib/R/site-library/Robyn'
Warning messages:
1: In i.p(...) : installation of package 'xml2' had non-zero exit status
2: In i.p(...) : installation of package 'rvest' had non-zero exit status
3: In i.p(...) : installation of package 'lares' had non-zero exit status
4: In i.p(...) : installation of package 'nloptr' had non-zero exit status
5: In i.p(...) :
  installation of package '/tmp/RtmpzLOW2J/file777253344/Robyn_3.6.1.tar.gz' had non-zero exit status

I managed to fix all those problems by adding cmake and libxml2-dev to the apt install line in the first Dockerfile, so now it looks like this:

FROM ubuntu:focal

ENV TZ=UTC

RUN mkdir /app
COPY install_packages.R /app/install_packages.R

RUN apt update
RUN apt -y install dirmngr gnupg apt-transport-https ca-certificates software-properties-common
RUN apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9
RUN add-apt-repository 'deb https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/'
RUN apt update
RUN apt -y install libgl1-mesa-glx libegl1-mesa libxrandr2 libxrandr2 libxss1 libxcursor1 libxcomposite1 libasound2 libxi6 libxtst6 wget python3 python3-dev python3-pip python3-venv libcurl4-openssl-dev libv8-dev r-base cmake libxml2-dev
RUN Rscript /app/install_packages.R

Hope it helps someone who runs into the same problems.

Leonelsentana commented 2 years ago

I created an updated Docker image for this issue: https://hub.docker.com/layers/leosentana/robyn/latest/images/sha256:fd701fe582d1314d4baae99b6daa664f210b21e310006d003c8c42710ab70382

DzimitryM commented 2 years ago

@Leonelsentana, thank you! Could you please share the Dockerfile as well? Short instructions on how to run the container would also be helpful.

Leonelsentana commented 2 years ago

Sharing the Dockerfile and other helpful docs: docker.zip