AnotherSamWilson / ParBayesianOptimization

Parallelizable Bayesian Optimization in R

Setting up BayesianOptimization #4

Closed twesleyb closed 5 years ago

twesleyb commented 5 years ago

Problem

Hi Sam, I'm having trouble setting up your BayesianOptimization function, maybe you could help me out? I'd really appreciate any help! I can provide more info if I've left anything out.

Background

My function (score_network) is a wrapper around blockwiseModules from the WGCNA package. It takes a gene expression matrix (data) as well as a bunch of parameters (hyperparameters), and returns a weighted gene co-expression network (net) and its partition into modules (clusters/colors) of genes. The clustering result is influenced by the function's parameters.

Below is a reproducible example:

library(WGCNA)
options(stringsAsFactors = FALSE)

# Generate some test data.
set.seed(2)
nGenes <- 1000
nSamples <- 8
data <- matrix(rnorm(n = nSamples * nGenes, mean = 100), nSamples, nGenes)

# WGCNA function to be optimized:
score_network <- function(data, hyperparameters) {
  net <- blockwiseModules(datExpr = data,
                          # Hyperparameters:
                          deepSplit         = hyperparameters$deepSplit,
                          minModuleSize     = hyperparameters$minModuleSize,
                          mergeCutHeight    = hyperparameters$mergeCutHeight,
                          reassignThresh    = hyperparameters$reassignThresh,
                          minCoreKMESize    = hyperparameters$minCoreKMESize,
                          minKMEtoStay      = hyperparameters$minKMEtoStay,
                          # Defaults:
                          power             = 5,
                          TOMDenom          = "mean",
                          detectCutHeight   = 0.995,
                          corType           = "bicor",
                          networkType       = "signed",
                          pamStage          = TRUE,
                          pamRespectsDendro = TRUE,
                          saveTOMs          = FALSE,
                          maxBlockSize      = 12000,
                          verbose           = 0)
  # Calculate and return median module coherence.
  pve <- propVarExplained(datExpr = data, colors = net$colors, MEs = net$MEs)
  return(list(Score = median(pve)))
}

blockwiseModules returns a list (net) which contains information about the network. To get a single overall quality metric, I calculate the median proportion of variance explained (pve) across modules, which describes how cohesive the modules are overall.

As a test, I can show that my function is working:

# Some test parameter values. 
hyperparameters <- list(
  minModuleSize = 34,
  deepSplit = 4,
  mergeCutHeight = 0.18069189726375,
  reassignThresh = 0.0173420400521718,
  minKMEtoStay = 0.40141052021645,
  minCoreKMESize = 4)

test <- score_network(data, hyperparameters)

Optimization

Now I will use your function to try and optimize my function's parameters.

library(ParBayesianOptimization)

# Parameters to be optimized:
hyperparameters = list(
  minModuleSize = c(3L, 50L),  
  deepSplit =  c(0L, 4L),
  mergeCutHeight = c(0.01, 0.2),
  reassignThresh = c(0.01, 0.1),
  minKMEtoStay = c(0.1,0.7), 
  minCoreKMESize = c(3L,15L)
)

# Run the algorithm. 
hpo <- BayesianOptimization(score_network, 
                     bounds = hyperparameters, 
                     saveIntermediate = NULL,
                     leftOff = NULL, 
                     parallel = FALSE, 
                     packages = "WGCNA", 
                     export = "data",
                     initialize = TRUE, 
                     initGrid = NULL, 
                     initPoints = 1, 
                     bulkNew = parallelThreads,
                     nIters = 100, 
                     kern = "Matern52", 
                     beta = 0, 
                     acq = "ucb",
                     stopImpatient = list(newAcq = "ucb", rounds = Inf), 
                     kappa = 2.576,
                     eps = 0, 
                     gsPoints = 100, 
                     convThresh = 1e+07,
                     minClusterUtility = NULL, 
                     noiseAdd = 0.25, 
                     verbose = 1)

Error

The error message doesn't help me much:

FUN failed to run with error list:
<simpleError in (function (data, hyperparameters) {
  net <- blockwiseModules(datExpr = data, deepSplit = hyperparameters$deepSplit,
    minModuleSize = hyperparameters$minModuleSize, mergeCutHeight = hyperparameters$mergeCutHeight,
    reassignThresh = hyperparameters$reassignThresh, minCoreKMESize = hyperparameters$minCoreKMESize,
    minKMEtoStay = hyperparameters$minKMEtoStay, power = 5, TOMDenom = "mean",
    detectCutHeight = 0.995, corType = "bicor", networkType = "signed", pamStage = TRUE,
    pamRespectsDendro = TRUE, saveTOMs = FALSE, maxBlockSize = 12000, verbose = 0)
  pve <- propVarExplained(datExpr = data, colors = net$colors, MEs = net$MEs)
  return(list(Score = median(pve)))
})(minModuleSize = 26, deepSplit = 3, mergeCutHeight = 0.169136461073067, reassignThresh = 0.0544606877933256, minKMEtoStay = 0.248776969546452, minCoreKMESize = 3):
  unused arguments (minModuleSize = 26, deepSplit = 3, mergeCutHeight = 0.169136461073067, reassignThresh = 0.0544606877933256, minKMEtoStay = 0.248776969546452, minCoreKMESize = 3)>
Error in BayesianOptimization(score_network, bounds = hyperparameters, : Stopping process.

AnotherSamWilson commented 5 years ago

Hey twesleyb, sorry for the late reply! That error occurs when the scoring function itself throws an error. In this case, the message is very... unhelpful, but the key part is "unused arguments": the function was called with minModuleSize = 26, deepSplit = 3 and so on, and it has no arguments by those names. The parameters you are optimizing over need to be passed as arguments to your scoring function. If a parameter is listed in the bounds, it needs to be an argument of the scoring function. Also, score_network can find data on its own because it is defined in your global environment, so there is no need to pass it to the function directly. Try something like this:

# WGCNA function to be optimized:
score_network <- function(deepSplit, minModuleSize, mergeCutHeight,
                          reassignThresh, minCoreKMESize, minKMEtoStay) {
  net <- blockwiseModules(datExpr = data,
                          # Hyperparameters:
                          deepSplit         = deepSplit,
                          minModuleSize     = minModuleSize,
                          mergeCutHeight    = mergeCutHeight,
                          reassignThresh    = reassignThresh,
                          minCoreKMESize    = minCoreKMESize,
                          minKMEtoStay      = minKMEtoStay,
                          # Defaults:
                          power             = 5,
                          TOMDenom          = "mean",
                          detectCutHeight   = 0.995,
                          corType           = "bicor",
                          networkType       = "signed",
                          pamStage          = TRUE,
                          pamRespectsDendro = TRUE,
                          saveTOMs          = FALSE,
                          maxBlockSize      = 12000,
                          verbose           = 0)
  # Calculate and return median module coherence.
  pve <- propVarExplained(datExpr = data, colors = net$colors, MEs = net$MEs)
  return(list(Score = median(pve)))
}

# Some test parameter values. 
hyperparameters <- list(
   minModuleSize = 34,
   deepSplit = 4,
   mergeCutHeight = 0.18069189726375,
   reassignThresh = 0.0173420400521718,
   minKMEtoStay = 0.40141052021645,
   minCoreKMESize = 4)

# Pass each list element as its own named argument.
test <- do.call(score_network, hyperparameters)

I cannot run this right now because I am at work. I will try when I get home.
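
In the meantime, here is a minimal base-R sketch of why the signature matters. It assumes the optimizer calls the scoring function with one named argument per entry in bounds, which is what the call shown in the error message suggests. The names toy_data, bad_fun, good_fun and the bounds a and b are made up for illustration and are not part of the package or of WGCNA.

```r
# Toy stand-ins: no WGCNA or ParBayesianOptimization needed.
toy_data <- c(1, 2, 3)                      # plays the role of `data`
bounds   <- list(a = c(0, 1), b = c(1L, 5L))

# One candidate point, named like the bounds (midpoints, purely illustrative).
point <- lapply(bounds, mean)

# Signature does not match the bounds, so the named arguments are rejected.
bad_fun <- function(data, hyperparameters) list(Score = 0)
bad_msg <- tryCatch(do.call(bad_fun, point),
                    error = function(e) conditionMessage(e))

# Signature matches the bounds; toy_data is found in the enclosing
# (here: global) environment without being passed in.
good_fun <- function(a, b) list(Score = mean(toy_data) * a - b)
good <- do.call(good_fun, point)

print(bad_msg)
print(good$Score)
```

The first call fails with the same kind of "unused arguments" message as in the issue, while the second returns a list with a Score element.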