Bioconductor / BiocParallel

Bioconductor facilities for parallel evaluation
https://bioconductor.org/packages/BiocParallel
65 stars 29 forks source link

BiocParallel for parallelization in BEER: Error and GPU Compatibility #256

Closed FarzanehRah closed 1 year ago

FarzanehRah commented 1 year ago

I'm using the BEER package for phip-seq analysis, which relies on BiocParallel for parallelization. However, I encountered an error related to parallelization. When running the code beer_out <- brew(edgeR_out, assay.names = assay_locations, BPPARAM = BiocParallel::SerialParam(stop.on.error=FALSE)), I received the following error message:

── Running JAGS
Error: BiocParallel errors
  2 remote errors, element index: 1, 2
  0 unevaluated and other errors
  first remote error:
Error in FUN(X[[i]], ...) : negative length vectors are not allowed
Execution halted

I'm interested in exploring the compatibility of BiocParallel with GPU processing and whether utilizing GPU could help resolve this issue. Can you provide any insights or guidance on addressing this problem effectively?

Thank you.

mtmorgan commented 1 year ago

Are you sure the problem is in BiocParallel, or is it that one or more of your 300,000 polypeptides is unusual in some way that causes brew() to produce an error? The error negative length vectors are not allowed might occur if there is an integer overflow in C code, but BiocParallel is not likely to be the source of this type of error.

For me to be more helpful you must provide a reproducible example -- maybe some artificial data that I can copy and paste into my own session, along with a few lines of code that allow me to reproduce the problem. Here is what I get when I try to reproduce your code

> beer_out <- brew(edgeR_out, assay.names = assay_locations, BPPARAM = BiocParallel::SerialParam(stop.on.error=FALSE))
Error in brew(edgeR_out, assay.names = assay_locations, BPPARAM = BiocParallel::SerialParam(stop.on.error = FALSE)) :
  could not find function "brew"

GPU support would likely require extensive development in the package where brew() is implemented, it is not something that BiocParallel can provide.

FarzanehRah commented 1 year ago

Thank you for your prompt response. I appreciate your valuable insights regarding the issue I encountered with the brew() function. I must admit that I'm uncertain whether the problem lies with some of the unusual polypeptides or if there are other factors at play. We generated all the input data in a consistent manner, but it's possible that some of them may have unexpected properties!!

I have attached an R object (edgeR_out.rds) containing 2254 artificial peptides that can be used as input for the beer() function. Additionally, I have included a portion of the script that demonstrates how the function is being executed. With this smaller dataset, the code runs without any errors. However, when attempting to use the entire dataset, the error occurs.

I greatly appreciate your time and expertise in helping to address this issue. Your assistance is invaluable to me.

Best, Farzaneh

mtmorgan commented 1 year ago

unfortunately one requires code that fails to debug it. Also code that takes a long time to run or requires a large amount of memory is not really helpful; it is better to narrow down the example as much as possible, e.g., is it the size of the data, or just specific elements that can be selected and easily run?

FarzanehRah commented 1 year ago

Hi, thank you again for your insights. I have made some modifications based on your suggestions. Now, I have reduced the input dataset to only 50 peptides, and I would like to run the brew() function on this smaller dataset:

 library(beer)

# load edgeR_out object
edgeR_obj_file = "edgeR_obj_50.rds"
edgeR_obj_50 = readRDS(file = edgeR_obj_file)

# run BEER
beer_out = brew(edgeR_obj_50, assay.names = c(phi = "beer_fc_marg", phi_Z = "beer_fc_cond", 
                 Z = "beer_prob",  c = "sampleInfo", pi = "sampleInfo"),
       BPPARAM = BiocParallel::SerialParam(stop.on.error=FALSE)) 

edgeR_obj_50.zip As I mentioned before, I have encountered issues when working with a large dataset. However, I have tested the function on this smaller dataset and it seems to be working fine. I appreciate your help and guidance throughout this process. Thank you once again.

FarzanehRah commented 1 year ago

In order to investigate the source of the issue, I conducted an experiment. I ran the code beer_out = brew(edgeR_obj_2254, assay.names = c(phi = "beer_fc_marg", phi_Z = "beer_fc_cond", Z = "beer_prob", c = "sampleInfo", pi = "sampleInfo"), BPPARAM = BiocParallel::SerialParam(stop.on.error=FALSE) with an input dataset containing 2254 peptides, and it worked without any problems.

I replicated the same dataset 100 times, resulting in a dataset with 100 * 2254 peptides. However, this time I encountered the following error:

── Running JAGS ────────────────────────────────────────────────────────────────
Sample runs
Error: BiocParallel errors
  2 remote errors, element index: 1, 2
  0 unevaluated and other errors
  first remote error:
Error in FUN(X[[i]], ...) : negative length vectors are not allowed
Execution halted

Based on this experiment, it appears that the error arises when the dataset size increases significantly. This suggests that the issue may be related to the size of the dataset rather than the specific peptides within it.

Thank you for your attention and assistance.

mtmorgan commented 1 year ago

I have not had the patience to run this to the point of an error, and the steps for reproduction are not exactly clear (you say "I replicated the same dataset 100 times" but don't tell me how to do this; I did big <- rep(edgeR_out, 100), is that correct?).

Please tell me what version of R R.version.string and BiocParallel packageVersion("BiocManager") you are using.

I suggest (a) do NOT use stop.on.error = FALSE (b) use bptry() and (c) report the 'traceback' of an error. As a simplified example, I did

> library(BiocParallel)
> f = function(i) stop(i)
> g = function(i) f(i)
> result <- bptry(bplapply(1:2, g, BPPARAM = SerialParam()))
> result
[[1]]
<remote_error in f(i): 1>
traceback() available as 'attr(x, "traceback")'

[[2]]
<unevaluated_error: not evaluated due to previous error>

attr(,"REDOENV")
<environment: 0x152e7e408>
> cat(attr(result[[1]], "traceback"), sep = "\n")
6: handle_error(e)
5: h(simpleError(msg, call))
4: .handleSimpleError(function (e)
   {
       annotated_condition <- handle_error(e)
       stop(annotated_condition)
   }, "1", base::quote(f(i)))
3: stop(i) at #1
2: f(i) at #1
1: FUN(...)

I expect that you will be able to

beer_out <- bptry(
    brew(big, assay.names = assay_locations, BPPARAM = BiocParallel::SerialParam())
)
cat(attr(beer_out[[1]], "traceback"), sep = "\n")

Additional details are in the vignette Errors, Logs, and Debugging in BiocParallel

mtmorgan commented 1 year ago

This eventually completed for me. I have traceback

> cat(attr(beer_out[[1]], "traceback"), sep = "\n")
12: handle_error(e)
11: h(simpleError(msg, call))
10: .handleSimpleError(function (e) 
    {
        annotated_condition <- handle_error(e)
        stop(annotated_condition)
    }, "negative length vectors are not allowed", base::quote(FUN(X[[i]], 
        ...)))
9: FUN(X[[i]], ...)
8: lapply(usingtypes, function(t) {
       ans <- .Call("get_monitored_values", model$ptr(), t, PACKAGE = "rjags")
       for (i in seq(along = ans)) {
           class(ans[[i]]) <- "mcarray"
           attr(ans[[i]], "varname") <- names(ans)[i]
           if (is.null(dim(ans[[i]]))) {
               dim(ans[[i]]) <- length(ans[[i]])
           }
           attr(ans[[i]], "type") <- t
           attr(ans[[i]], "iterations") <- c(start = startiter + 
               thin, end = startiter + n.iter, thin = thin)
       }
       pn <- parse.varnames(variable.names[type == t])
       for (i in seq(along = variable.names[type == t])) {
           if (status[[t]][i]) {
               .Call("clear_monitor", model$ptr(), pn$names[i], 
                   pn$lower[[i]], pn$upper[[i]], t, PACKAGE = "rjags")
           }
       }
       return(ans)
    ...
7: jags.samples(model, variable.names, n.iter, thin, type = "trace", 
       ...)
6: coda.samples(jags_model, variable.names = c("c", "pi", "Z", "phi"), 
       n.iter = n.iter, thin = thin, na.rm = na.rm, ...)
5: withVisible(...elt(i))
4: capture.output({
       jags_model <- rjags::jags.model(file = model_file, data = data_list, 
           inits = inits_list, n.chains = n.chains, n.adapt = n.adapt)
       mcmc <- coda.samples(jags_model, variable.names = c("c", 
           "pi", "Z", "phi"), n.iter = n.iter, thin = thin, na.rm = na.rm, 
           ...)
   })
3: (function (object, sample, prior.params, n.chains = 1, n.adapt = 1000, 
       n.iter = 10000, thin = 1, na.rm = TRUE, ..., seed = as.numeric(format(Sys.Date(), 
           "%Y%m%d"))) 
   {
       data_list <- list(N = 1, P = nrow(object), B = 0, n = librarySize(object[, 
           sample], withDimnames = FALSE), Y = unname(counts(object)[, 
           sample, drop = FALSE]))
       data_list <- c(data_list, prior.params)
       inits_list <- guessInits(object[, sample], list(a_0 = prior.params[["a_0"]], 
           b_0 = prior.params[["b_0"]]))
       inits_list$.RNG.name <- "base::Wichmann-Hill"
       inits_list$.RNG.seed <- seed
       model_file <- system.file("extdata/phipseq_model.bugs", package = "beer")
       capture.output({
           jags_model <- rjags::jags.model(file = model_file, data = data_list, 
               inits = inits_list, n.chains = n.chains, n.adapt = n.adapt)
           mcmc <- coda.samples(jags_model, variable.names = c("c", 
               "pi", "Z", "phi"), n.iter = n.iter, thin = thin, 
               na.rm = na.rm, ...)
       })
    ...
2: do.call(brewOne, c(list(object = one_sample, sample = sample, 
       prior.params = new_prior, quiet = TRUE), jags.params))
1: FUN(...)

Levels 8 and 9

9: FUN(X[[i]], ...)
8: lapply(usingtypes, function(t) {
       ans <- .Call("get_monitored_values", model$ptr(), t, PACKAGE = "rjags")
       ...

are the calls leading to the error, and it suggests that the problem is with the data being 'too large' for the rjags package. So

FarzanehRah commented 1 year ago

Thank you so much for your time and effort. I apologize for not responding to your last message. I ran the code on HPC and requested a large memory allocation for the analysis. Unfortunately, my job is still in the queue and hasn't finished yet, so I was unable to provide the information. I truly appreciate your help and insight. Thank you once again for your help.