Bioconductor / BiocParallel

Bioconductor facilities for parallel evaluation
https://bioconductor.org/packages/BiocParallel

BiocParallel : long vectors are not supported in .C() #255

Closed YouriTasse closed 1 year ago

YouriTasse commented 1 year ago

When I run the gsva() function from the GSVA package in R 4.2.2, I get this error:

Error: BiocParallel errors
  1 remote errors, element index: 1
  0 unevaluated and other errors
  first remote error:
Error in FUN(...): long vectors (argument 1) are not supported in .C

It only happens with very large datasets.

What should I do?

Jiefei-Wang commented 1 year ago

Without runnable code, I cannot really tell where the actual issue is (please try to provide a reproducible example). However, you can first run your code with BPPARAM = SerialParam() to see whether you get the same error. Most likely this is not a BiocParallel issue but an issue in the package your code calls.
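As a minimal sketch of that check (not the GSVA code itself): the `worker` function below is a hypothetical stand-in for whatever function the package hands to BiocParallel, and the error text is copied from the report purely for illustration. Running it under SerialParam() keeps everything in the current R process, so any error that still appears cannot come from the parallel backend.

```r
library(BiocParallel)

# Hypothetical stand-in for the function a package passes to bplapply();
# it fails on its own, regardless of how the work is dispatched.
worker <- function(i) {
  if (i == 1L) stop("long vectors (argument 1) are not supported in .C")
  i
}

# SerialParam() evaluates every task in the current R process:
# no forked workers and no socket connections are involved.
serial_result <- tryCatch(
  bplapply(1:3, worker, BPPARAM = SerialParam()),
  error = function(e) conditionMessage(e)
)

# If the worker's own message shows up here too, the backend is not the cause.
print(serial_result)
```

For a real analysis, the same idea would mean passing BPPARAM = SerialParam() to the failing call on a subset of the data small enough to finish quickly but still large enough to trigger the error.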

YouriTasse commented 1 year ago

Here is the code:

library(GSVA)
library(readxl)
library(dplyr)
library(tidyr)
library(xlsx)

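# Load the dataset; the .RData file provides the expression matrix `mat`
load("./Wu_V2.RData")
data = mat

# Read the gene set from the first column of the spreadsheet
gene.excel = read_excel("./c5.xlsx") %>% as.data.frame()

# Convert it to the format gsva() expects: a list of character vectors
genes = gene.excel[, 1] %>% na.omit() %>% as.data.frame()
genes = list(as.character(genes[, 1]))

# Number of available cores
nbCores = parallel::detectCores()

# Run GSVA, leaving two cores free
gsva1 = gsva(data,   # must be a matrix
             genes,  # was markers.set
             verbose = TRUE,
             method = "gsva",
             parallel.sz = nbCores - 2) %>% t()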

I have run this code on many smaller datasets without any issue. Without parallelization it would run for at least 20 days, so that is not really an option.

Setting parallel.sz makes gsva() use MulticoreParam().

I think my issue is similar to this one: https://stackoverflow.com/questions/34165654/r-vector-size-limit-long-vectors-argument-5-are-not-supported-in-c

Jiefei-Wang commented 1 year ago

If you are hitting the hard limit of R vector size, the only way to fix it is to redesign the algorithm. I think you can report this to GSVA maintainers and let them know about this issue. @mtmorgan what do you think?
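To make the limit concrete: R's .C interface does not accept long vectors, that is, vectors with more than 2^31 - 1 elements. The sketch below only checks whether an object of given dimensions would cross that threshold. The helper name and the example dimensions are hypothetical, and which object inside gsva() actually exceeds the limit is not established in this thread.

```r
# Hypothetical helper: would an object with these dimensions be a "long vector",
# i.e. hold more than .Machine$integer.max (2^31 - 1) elements?
is_long_vector_size <- function(n_rows, n_cols) {
  # Multiply as doubles so the check itself cannot overflow integer arithmetic
  as.numeric(n_rows) * as.numeric(n_cols) > .Machine$integer.max
}

.Machine$integer.max                 # 2147483647

# Illustrative dimensions only, not taken from the reported dataset
is_long_vector_size(20000, 100000)   # FALSE: 2e9 elements, just under the limit
is_long_vector_size(30000, 100000)   # TRUE:  3e9 elements, over the limit
```

If an intermediate object of this size is what gets passed to .C, adding more workers would not help, because the limit applies to each individual call rather than to the total amount of work.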

mtmorgan commented 1 year ago

Yes; the repository is at https://github.com/rcastelo/GSVA and I am sure @rcastelo will be able to help.

YouriTasse commented 1 year ago

Thank you both very much! I just created a new issue on the GSVA repository.