HenrikBengtsson / future

:rocket: R package: future: Unified Parallel and Distributed Processing in R for Everyone
https://future.futureverse.org

error "database disk image is malformed" #317

Closed ccshao closed 5 years ago

ccshao commented 5 years ago

In my code I access MotifDb via MotifDb::query(MotifDb::MotifDb, c("sox2", "Hsapiens")). However, when I run it in a parallel environment, it throws an error:


future::plan(future::multiprocess, workers = ncore)
future.apply::future_mapply(fn_tfsearch, inds1[, rn], inds1[, yn], MoreArgs= list(species, ...))
See system.file("LICENSE", package="MotifDb") for use restrictions.
Error in result_create(conn@ptr, statement) : 
  database disk image is malformed

or

future::plan(future::multisession, workers = ncore)
future.apply::future_mapply(fn_tfsearch, inds1[, rn], inds1[, yn], MoreArgs= list(species, ...))
See system.file("LICENSE", package="MotifDb") for use restrictions.
Error in result_create(conn@ptr, statement) : 
  external pointer is not valid

foreach didn't work either:

foreach (i = seq_len(nrow(inds1))) %dopar% fn_tfsearch(inds1[i, rn], inds1[i, yn], species, ...)
Error in fn_tfsearch(inds1[i, rn], inds1[i, yn], species, ...) : 
  task 1 failed - "database disk image is malformed"

What is the proper way to access the database in parallel? Thanks!

HenrikBengtsson commented 5 years ago

Can you provide a small cut'n'pasteable example? Then I can give you a more specific answer.

HenrikBengtsson commented 5 years ago

Please make it minimal

ccshao commented 5 years ago

Here is some code that reproduces a similar error. future_sapply works, but future_mapply does not.

- install the package

BiocManager::install("MotifDb")

- future_mapply

genes <- rep("sox2", 100)
fn1 <- function(in1, in2) {
  cc1 <- MotifDb::query(MotifDb::MotifDb, c(in1, "Hsapiens"))
  cc2 <- MotifDb::query(MotifDb::MotifDb, c(in2, "Hsapiens"))
  return(list(cc1, cc2))
}
future.apply::future_mapply(fn1, genes, genes, SIMPLIFY = FALSE)

Now the errors are:

Error: package or namespace load failed for ‘MotifDb’:
 .onLoad failed in loadNamespace() for 'MotifDb', details:
  call: validObject(.Object)
  error: invalid class “MotifList” object: 1: 'x@listData' is not parallel to 'x'
invalid class “MotifList” object: 2: 'mcols(x)' is not parallel to 'x'
Error in .requirePackage(package) :
  unable to find required package ‘MotifDb’
Loading required package: MotifDb
Error: package or namespace load failed for ‘MotifDb’:
 .onLoad failed in loadNamespace() for 'MotifDb', details:
  call: validObject(.Object)
  error: invalid class “MotifList” object: 1: 'x@listData' is not parallel to 'x'
invalid class “MotifList” object: 2: 'mcols(x)' is not parallel to 'x'

- Strangely, future_sapply works in a fresh R session.

genes <- rep("sox2", 100)
future::plan(future::multiprocess, workers = 12)
future.apply::future_sapply(genes, function(x) cc1 <- MotifDb::query(MotifDb::MotifDb, c(x, "Hsapiens")))

HenrikBengtsson commented 5 years ago

Everything works fine for me on R 3.6.0 on Linux. There's nothing that makes me believe it shouldn't work the same on Windows or macOS. Two comments:

  1. Your original error message is completely different from, and unrelated to, the later one.
  2. Your second error message suggests that you have an unusual R setup that causes the background workers to use a different .libPaths() than your main R session. I'd check .Renviron, .Rprofile, ...
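One way to check the second point is to compare the library paths seen by the main session with those seen by a background worker. The following is only a diagnostic sketch, assuming the future package is installed; the variable names (main_paths, worker_paths, worker_pkg) are made up for illustration.

```r
library(future)

plan(multisession, workers = 2L)

# Library paths as seen by the main R session
main_paths <- .libPaths()

# Library paths as seen by a background worker
worker_paths <- value(future(.libPaths()))

# Any path printed here is visible to only one of the two sides
print(setdiff(main_paths, worker_paths))
print(setdiff(worker_paths, main_paths))

# Where does the worker find MotifDb? An empty string means "not found".
worker_pkg <- value(future(system.file(package = "MotifDb")))
print(worker_pkg)

plan(sequential)
```

If the two setdiff() calls print anything, the worker is starting with a different library search path, and startup files such as .Renviron and .Rprofile are the first place to look.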

library(future)
## BiocManager::install("MotifDb")

fn1 <- function(in1) {
  MotifDb::query(MotifDb::MotifDb, c(in1, "Hsapiens"))
}

fn2 <- function(in1, in2) {
  cc1 <- MotifDb::query(MotifDb::MotifDb, c(in1, "Hsapiens"))
  cc2 <- MotifDb::query(MotifDb::MotifDb, c(in2, "Hsapiens"))
  list(cc1, cc2)
}

genes <- rep("sox2", times = 3L)

y1_truth <- sapply(genes, fn1)
y2_truth <- mapply(fn2, genes, genes, SIMPLIFY = FALSE)

plan(sequential)
y1 <- future.apply::future_sapply(genes, fn1)
y2 <- future.apply::future_mapply(fn2, genes, genes, SIMPLIFY = FALSE)
stopifnot(identical(y1, y1_truth), identical(y2, y2_truth))

plan(multisession, workers = 2L)
y1 <- future.apply::future_sapply(genes, fn1)
y2 <- future.apply::future_mapply(fn2, genes, genes, SIMPLIFY = FALSE)
stopifnot(identical(y1, y1_truth), identical(y2, y2_truth))

plan(multicore, workers = 2L)
y1 <- future.apply::future_sapply(genes, fn1)
y2 <- future.apply::future_mapply(fn2, genes, genes, SIMPLIFY = FALSE)
stopifnot(identical(y1, y1_truth), identical(y2, y2_truth))

> sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] future_1.13.0

loaded via a namespace (and not attached):
 [1] MotifDb_1.26.0              XVector_0.24.0             
 [3] GenomicRanges_1.36.0        BiocGenerics_0.30.0        
 [5] zlibbioc_1.30.0             GenomicAlignments_1.20.0   
 [7] IRanges_2.18.1              BiocParallel_1.18.0        
 [9] lattice_0.20-38             GenomeInfoDb_1.20.0        
[11] globals_0.12.4              tools_3.6.0                
[13] grid_3.6.0                  SummarizedExperiment_1.14.0
[15] parallel_3.6.0              data.table_1.12.2          
[17] Biobase_2.44.0              matrixStats_0.54.0         
[19] digest_0.6.19               Matrix_1.2-17              
[21] GenomeInfoDbData_1.2.1      rtracklayer_1.44.0         
[23] S4Vectors_0.22.0            bitops_1.0-6               
[25] codetools_0.2-16            RCurl_1.95-4.12            
[27] future.apply_1.2.0-9000     DelayedArray_0.10.0        
[29] compiler_3.6.0              Biostrings_2.52.0          
[31] Rsamtools_2.0.0             stats4_3.6.0               
[33] XML_3.98-1.20               splitstackshape_1.4.8      
[35] listenv_0.7.0

ccshao commented 5 years ago

Indeed, I can run the code above without problems. Sorry, it is probably some messed-up R settings on my side.

ccshao commented 5 years ago

The error is due to multiple access to SQLite objects, involving AnnotationDbi and TxDb databases.

FabianDK commented 3 years ago

The error is due to multiple access to SQLite objects, involving AnnotationDbi and TxDb databases.

Can you please describe how to solve this issue?
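
The thread does not record a fix. A common pattern for this kind of problem is to avoid sharing one SQLite connection between processes and instead open a fresh connection inside each worker. Database connections are external pointers, which do not survive being serialized to another R session, and the SQLite documentation warns against reusing a connection in a forked child process. The sketch below illustrates the pattern with DBI and RSQLite; the temporary database file, the motifs table, and the query_gene() helper are hypothetical stand-ins, not part of MotifDb or AnnotationDbi.

```r
library(future)
library(future.apply)

# Hypothetical stand-in for a package's SQLite file
db_path <- tempfile(fileext = ".sqlite")
con <- DBI::dbConnect(RSQLite::SQLite(), db_path)
DBI::dbWriteTable(con, "motifs",
                  data.frame(gene = c("sox2", "pax6"), n = c(3L, 5L)))
DBI::dbDisconnect(con)

# Hypothetical helper: every call opens, uses, and closes its own
# connection, so no connection object is shared between processes.
query_gene <- function(gene, path) {
  con <- DBI::dbConnect(RSQLite::SQLite(), path)
  on.exit(DBI::dbDisconnect(con), add = TRUE)
  DBI::dbGetQuery(con, "SELECT n FROM motifs WHERE gene = ?",
                  params = list(gene))$n
}

plan(multisession, workers = 2L)
genes <- c("sox2", "pax6", "sox2")

# Only the file path (a plain string) is sent to the workers
res <- future_sapply(genes, query_gene, path = db_path)
print(res)

plan(sequential)
```

For AnnotationDbi-based objects such as TxDb, the analogous step would presumably be to re-open the database inside the worker function, for example from its file with AnnotationDbi::loadDb(), rather than passing an already-open object from the main session. Whether that is enough for a given annotation package is an assumption that needs testing.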