error "database disk image is malformed" #317

Closed ccshao closed 5 years ago

ccshao commented 5 years ago

In my code I access MotifDb via MotifDb::query(MotifDb::MotifDb, c("sox2", "Hsapiens")). However, when I run it in a parallel environment, it throws an error:

future::plan(future::multiprocess, workers = ncore)
future.apply::future_mapply(fn_tfsearch, inds1[, rn], inds1[, yn], MoreArgs= list(species, ...))
See system.file("LICENSE", package="MotifDb") for use restrictions.
Error in result_create(conn@ptr, statement) : 
  database disk image is malformed


future::plan(future::multisession, workers = ncore)
future.apply::future_mapply(fn_tfsearch, inds1[, rn], inds1[, yn], MoreArgs= list(species, ...))
See system.file("LICENSE", package="MotifDb") for use restrictions.
Error in result_create(conn@ptr, statement) : 
  external pointer is not valid

foreach didn't work either

foreach (i = seq_len(nrow(inds1))) %dopar% fn_tfsearch(inds1[i, rn], inds1[i, yn], species, ...)
Error in fn_tfsearch(inds1[i, rn], inds1[i, yn], species, ...) : 
  task 1 failed - "database disk image is malformed"

What is the proper way to access the database in parallel? Thanks!

HenrikBengtsson commented 5 years ago

Can you provide a small cut'n'pasteable example? Then I can give you a more specific answer.

HenrikBengtsson commented 5 years ago

Please make it minimal

ccshao commented 5 years ago

Some code to reproduce a similar error. future_sapply works, but future_mapply does not.

- install the package


- future_mapply

genes <- rep("sox2", 100)
fn1 <- function(in1, in2) {
  cc1 <- MotifDb::query(MotifDb::MotifDb, c(in1, "Hsapiens"))
  cc2 <- MotifDb::query(MotifDb::MotifDb, c(in2, "Hsapiens"))
  return(list(cc1, cc2))
}
future.apply::future_mapply(fn1, genes, genes, SIMPLIFY = FALSE)

Now the errors are:

Error: package or namespace load failed for ‘MotifDb’:
 .onLoad failed in loadNamespace() for 'MotifDb', details:
  call: validObject(.Object)
  error: invalid class “MotifList” object: 1: 'x@listData' is not parallel to 'x'
invalid class “MotifList” object: 2: 'mcols(x)' is not parallel to 'x'
Error in .requirePackage(package) :
  unable to find required package ‘MotifDb’
Loading required package: MotifDb
Error: package or namespace load failed for ‘MotifDb’:
 .onLoad failed in loadNamespace() for 'MotifDb', details:
  call: validObject(.Object)
  error: invalid class “MotifList” object: 1: 'x@listData' is not parallel to 'x'
invalid class “MotifList” object: 2: 'mcols(x)' is not parallel to 'x'

- Strangely, future_sapply works in a fresh R session.

genes <- rep("sox2", 100)
future::plan(future::multiprocess, workers = 12)
future.apply::future_sapply(genes, function(x) cc1 <- MotifDb::query(MotifDb::MotifDb, c(x, "Hsapiens")))

HenrikBengtsson commented 5 years ago

Everything works fine for me on R 3.6.0 on Linux. There's nothing that makes me believe it shouldn't work the same on Windows or macOS. Two comments:

  1. Your original error message is completely different and independent from the latter.
  2. Your second error message suggests that you have an unusual setup in R that causes the background workers to use a different .libPaths() than your main R session. I'd check .Renviron, .Rprofile, ...

## BiocManager::install("MotifDb")
library(future)  # provides plan(), multisession, and multicore used below

fn1 <- function(in1) {
  MotifDb::query(MotifDb::MotifDb, c(in1, "Hsapiens"))
}

fn2 <- function(in1, in2) {
  cc1 <- MotifDb::query(MotifDb::MotifDb, c(in1, "Hsapiens"))
  cc2 <- MotifDb::query(MotifDb::MotifDb, c(in2, "Hsapiens"))
  list(cc1, cc2)
}

genes <- rep("sox2", times = 3L)

y1_truth <- sapply(genes, fn1)
y2_truth <- mapply(fn2, genes, genes, SIMPLIFY = FALSE)

y1 <- future.apply::future_sapply(genes, fn1)
y2 <- future.apply::future_mapply(fn2, genes, genes, SIMPLIFY = FALSE)
stopifnot(identical(y1, y1_truth), identical(y2, y2_truth))

plan(multisession, workers = 2L)
y1 <- future.apply::future_sapply(genes, fn1)
y2 <- future.apply::future_mapply(fn2, genes, genes, SIMPLIFY = FALSE)
stopifnot(identical(y1, y1_truth), identical(y2, y2_truth))

plan(multicore, workers = 2L)
y1 <- future.apply::future_sapply(genes, fn1)
y2 <- future.apply::future_mapply(fn2, genes, genes, SIMPLIFY = FALSE)
stopifnot(identical(y1, y1_truth), identical(y2, y2_truth))
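
A minimal sketch for checking the second point above, that background workers might use a different .libPaths() than the main session. It assumes a multisession plan with two workers; the variable names are only illustrative.

```r
library(future)

plan(multisession, workers = 2L)

# Library paths as seen by the main R session
main_paths <- .libPaths()

# Library paths as seen by a background worker
worker_paths <- value(future(.libPaths()))

# TRUE means the worker sees the same package libraries as the main session
same_paths <- identical(main_paths, worker_paths)
print(same_paths)

# Paths known to the main session but missing on the worker, if any
print(setdiff(main_paths, worker_paths))

plan(sequential)
```

If the two differ, startup files such as .Renviron and .Rprofile are the first place to look.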

ccshao commented 5 years ago

Indeed, I can run the above code without problems. Sorry, it is probably some messed-up setting in R on my side.

ccshao commented 5 years ago

The error is due to concurrent access to SQLite objects, involving AnnotationDbi and TxDb databases.

FabianDK commented 3 years ago

The error is due to concurrent access to SQLite objects, involving AnnotationDbi and TxDb databases.

Can you please describe how to solve this issue?
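
No fix is described in this thread. One common approach, sketched here under assumptions, is to avoid sharing an open SQLite connection between processes: a connection is an external pointer that cannot be exported to a multisession worker, and SQLite's own documentation warns against carrying an open connection across a fork. Instead, each worker opens its own connection from a file path. The database file, the table `genes`, and the function `query_gene` below are hypothetical and exist only for illustration.

```r
library(future.apply)

# Hypothetical SQLite file created only for this illustration
db_path <- tempfile(fileext = ".sqlite")
con <- DBI::dbConnect(RSQLite::SQLite(), db_path)
DBI::dbWriteTable(con, "genes",
                  data.frame(symbol = c("sox2", "pou5f1"), id = 1:2))
DBI::dbDisconnect(con)

# Each call opens and closes its own connection, so no connection
# object is shared with, or exported to, another process.
query_gene <- function(symbol, path) {
  con <- DBI::dbConnect(RSQLite::SQLite(), path)
  on.exit(DBI::dbDisconnect(con), add = TRUE)
  res <- DBI::dbGetQuery(con, "SELECT id FROM genes WHERE symbol = ?",
                         params = list(symbol))
  res$id
}

plan(multisession, workers = 2L)
ids <- future_sapply(c("sox2", "pou5f1"), query_gene, path = db_path)
plan(sequential)
```

For a TxDb or other AnnotationDbi object, the analogous step would presumably be to load the database inside the worker function from a file path (for example with AnnotationDbi::loadDb()) rather than passing an already-open object to the workers. That is an assumption, not something confirmed in this thread.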