HenrikBengtsson / doFuture

:rocket: R package: doFuture - Use Foreach to Parallelize via Future Framework
https://doFuture.futureverse.org

RSQLite database pointer is invalid in foreach loop with doFuture adapter #62

Closed koenniem closed 3 years ago

koenniem commented 3 years ago

When using a database pointer inside a foreach loop (using the doFuture backend), the database pointer becomes invalid:

library(future)
library(doFuture)
library(DBI)

registerDoFuture()
plan(multisession)

db <- dbConnect(RSQLite::SQLite(), "test.db")

res <- foreach(i = 1:10) %dopar% {
    dbIsValid(db)
}

plan(sequential)
registerDoSEQ()

res

Is this a limitation in doFuture or foreach? Is there a different kind of parallel looping function I can use with such a database pointer, e.g. future_apply? I suspect this is because plan(multisession) creates workers in separate R sessions, rendering the pointer invalid, but I'm not sure what I can do to resolve this.

HenrikBengtsson commented 3 years ago

Hi. Yes, DBIConnection is among the class of objects that cannot be exported to another R process. You can read more about it, and see other examples in https://cran.r-project.org/web/packages/future/vignettes/future-4-non-exportable-objects.html.

Is this a limitation in doFuture or foreach? Is there a different kind of parallel looping function I can use with such a database pointer, e.g. future_apply?

No, it applies to all types of parallel processing in R, not just the ones in the future framework. Unfortunately, there's no way to make such a connection object work in another R process.

FWIW, note that foreach w/ doFuture, future.apply, and furrr are all map-reduce APIs that build on top of the future framework. So, using foreach() %dopar% { ... }, future_lapply(), or future_map() is just a matter of taste - from a parallelization point of view they're all the same. For example, see my https://www.jottr.org/2020/12/19/future-eurobioc2020-slides/ talk:

[Slide image: BengtssonH_20201218-futures-EuroBioc2020_s31]

Hope this clarifies it.

koenniem commented 3 years ago

That's a great explanation, thanks! I hadn't found that list yet (only the short explanation on non-exportable objects), but that answers my question.

In case anyone else comes to this question from Google or the like, I worked around the problem by opening a new database connection inside the foreach loop and then closing it at the end of each iteration. It's a tad slower, but I guess it's the only thing that works.
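
For readers who want to see that workaround spelled out, here is a minimal sketch. It is not the original poster's actual code. The temporary database file, the `numbers` table, and the `workers = 2` setting are assumptions made only so the example is self-contained. The idea is to pass the database path (a plain string, which exports fine) to the workers and let each iteration open and close its own connection.

```r
library(future)
library(foreach)
library(doFuture)
library(DBI)

registerDoFuture()
plan(multisession, workers = 2)  # assumed worker count, adjust as needed

# Hypothetical database file and table, created only for this illustration
db_path <- file.path(tempdir(), "test.db")
con <- dbConnect(RSQLite::SQLite(), db_path)
dbWriteTable(con, "numbers", data.frame(x = 1:10), overwrite = TRUE)
dbDisconnect(con)

res <- foreach(i = 1:10) %dopar% {
  # Open the connection inside the worker instead of exporting one
  con <- DBI::dbConnect(RSQLite::SQLite(), db_path)
  tryCatch({
    list(
      valid = DBI::dbIsValid(con),
      value = DBI::dbGetQuery(
        con, "SELECT x FROM numbers WHERE x = ?", params = list(i)
      )$x
    )
  }, finally = DBI::dbDisconnect(con))  # always close, even on error
}

plan(sequential)
registerDoSEQ()
```

Each element of `res` holds the validity flag and the value read by that iteration. Opening a connection per iteration adds overhead, which matches the "tad slower" observation above; if that cost matters, processing several items per iteration would let one connection serve a whole chunk.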