dcgerard / updog

Flexible Genotyping of Polyploids using Next Generation Sequencing Data
https://dcgerard.github.io/updog/

Error in foreach loop of multidog - can't find "flexdog_full" function #15

Closed: tpbilton closed this issue 3 years ago

tpbilton commented 3 years ago

Hi David,

I'm having problems running the multidog function on my company's HPC. I get the following error:

> seq_geno = updog::multidog(ref, depth, ploidy = 4, model = "norm", nc = 2)
    |                                   *.#,%
   |||                                 *******/
 |||||||    (**..#**.                  */   **/
|||||||||    */****************************/*%
   |||    &****..,*.************************/
   |||     (....,,,*,...****%********/(******
   |||                ,,****%////,,,,./.****/
   |||                  /**//         .*///....
   |||                  .*/*/%#         .,/   .,
   |||               , **/   #%         .*    ..
   |||                               ,,,*

Working on it...Error in { : task 1 failed - "could not find function "flexdog_full""

I was able to fix it by modifying the multidog function. Specifically, if I replace lines 258-259 of its source:

cl = parallel::makeCluster(nc)
doParallel::registerDoParallel(cl = cl)

with

doParallel::registerDoParallel(nc)

and replace line 361:

parallel::stopCluster(cl)

with

doParallel::stopImplicitCluster()

then it seems to work for me.
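For readers comparing the two setups above, here is a minimal sketch of both registration patterns applied to a toy foreach loop that squares 1 to 4. It does not call multidog(). The loop body and the variable names (nc, res_explicit, res_implicit) are illustrative assumptions, and the sketch requires the foreach and doParallel packages.

```r
library(foreach)     # provides foreach() and the %dopar% operator
library(doParallel)  # provides registerDoParallel() and stopImplicitCluster()

nc <- 2  # number of workers, as in multidog(..., nc = 2)

## Original pattern: build an explicit cluster object and register it.
## parallel::makeCluster() starts fresh socket-based (snow-like) R workers.
cl <- parallel::makeCluster(nc)
doParallel::registerDoParallel(cl = cl)
res_explicit <- foreach(i = 1:4, .combine = c) %dopar% i^2
parallel::stopCluster(cl)

## Workaround pattern: register a core count instead of a cluster object.
## doParallel then decides how to create the workers on each platform.
doParallel::registerDoParallel(nc)
res_implicit <- foreach(i = 1:4, .combine = c) %dopar% i^2
## Cleans up any cluster doParallel created implicitly (nothing to stop
## when it used forked workers instead).
doParallel::stopImplicitCluster()

print(res_explicit)
print(res_implicit)
```

Both calls return the same vector. The only difference is how the worker processes are created.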

Is it feasible to make this change? I'm not sure, though, whether CRAN has requirements that mean it needs to stay the way it is currently programmed.

Thanks, Timothy

dcgerard commented 3 years ago

Thanks for figuring this out, Timothy!

After reading more on this and looking at the source code of the doParallel package, my understanding is that the current code uses the "snow-like" methods from the parallel package on both Windows and Linux. Your edit uses the "multicore-like" methods on Linux, but it still uses the snow-like methods on Windows. So perhaps the snow-like methods don't work on your HPC?
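The sketch below makes the snow-like versus multicore-like distinction concrete using only base R's parallel package. The helper square_it() is a hypothetical stand-in for a function like flexdog_full(). Socket (snow-like) workers start as fresh R sessions, so they only see the helper after it has been exported. Forked (multicore-like) workers inherit it from the parent process. This illustrates the general mechanism and is not a diagnosis of the HPC in this issue. Forking is unavailable on Windows, so the sketch falls back to one core there. It assumes the code is run at the top level of a script.

```r
library(parallel)

## Hypothetical helper that exists only in the parent R session.
square_it <- function(x) x^2

## Snow-like: makeCluster() launches fresh socket-connected R processes.
cl <- makeCluster(2)

## Without exporting, the workers cannot see square_it().
missing_msg <- tryCatch(
  parLapply(cl, 1:4, function(x) square_it(x)),
  error = function(e) conditionMessage(e)
)

## After exporting the helper to every worker, the same call succeeds.
clusterExport(cl, "square_it")
snow_res <- unlist(parLapply(cl, 1:4, function(x) square_it(x)))
stopCluster(cl)

## Multicore-like: mclapply() forks the parent, so the children already have
## square_it(). Forking is not supported on Windows, so use one core there.
cores <- if (.Platform$OS.type == "windows") 1L else 2L
fork_res <- unlist(mclapply(1:4, square_it, mc.cores = cores))

print(missing_msg)
print(snow_res)
print(fork_res)
```

The first call fails with a "could not find function" error of the same kind reported above. The forked version needs no export step.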

It would be ideal if the package were flexible enough to let the user choose the evaluation strategy, for example through the future package. I'll work on this during the week.

dcgerard commented 3 years ago

Hey @tpbilton,

The most recent commit on the master branch should now give you some flexibility to choose the evaluation strategy through the future package.

You can change the evaluation strategy by calling future::plan() before calling multidog() with nc = NA. For example, the following sets up multiple R sessions on your local machine:

library(updog)
data("uitdewilligen")
future::plan(future::multisession, workers = 2)
mout <- multidog(refmat = t(uitdewilligen$refmat),
                 sizemat = t(uitdewilligen$sizemat),
                 ploidy = uitdewilligen$ploidy,
                 nc = NA)

## Reset future to current R process
future::plan(future::sequential)
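The same plan-switching idea can be sketched with a toy function in place of multidog(). The square_all() function below is hypothetical. It creates futures without choosing a backend, so whatever future::plan() the caller has set (multisession here, then sequential) decides where the work runs. The sketch also relies on plan() returning the previously active plan, which makes restoring that plan a one-liner. It assumes the future package is installed and does not describe how updog implements nc = NA internally.

```r
library(future)

## Hypothetical function that, like multidog(..., nc = NA), leaves the choice
## of evaluation strategy to the caller's plan().
square_all <- function(xs) {
  futures <- lapply(xs, function(x) future(x^2))  # one future per element
  unlist(lapply(futures, value))                   # collect results in order
}

## Run on two background R sessions, remembering the previous plan.
old_plan <- plan(multisession, workers = 2)
res_multi <- square_all(1:4)
plan(old_plan)  # restore the plan that was active before

## Run the same code in the current R process instead.
plan(sequential)
res_seq <- square_all(1:4)
```

Because square_all() never mentions a backend, switching from local sessions to a cluster is a change to the plan() call alone.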

If you are using a scheduler for your HPC, like Slurm, then the future.batchtools package provides plans for that.
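Here is a sketch of how a script might pick a Slurm-backed plan when it runs on the cluster and fall back to local sessions elsewhere. It assumes the future and future.batchtools packages are available. The template file name slurm.tmpl and the check for sbatch on the PATH are illustrative assumptions. The template's contents depend on your site's Slurm setup; see the future.batchtools and batchtools documentation for details.

```r
library(future)

## Treat the session as "on Slurm" if sbatch is on the PATH and
## future.batchtools is installed (an illustrative heuristic).
on_slurm <- nzchar(Sys.which("sbatch")) &&
  requireNamespace("future.batchtools", quietly = TRUE)

if (on_slurm) {
  ## Each future becomes a Slurm job; "slurm.tmpl" is a hypothetical
  ## batchtools template file that you would need to provide.
  old_plan <- plan(future.batchtools::batchtools_slurm, template = "slurm.tmpl")
} else {
  ## Fallback for machines without Slurm: two local background R sessions.
  old_plan <- plan(multisession, workers = 2)
}

## multidog(..., nc = NA) would be called here; a trivial future stands in
## and reports the process ID of whichever worker evaluated it.
worker_pid <- value(future(Sys.getpid()))

plan(old_plan)  # restore the previous plan
```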

I'll do some more tests this week to make sure there are no bugs, but it's looking pretty good so far. I'll let you know when the package is stable again, but if you end up trying this out, please let me know how it goes!

tpbilton commented 3 years ago

Thanks @dcgerard, this seems to be working on our HPC now (both multicore and single core). For context, I have been using the Slurm job scheduler with an interactive session.

Timothy