Bioconductor / BiocParallel

Bioconductor facilities for parallel evaluation
https://bioconductor.org/packages/BiocParallel

Error using BiocParallel on Travis #61

Open lgatto opened 7 years ago

lgatto commented 7 years ago

I recently experienced the following error running parallel code with BiocParallel on Travis:

Quitting from lines 189-194 (benchmarking.Rmd) 
Error: processing vignette 'benchmarking.Rmd' failed with diagnostics:
setting worker timeout:
  error reading from connection
Execution halted

The error, reported here, happened when running MSnbase::quantify, which uses BiocParallel. Registering a SerialParam instead of the default MulticoreParam fixes the issue, and the package now builds fine on Travis. I raised the issue with Travis directly, and it was suggested that there might be something wrong in BiocParallel.
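
For reference, a minimal sketch of the workaround described above. It assumes the call is placed early in the vignette or test setup, before any parallel code runs; the `bplapply` call is only a stand-in for the real workload.

```r
library(BiocParallel)

# Replace the default back-end (MulticoreParam on Linux/macOS) with serial
# evaluation for everything that relies on the registered default.
register(SerialParam())

# Stand-in workload: with no BPPARAM argument, bplapply() uses the
# registered back-end, which is now SerialParam.
squares <- bplapply(1:4, function(i) i^2)
```

Code that passes an explicit `BPPARAM` argument is not affected by `register()`, so such calls would need the `SerialParam()` passed directly.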

Any idea?

DarwinAwardWinner commented 7 years ago

I've been seeing weird issues with MulticoreParam hanging or throwing errors recently (on my laptop, not on Travis). I didn't have time to debug them, so I just switched to DoparParam (the one based on foreach backends), and that worked fine.
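
A rough sketch of what switching to DoparParam can look like. The comment above does not say which foreach back-end was used; doParallel with two workers is an assumption made here for illustration.

```r
library(BiocParallel)
library(doParallel)  # assumption: doParallel supplies the foreach back-end

# Register a foreach back-end first; DoparParam dispatches through
# whatever foreach back-end is currently registered.
registerDoParallel(cores = 2)
register(DoparParam())

# Stand-in workload, evaluated through the foreach back-end.
doubled <- bplapply(1:4, function(i) i * 2)

# Release any implicitly created doParallel cluster.
stopImplicitCluster()
```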

mtmorgan commented 7 years ago

The problems are likely caused either by a port not being available (use options(ports=...) to specify an available port) or by an attempt to serialize very large objects.

lgatto commented 7 years ago

Thanks @mtmorgan. I am looking into the ports suggestion, as there's no (large) serialisation involved. Which ports, if any, does MulticoreParam use?

mtmorgan commented 7 years ago

It chooses a random port in the range 11000-12000. Also, options(bphost="localhost") overrides the default host choice, Sys.info()[["nodename"]].
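
The two options mentioned in this thread could be combined roughly as follows. This is a sketch under assumptions: the option names are taken as written in the comments above and have not been checked against a particular BiocParallel version, and port 11500 is an arbitrary example that must actually be free on the build machine.

```r
library(BiocParallel)

# Use "localhost" as the manager host instead of Sys.info()[["nodename"]].
options(bphost = "localhost")

# Option name as given earlier in the thread; 11500 is an arbitrary value
# inside the 11000-12000 range mentioned above.
options(ports = 11500L)

# Create the back-end after setting the options, then run a stand-in workload.
param <- MulticoreParam(workers = 2)
incremented <- bplapply(1:4, function(i) i + 1L, BPPARAM = param)
```

Setting the options before the param object is created keeps the sketch independent of when the package reads them.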

kjohnsen commented 6 years ago

I have had a similar problem: my unit tests suddenly fail when I run them on Travis:

test_distance.R:19: error: whole genome is read correctly
'bplapply' receive data failed:
  error reading from connection
1: calculateDistance(mmapprData) at /home/travis/build/kjohnsen/MMAPPR2/tests/testthat/test_distance.R:19
2: .runFunctionInParallel(chrList, .calcDistForChr, param = mmapprData@param) at /home/travis/build/kjohnsen/MMAPPR2/R/distance.R:32
3: BiocParallel::bplapply(inputList, functionToRun, ..., BPPARAM = bpParam) at /home/travis/build/kjohnsen/MMAPPR2/R/main.R:4
4: BiocParallel::bplapply(inputList, functionToRun, ..., BPPARAM = bpParam)
5: bploop(structure(list(), class = "lapply"), X, lapply, ARGFUN, BPPARAM)
6: bploop.lapply(structure(list(), class = "lapply"), X, lapply, ARGFUN, BPPARAM)
7: .recv1(cl, "bplapply")
8: tryCatch({
       parallel:::recvOneData(cluster)
   }, error = function(e) {
       stop(.error_worker_comm(e, sprintf("'%s' receive data failed", id)))
   })
9: tryCatchList(expr, classes, parentenv, handlers)
10: tryCatchOne(expr, names, parentenv, handlers[[1L]])
11: value[[3L]](cond)

And my examples then fail as well:

Error: 'bplapply' receive data failed:
  error reading from connection

Strangely enough, it seems this problem went away when I set sudo: required in my .travis.yml file.