datacloning / dclone

Data Cloning and MCMC Tools for Maximum Likelihood Methods
https://github.com/datacloning/dclone

Error: Port cannot be open #6

Closed DiogoFerrari closed 7 years ago

DiogoFerrari commented 7 years ago

I am running several R scripts at once in batch mode on a Linux cluster to estimate a model on different data sets (the same thing happens on a Mac). The scripts are identical except for the data set each one uses. When I do that, I get the following message:

Error in socketConnection("localhost", port = port, server = TRUE, blocking = TRUE, :
  cannot open the connection
Calls: makePSOCKcluster -> newPSOCKnode -> socketConnection
In addition: Warning message:
In socketConnection("localhost", port = port, server = TRUE, blocking = TRUE, :
  port 11426 cannot be opened

Here is a reproducible example. Create three files, tmp1.R, tmp2.R, and tmp.sh, with the following content:

The first file in the list runs, but the second fails with the error above. Does anyone know how to solve this while still running all the scripts at once, without any manual intervention?

psolymos commented 7 years ago

Have you tried quitting your R session? Close the file with q('no')

DiogoFerrari commented 7 years ago

I am not sure I understood the question, but the two scripts run at the same time in batch mode. Each one starts before the other has finished, that is, before any statement such as q('no') is reached.

psolymos commented 7 years ago

I see. Now it makes sense. This way you start 2 master processes that each spawn child processes, and the 2nd master's children conflict with the 1st master's.

I also don't think it is a dclone problem: the same should happen if you replace the 1st line with library(parallel). I tried both dclone and parallel multiple times. Sometimes both exited fine (without error), sometimes tmp2.R finished but tmp1.R gave an error, etc. So I really do think it is the double master process causing trouble.

DiogoFerrari commented 7 years ago

That was my guess too, although I thought it could be resolved by adjusting the parameters of the package's functions or by some other workaround. I have seen many posts about the problem, but no clear solution yet.
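
One possible workaround, sketched here under assumptions not confirmed in this thread: if the conflict comes from both master processes asking for the same socket port, each script can request its own port explicitly. makePSOCKcluster() in the parallel package accepts a port argument (by default it is taken from the environment variable R_PARALLEL_PORT, and otherwise chosen at random from 11000:11999). The helper pick_port() and the JOB_ID environment variable below are hypothetical names introduced only for illustration.

```r
library(parallel)

# Hypothetical helper: map a job index to its own port, so that scripts
# started at the same time never request the same one.
pick_port <- function(job_id, base = 11000L) {
  stopifnot(!is.na(job_id), job_id >= 0, job_id < 1000)
  as.integer(base + job_id)
}

# Assumption: the launcher sets a different JOB_ID for every script.
# If the variable is unset, job 0 is used.
job_id <- as.integer(Sys.getenv("JOB_ID", unset = "0"))

# Start two workers on this master's own port.
cl <- makePSOCKcluster(2, port = pick_port(job_id))

# Any parallel work goes here; a trivial computation stands in for the model.
res <- parLapply(cl, 1:4, function(i) i^2)

stopCluster(cl)
```

Each script would then be started with a different value, for example JOB_ID=1 for tmp1.R and JOB_ID=2 for tmp2.R in the shell script. Setting R_PARALLEL_PORT to a different value per script before launching R should have a similar effect without touching the R code. Neither variant has been tested against the example in this issue.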