ArtPoon / kamphir

Phylogenetic inference using a tree-shape kernel in an Approximate Bayesian Computation framework
BSD 3-Clause "New" or "Revised" License

Using mclapply within rcolgem locks up Kamphir #29

Open ArtPoon opened 9 years ago

ArtPoon commented 9 years ago

This problem seems to be specific to OS X; I haven't reproduced it on a Linux cluster. Monitoring PIDs, I can see that the number of Python threads stays constant throughout the run, but rcolgem (via mclapply) keeps spawning new processes. Eventually this locks up the run.
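To confirm that the child-process count keeps climbing, one option is to poll the process table and count everything descended from the Kamphir Python process. This is a minimal sketch, assuming the standard `ps` that ships with OS X and Linux; the helper names `parse_ps`, `count_descendants` and `snapshot` are hypothetical and not part of Kamphir or rcolgem.

```python
import subprocess
from collections import defaultdict


def parse_ps(output):
    """Parse `ps -A -o pid= -o ppid=` output into (pid, ppid) integer pairs."""
    pairs = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) == 2:
            pairs.append((int(fields[0]), int(fields[1])))
    return pairs


def count_descendants(root_pid, pairs):
    """Count every process below root_pid in the parent/child tree."""
    children = defaultdict(list)
    for pid, ppid in pairs:
        if pid != ppid:  # skip self-parented entries (e.g. PID 0) to avoid loops
            children[ppid].append(pid)
    count = 0
    stack = [root_pid]
    while stack:
        for child in children[stack.pop()]:
            count += 1
            stack.append(child)
    return count


def snapshot(root_pid):
    """Read the live process table once and count root_pid's descendants."""
    out = subprocess.run(["ps", "-A", "-o", "pid=", "-o", "ppid="],
                         capture_output=True, text=True, check=True).stdout
    return count_descendants(root_pid, parse_ps(out))
```

Calling `snapshot(pid)` every few seconds with the Kamphir process ID should show whether the descendant count plateaus (workers are being reaped) or grows without bound.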

ArtPoon commented 9 years ago

Maybe this has something to do with the default limits on the number of processes in OS X? ulimit reports unlimited, but launchctl limit shows maxproc set to 1064. Could this be causing Kamphir to lock up?
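One thing worth checking: in bash, plain `ulimit` with no option reports the file-size limit (`-f`), not the process limit; the per-user process limit is `ulimit -u`. The sketch below reads the process limit that the Python process itself sees (and that forked R workers would inherit), assuming OS X or Linux where `resource.RLIMIT_NPROC` is available. The helper names are hypothetical.

```python
import resource


def format_limit(value):
    """Render an rlimit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def process_limits():
    """Return the (soft, hard) per-user process limit as strings."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return format_limit(soft), format_limit(hard)


if __name__ == "__main__":
    soft, hard = process_limits()
    print(f"RLIMIT_NPROC soft={soft} hard={hard}")
```

If the soft limit printed here is finite, it is a more direct comparison point for the maxproc value than the output of bare `ulimit`.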

ArtPoon commented 9 years ago

Still encountering problems on OS X despite resolving the process ID usage issue.

rmcclosk commented 9 years ago

The command that locks up the program is:

python kamphir.py DiffRisk settings.rcolgem-DiffRisk1.json rcolgem_c1-2.0_n-300_rho-0.9.nwk diffrisk-c1-2.log -kdecay 0.3 -tol0 0.005 -mintol 0.0025 -ncores 4 -nthreads 4 -nreps 20 -treenum 0
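When chasing an intermittent hang, it can help to run the command under a wall-clock limit so a stuck run gets flagged instead of waited on indefinitely. This is a minimal sketch using Python's `subprocess` module; the helper name `run_or_flag_hang` and whatever timeout you choose are hypothetical, not Kamphir options. On timeout, `subprocess.run` kills only the direct child, so forked R workers may need to be cleaned up separately.

```python
import subprocess
import sys

# The command from this thread, split into an argument list.
KAMPHIR_CMD = [
    sys.executable, "kamphir.py", "DiffRisk", "settings.rcolgem-DiffRisk1.json",
    "rcolgem_c1-2.0_n-300_rho-0.9.nwk", "diffrisk-c1-2.log",
    "-kdecay", "0.3", "-tol0", "0.005", "-mintol", "0.0025",
    "-ncores", "4", "-nthreads", "4", "-nreps", "20", "-treenum", "0",
]


def run_or_flag_hang(cmd, timeout_s):
    """Run cmd; return its exit code, or None if it exceeds timeout_s seconds."""
    try:
        return subprocess.run(cmd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        # Only the direct child was killed; grandchildren may still be running.
        return None
```

For example, `run_or_flag_hang(KAMPHIR_CMD, timeout_s)` returns `None` if the run is still going after `timeout_s` seconds, which makes it easy to loop over repeated attempts and count how often the hang occurs.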

rmcclosk commented 9 years ago

This crashes on my machine with the following error:

Error in unserialize(node$con) : error reading from connection
Error in serialize(data, node$con, xdr = FALSE) : 
  error writing to connection

The last two lines repeat about 30 times, and then Python reports a segfault.

ArtPoon commented 9 years ago

Interesting, I haven't encountered this one. It looks like a dill-related issue.

(Email reply to Rosemary McCloskey's comment of May 5, 2015, at 9:48 AM.)

ArtPoon commented 9 years ago

Maybe it has something to do with user permissions when loading R libraries?

http://stackoverflow.com/questions/24583793/error-reading-from-connection-on-loading-package-on-unix

(Email follow-up to Art Poon's comment of May 5, 2015, at 9:50 AM.)

rmcclosk commented 9 years ago

Reinstalling dill fixed the error above. I haven't been able to reproduce the hang on my workstation so far.

ArtPoon commented 9 years ago

Did you install a different version of dill? If so, please record the version number that caused this issue. If not, this is very odd.
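Recording the exact versions is easy to script so it can go into a bug report automatically. This is a minimal sketch, assuming Python 3.8+ for `importlib.metadata` (`pip show dill` gives the same information from the shell); the helper name `installed_version` is hypothetical.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Versions of the Python-side packages involved in this thread.
    for name in ("dill", "rpy2"):
        print(name, installed_version(name))
```

Running this with the same interpreter that runs kamphir.py matters: with several Python installations on one machine, a different interpreter may see a different dill.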

rmcclosk commented 9 years ago

Um, I think it was a shared-library problem caused by having different versions of Python (and possibly R) installed on my machine. I installed dill with pip and reinstalled R from source using the --enable-R-shlib option. I'm not sure which of those two things fixed the issue, but at any rate I have written them both down in the install documentation.

ArtPoon commented 9 years ago

Well, you need to compile R with --enable-R-shlib to get the rpy2 module to work. Sorry, that's not documented :-(
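A quick way to check whether an existing R build has the shared library is to look for it under R's home directory. This is a minimal sketch, assuming a build that installs `libR.so` (Linux) or `libR.dylib` (OS X) under `R_HOME/lib` when configured with --enable-R-shlib, and that `R` is on the PATH; the helper names `find_libR` and `r_home` are hypothetical.

```python
import os
import subprocess

# Shared-library file names assumed for Linux and OS X builds respectively.
LIBR_NAMES = ("libR.so", "libR.dylib")


def find_libR(home):
    """Return the path to R's shared library under home/lib, or None if absent."""
    for name in LIBR_NAMES:
        path = os.path.join(home, "lib", name)
        if os.path.exists(path):
            return path
    return None


def r_home():
    """Ask the R on PATH for its home directory via `R RHOME`."""
    out = subprocess.run(["R", "RHOME"], capture_output=True, text=True, check=True)
    return out.stdout.strip()
```

If `find_libR(r_home())` returns `None`, that R was most likely built without --enable-R-shlib, and rpy2 will have nothing to link against.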

rmcclosk commented 9 years ago

It is now, or at least it will be once I open a pull request.