caporaso-lab / sourcetracker2

SourceTracker2
BSD 3-Clause "New" or "Revised" License

Sourcetracker2 runs for 1m30s then all processes switch to 'Sleep' and hangs #106

Closed jfy133 closed 5 years ago

jfy133 commented 6 years ago

Hello,

Issue

I just tried running Sourcetracker2, as installed via instructions on the README (anaconda then pip install).

When I run the program with the --jobs parameter, and monitor via htop, the program runs for 1m30s and then stops, essentially hanging. A bunch of sourcetracker processes remain sitting with the 'sleep' status, and sourcetracker2 runs indefinitely (i.e. never completes - I left it for 24 hours in one case).

Description

To describe what I did:

1) Got the GitHub repo to use the tiny-test data:
   git clone https://github.com/biota/sourcetracker2.git
2) Installed Anaconda3-5.1.0-Linux-x86_64.sh
3) Made the st2 env:
   conda create -n st2 -c biocore python=3.5 numpy scipy scikit-bio biom-format h5py hdf5 seaborn
4) Activated the env:
   source activate st2
5) Installed sourcetracker:
   pip install sourcetracker
6) Changed to the tiny-test directory
7) Ran the command with no --jobs (this completes successfully):
   sourcetracker2 gibbs -i otu_table.biom -m map.txt -o example1/
8) Ran the command with --jobs 5, which hangs:
   sourcetracker2 gibbs -i otu_table.biom -m map.txt -o example1/ --jobs 5

I attach two htop screenshots taken during the --jobs 5 run, one before and one after the 1 minute 30 second mark. Note that the pre-1:30 view had many more st2-related processes, but I've cropped it for space reasons.

[Screenshot: 01-st2_multijobs_running_pre1m30]

[Screenshot: 02-st2_multijobs_hanging_post1m30]

I have noted that individual files are generated for each sample in the results directory (s0.txt, s1.txt, s2.txt etc.). I also tried with and without modifying the --cluster-delay-start parameter, with no difference.
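For anyone trying to reproduce the hang without watching htop, the --jobs 5 command can be wrapped in a timeout. The following is a minimal sketch using only the Python standard library; `run_with_timeout` and the 600-second limit are illustrative choices, not part of sourcetracker2. Note that killing the parent process does not necessarily stop worker processes it has spawned, so leftover sleeping processes may still need to be cleaned up by hand.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd and report whether it finished or had to be killed.

    Returns ("finished", returncode) or ("timed_out", None).
    This helper is hypothetical; it is not part of sourcetracker2.
    """
    proc = subprocess.Popen(cmd)
    try:
        returncode = proc.wait(timeout=timeout_s)
        return "finished", returncode
    except subprocess.TimeoutExpired:
        # The command is still running after timeout_s seconds: treat as a hang.
        proc.kill()
        proc.wait()  # reap the killed process
        return "timed_out", None


# Example call (assumes sourcetracker2 is on PATH and the tiny-test
# files are in the current directory):
# status, rc = run_with_timeout(
#     ["sourcetracker2", "gibbs", "-i", "otu_table.biom", "-m", "map.txt",
#      "-o", "example1/", "--jobs", "5"],
#     timeout_s=600,
# )
```

A "timed_out" result together with the per-sample files already present in the output directory would match the behaviour described above.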

As a sidenote, it would be nice if some stdout and stderr messages could be added, as it would help me investigate further and be more precise in my error reporting.

Environment Information

The sourcetracker2 environment information:

Anaconda conda environments:

base  *  /projects1/users/fellows/bin.backup/anaconda3
st2      /projects1/users/fellows/bin.backup/anaconda3/envs/st2

Sourcetracker2 st2_env_info.txt

wdwvt1 commented 6 years ago

Hi @jfy133 - thanks for the detailed post. I will take a look at this later tonight.

johnchase commented 5 years ago

I can confirm this issue using OSX. It is related to ipyparallel not shutting down the cluster properly on this line: https://github.com/biota/sourcetracker2/blob/b81220f26ca566cd1f131cc7f17c826320a96076/sourcetracker/_cli/gibbs.py#L238

As discussed in #92, removing ipyparallel as a dependency will fix this issue.
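To illustrate why dropping the external cluster can avoid this kind of hang, here is a minimal sketch of per-sample parallelism using the standard library's `concurrent.futures` in place of ipyparallel. This is an assumption about one possible approach, not necessarily the change the maintainers made; `estimate_for_sample` and `run_per_sample` are hypothetical names, and the per-sample work is a placeholder.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def estimate_for_sample(sample_id):
    """Hypothetical stand-in for the per-sample work (returns a dummy value)."""
    return sample_id, len(sample_id)


def run_per_sample(func, sample_ids, jobs, executor_cls=ProcessPoolExecutor):
    """Apply func to every sample using up to `jobs` workers.

    The `with` block calls shutdown(wait=True) on exit, so the workers are
    joined before this function returns and none are left sleeping.
    """
    with executor_cls(max_workers=jobs) as pool:
        return dict(pool.map(func, sample_ids))


# ThreadPoolExecutor can be passed as executor_cls for a quick local check:
# run_per_sample(estimate_for_sample, ["s0", "s1", "s2"], jobs=2,
#                executor_cls=ThreadPoolExecutor)
```

Because the pool lives inside a context manager, there is no separate cluster to start or stop, which is the step that fails on the line linked above.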

jfy133 commented 5 years ago

:tada: glad to see this hopefully fixed. Do you have an approximate ETA when the bioconda version will be updated?

Also, I see that the conda install instructions refer to -c biocore, but as far as I can see the package is under the bioconda repository. Which is the correct method to use?