Problem with bin refinement module

johanneswerner commented 5 years ago

Running the bin_refinement module fails with an error in my analysis.

Could someone have a look and give me an indication of what might have gone wrong? Thank you!

$ metawrap bin_refinement -o bin_refinement -t 36 -A initial_binning/metabat2_bins/ -B initial_binning/maxbin2_bins/ -C initial_binning/concoct_bins/ -c 50 -x 10
(...)
########################################################################################################################
#####                                      RUNNING CHECKM ON ALL SETS OF BINS                                      #####
########################################################################################################################

------------------------------------------------------------------------------------------------------------------------
-----                                         Running CheckM on binsA bins                                         -----
------------------------------------------------------------------------------------------------------------------------

*******************************************************************************
 [CheckM - tree] Placing bins in reference genome tree.
*******************************************************************************

  Identifying marker genes in 169 bins with 36 threads:
    Finished processing 169 of 169 (100.00%) bins.
  Saving HMM info to file.

  Calculating genome statistics for 169 bins with 36 threads:
    Finished processing 169 of 169 (100.00%) bins.

  Extracting marker genes to align.
  Parsing HMM hits to marker genes:
    Finished parsing hits for 169 of 169 (100.00%) bins.
  Extracting 43 HMMs with 36 threads:
    Finished extracting 43 of 43 (100.00%) HMMs.
  Aligning 43 marker genes with 36 threads:
    Finished aligning 43 of 43 (100.00%) marker genes.

  Reading marker alignment files.
  Concatenating alignments.
  Placing 169 bins into the genome tree with pplacer (be patient).

  { Current stage: 0:45:33.882 || Total: 0:45:33.882 }

*******************************************************************************
 [CheckM - lineage_set] Inferring lineage-specific marker sets.
*******************************************************************************

  Reading HMM info from file.
  Parsing HMM hits to marker genes:
    Finished parsing hits for 169 of 169 (100.00%) bins.

  Determining marker sets for each genome bin.
    Finished processing 169 of 169 (100.00%) bins (current: bin.129).   

  Marker set written to: binsA.checkm/lineage.ms

  { Current stage: 0:00:23.466 || Total: 0:45:57.348 }

*******************************************************************************
 [CheckM - analyze] Identifying marker genes in bins.
*******************************************************************************

  Identifying marker genes in 169 bins with 36 threads:
Fatal exception (source file esl_threads.c, line 129):
thread creation failed
Fatal exception (source file esl_threads.c, line 129):
thread creation failed
Fatal exception (source file esl_threads.c, line 129):
thread creation failed
Fatal exception (source file esl_threads.c, line 129):
thread creation failed
Fatal exception (source file esl_threads.c, line 129):
thread creation failed
Fatal exception (source file esl_threads.c, line 129):
thread creation failed
Fatal exception (source file esl_threads.c, line 129):
thread creation failed
Fatal exception (source file esl_threads.c, line 129):
thread creation failed
Fatal exception (source file esl_threads.c, line 129):
thread creation failed
Fatal exception (source file esl_threads.c, line 129):
thread creation failed
Fatal exception (source file esl_threads.c, line 129):
thread creation failed
Fatal exception (source file esl_threads.c, line 129):
thread creation failed
Fatal exception (source file esl_threads.c, line 129):
thread creation failed
Fatal exception (source file esl_threads.c, line 129):
thread creation failed
Fatal exception (source file esl_threads.c, line 129):
thread creation failed
Fatal exception (source file esl_threads.c, line 129):
thread creation failed
  [Error] Make sure HMMER executables (e.g., hmmsearch, hmmfetch) are on your system path.
Process SyncManager-150:
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/metawrap/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/home/ubuntu/miniconda3/envs/metawrap/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/miniconda3/envs/metawrap/lib/python2.7/multiprocessing/managers.py", line 558, in _run_server
    server.serve_forever()
  File "/home/ubuntu/miniconda3/envs/metawrap/lib/python2.7/multiprocessing/managers.py", line 184, in serve_forever
    t.start()
  File "/home/ubuntu/miniconda3/envs/metawrap/lib/python2.7/threading.py", line 736, in start
    _start_new_thread(self.__bootstrap, ())
error: can't start new thread
Fatal exception (source file esl_threads.c, line 129):
thread creation failed
Fatal exception (source file esl_threads.c, line 129):
thread creation failed
Fatal exception (source file esl_threads.c, line 129):
thread creation failed
Aborted (core dumped)
(...)

Here is the complete log: log.txt

ursky commented 5 years ago

Is it possible that you are running out of memory? What is the total memory limit on your system? Each thread you use takes up more RAM - have you tried running with fewer threads or using the --quick option to reduce memory usage?

johanneswerner commented 5 years ago

I would assume that I don't have memory limitations - my server has 1.5 TB of memory. But I will rerun with fewer threads and check if the results look different and keep you updated. Thank you a lot.

ursky commented 5 years ago

Make sure you pass in the memory limit with the -m option.
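
Before picking values for -t and -m, it can help to look at what the machine actually reports. The following is only a sketch, assuming a Linux host that exposes /proc/meminfo; the helper name mem_available_gib is made up for illustration and is not part of metaWRAP or CheckM.

```shell
# Print MemAvailable from /proc/meminfo-style input, converted from kB to whole GiB.
# (Hypothetical helper, not part of metaWRAP.)
mem_available_gib() {
  awk '$1 == "MemAvailable:" { printf "%d\n", $2 / 1024 / 1024 }'
}

# Linux only: report available memory and usable CPUs before choosing -t and -m.
if [ -r /proc/meminfo ]; then
  echo "available memory (GiB): $(mem_available_gib < /proc/meminfo)"
fi
if command -v nproc >/dev/null 2>&1; then
  echo "usable CPUs: $(nproc)"
fi
```

If the reported memory is far above what the run needs, as on a 1.5 TB server, memory is probably not the limiting resource and the thread count is the more interesting knob.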

johanneswerner commented 5 years ago

I tried again with

metawrap bin_refinement \
  -o bin_refinement \
  -t 8 \
  -A initial_binning/metabat2_bins/ \
  -B initial_binning/maxbin2_bins/ \
  -C initial_binning/concoct_bins/ \
  -c 50 -x 10 -m 1000

and this is my result:

*******************************************************************************
 [CheckM - analyze] Identifying marker genes in bins.
*******************************************************************************

  Identifying marker genes in 169 bins with 8 threads:
Fatal exception (source file esl_threads.c, line 129):
thread creation failed
Fatal exception (source file esl_threads.c, line 129):
thread creation failed
Fatal exception (source file esl_threads.c, line 129):
thread creation failed
Fatal exception (source file esl_threads.c, line 129):
thread creation failed
Aborted (core dumped)
Aborted (core dumped)
Process SyncManager-38:
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/metawrap/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/home/ubuntu/miniconda3/envs/metawrap/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/miniconda3/envs/metawrap/lib/python2.7/multiprocessing/managers.py", line 558, in _run_server
    server.serve_forever()
  File "/home/ubuntu/miniconda3/envs/metawrap/lib/python2.7/multiprocessing/managers.py", line 184, in serve_forever
    t.start()
  File "/home/ubuntu/miniconda3/envs/metawrap/lib/python2.7/threading.py", line 736, in start
    _start_new_thread(self.__bootstrap, ())
error: can't start new thread
Process Process-41:
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/metawrap/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/home/ubuntu/miniconda3/envs/metawrap/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/miniconda3/envs/metawrap/lib/python2.7/site-packages/checkm/markerGeneFinder.py", line 136, in __processBin
    queueOut.put((binId, hmmModelFile))
  File "/home/ubuntu/miniconda3/envs/metawrap/lib/python2.7/multiprocessing/queues.py", line 107, in put
    self._start_thread()
  File "/home/ubuntu/miniconda3/envs/metawrap/lib/python2.7/multiprocessing/queues.py", line 195, in _start_thread
    self._thread.start()
  File "/home/ubuntu/miniconda3/envs/metawrap/lib/python2.7/threading.py", line 736, in start
    _start_new_thread(self.__bootstrap, ())
error: can't start new thread
Aborted (core dumped)
  [Error] Make sure prodigal is on your system path.
  [Error] Make sure prodigal is on your system path.
Aborted (core dumped)
Process Process-40:
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/metawrap/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/home/ubuntu/miniconda3/envs/metawrap/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/miniconda3/envs/metawrap/lib/python2.7/site-packages/checkm/markerGeneFinder.py", line 136, in __processBin
    queueOut.put((binId, hmmModelFile))
  File "/home/ubuntu/miniconda3/envs/metawrap/lib/python2.7/multiprocessing/queues.py", line 107, in put
    self._start_thread()
  File "/home/ubuntu/miniconda3/envs/metawrap/lib/python2.7/multiprocessing/queues.py", line 195, in _start_thread
    self._thread.start()
  File "/home/ubuntu/miniconda3/envs/metawrap/lib/python2.7/threading.py", line 736, in start
    _start_new_thread(self.__bootstrap, ())
error: can't start new thread
  [Error] Make sure HMMER executables (e.g., hmmsearch, hmmfetch) are on your system path.
  [Error] Make sure HMMER executables (e.g., hmmsearch, hmmfetch) are on your system path.
  [Error] Make sure HMMER executables (e.g., hmmsearch, hmmfetch) are on your system path.
Process Process-47:
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/metawrap/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/home/ubuntu/miniconda3/envs/metawrap/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/miniconda3/envs/metawrap/lib/python2.7/site-packages/checkm/markerGeneFinder.py", line 157, in __reportProgress
    binIdToModels[binId] = models
  File "<string>", line 2, in __setitem__
  File "/home/ubuntu/miniconda3/envs/metawrap/lib/python2.7/multiprocessing/managers.py", line 759, in _callmethod
    kind, result = conn.recv()
EOFError

Do you have any more ideas? Thank you a lot for your help.

johanneswerner commented 5 years ago

It seems to be working with one thread (the job is still running). I will let you know whether it finishes without errors.

ursky commented 5 years ago

Very strange... I have never had any issues multi-threading checkm before. The errors suggest that perhaps some programs are not in your path. It's unlikely, but can you verify that prodigal, hmmsearch, and hmmfetch are in your PATH and coming from your metawrap conda environment?

Otherwise, can you try downgrading CheckM with conda install checkm-genome=1.0.12? I recently upgraded to version 1.0.13, which may be giving you issues. Also, can you perhaps open an issue in https://github.com/Ecogenomics/CheckM? @dparks1134 is very helpful and might help us get to the bottom of this.
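
One way to do the PATH check is sketched below. It assumes the metawrap environment is active, so that conda has set CONDA_PREFIX; the function name check_tool_origin is hypothetical and only uses the standard `command -v` lookup.

```shell
# Report where each tool resolves from and whether that path lies under the given prefix.
# (Hypothetical helper; relies only on the shell's `command -v`.)
check_tool_origin() {
  env_prefix=$1
  shift
  for tool in "$@"; do
    resolved=$(command -v "$tool" 2>/dev/null) || resolved=""
    case "$resolved" in
      "") echo "$tool: NOT FOUND" ;;
      "$env_prefix"/*) echo "$tool: OK ($resolved)" ;;
      *) echo "$tool: OUTSIDE ENV ($resolved)" ;;
    esac
  done
}

# CONDA_PREFIX is set by `conda activate`; skip the check if no environment is active.
if [ -n "${CONDA_PREFIX:-}" ]; then
  check_tool_origin "$CONDA_PREFIX" prodigal hmmsearch hmmfetch
fi
```

If all three tools report OK, the "Make sure ... are on your system path" messages in the log are more likely a side effect of the failed thread creation than a real PATH problem.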

johanneswerner commented 5 years ago

@ursky @dparks1134

I don't get it; this is strange behaviour.

> Very strange... I have never had any issues multi-threading checkm before. The errors suggest that perhaps some programs are not in your path. It's unlikely, but can you verify that prodigal, hmmsearch, and hmmfetch are in your PATH and coming from your metawrap conda environment?

All binaries come from the conda environment, so the error must have come from something else. With this first dataset, I could only run it on one thread. I wanted to confirm this behaviour on another dataset, but parallelisation works fine there (with checkm version 1.0.13).

I am closing this issue as long as it does not occur again (I cannot reproduce the previous error).

johanneswerner commented 5 years ago

I must reopen the issue because it persists, but only for the combined bins of ABC. I downgraded checkm to 1.0.12, but the problem remains.

johanneswerner commented 5 years ago

I don't get it. I can no longer reproduce the issue (see https://github.com/Ecogenomics/CheckM/issues/195), and now I can parallelize this step again. Unfortunately, I don't know what caused the problem or why it disappeared. Closing issue.

ursky commented 5 years ago

Let's keep this open for visibility. Let me know if you figure this out. Perhaps this has to do with thread availability?
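
A rough way to look at thread availability is sketched below. It assumes Linux with bash and procps `ps`; on Linux the per-user limit shown by `ulimit -u` counts threads as well as processes. The helper name thread_headroom is made up for illustration, and nothing in this thread confirms that this limit is the actual cause.

```shell
# Print how many more threads fit under a limit, given the current count.
# Non-numeric limits (e.g. "unlimited" or "unknown") are passed through unchanged.
thread_headroom() {
  limit=$1
  used=$2
  case "$limit" in
    ''|*[!0-9]*) echo "$limit" ;;
    *) echo $(( limit - used )) ;;
  esac
}

# Per-user cap on processes and threads in this shell ("unknown" if the shell lacks -u).
user_limit=$(ulimit -u 2>/dev/null) || user_limit=unknown
# Threads currently owned by this user: one output line per thread.
user_threads=$(ps -L -u "$(id -un)" -o lwp= 2>/dev/null | wc -l | tr -d ' ')
echo "limit: $user_limit  in use: $user_threads  headroom: $(thread_headroom "$user_limit" "$user_threads")"

# System-wide ceiling on the number of threads (Linux only).
if [ -r /proc/sys/kernel/threads-max ]; then
  echo "kernel threads-max: $(cat /proc/sys/kernel/threads-max)"
fi
```

Running this in the same shell that launches metawrap, ideally while a failing run is in progress, shows whether the user is close to the limit when "thread creation failed" appears.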

johanneswerner commented 5 years ago

It was a good idea to keep it open, as the error appeared again, though only sometimes.

The failures happened while I had a tmux/byobu session open; when I run without opening such a session first, it seems to be fine. I don't know if there is any connection, though.
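
To test whether the tmux/byobu session matters, one option is to compare the resource limits seen inside the session with those of a fresh login. Shells started by a long-running tmux server inherit that server's limits, which can differ from a new login's. This is a sketch only: the function name snapshot_limits and the file names in the comments are hypothetical, and the thread does not establish that the limits actually differ.

```shell
# Write this shell's resource limits (and, on Linux, its cgroup membership) to a file.
# (Hypothetical helper for comparing two shells.)
snapshot_limits() {
  {
    ulimit -a
    if [ -r /proc/self/cgroup ]; then
      cat /proc/self/cgroup
    fi
  } > "$1"
}

# Suggested usage (file names are placeholders):
#   snapshot_limits /tmp/limits_tmux.txt     # run inside the tmux/byobu session
#   snapshot_limits /tmp/limits_fresh.txt    # run in a fresh login shell
#   diff /tmp/limits_tmux.txt /tmp/limits_fresh.txt
```

An empty diff would suggest the session is not changing the limits and the correlation is a coincidence; a difference in the process or thread limit would be worth reporting here.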

bxlab / metaWRAP

Problem with bin refinement module #157