bxlab / metaWRAP

MetaWRAP - a flexible pipeline for genome-resolved metagenomic data analysis

Problems with checkM #278

Open Lucas-Maciel opened 4 years ago

Lucas-Maciel commented 4 years ago

I'm trying to use CheckM, but it fails during both binning and bin_refinement. The job dies after the last line of the following output.

------------------------------------------------------------------------------------------------------------------------
-----                               MaxBin2 finished successfully, and found 93 bins!                              -----
------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------------------------------------
-----                                   Running CheckM on idba/maxbin2_bins bins                                   -----
------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------------------------------------
-----            There is 160 RAM and 4 threads available, and each pplacer thread uses <40GB, so I will           -----
-----                                          use 4 threads for pplacer                                           -----
------------------------------------------------------------------------------------------------------------------------

*******************************************************************************
 [CheckM - tree] Placing bins in reference genome tree.
*******************************************************************************

  Identifying marker genes in 93 bins with 4 threads:
Finished processing 93 of 93 (100.00%) bins.
Traceback (most recent call last):
  File "/home/ABTLUS/lucas.maciel/miniconda3/envs/metawrap-env/lib/python2.7/multiprocessing/util.py", line 277, in _run_finalizers
    finalizer()
  File "/home/ABTLUS/lucas.maciel/miniconda3/envs/metawrap-env/lib/python2.7/multiprocessing/util.py", line 207, in __call__
    res = self._callback(*self._args, **self._kwargs)
  File "/home/ABTLUS/lucas.maciel/miniconda3/envs/metawrap-env/lib/python2.7/shutil.py", line 266, in rmtree
    onerror(os.remove, fullname, sys.exc_info())
  File "/home/ABTLUS/lucas.maciel/miniconda3/envs/metawrap-env/lib/python2.7/shutil.py", line 264, in rmtree
    os.remove(fullname)
OSError: [Errno 16] Device or resource busy: 'idba/maxbin2_bins.tmp/pymp-1Kqtcy/.nfs00000000107c1b6a0048882f'
  Saving HMM info to file.

  Calculating genome statistics for 93 bins with 4 threads:
Finished processing 93 of 93 (100.00%) bins.

  Extracting marker genes to align.
  Parsing HMM hits to marker genes:
Finished parsing hits for 93 of 93 (100.00%) bins.

  Extracting 43 HMMs with 4 threads:
Finished extracting 43 of 43 (100.00%) HMMs.

  Aligning 43 marker genes with 4 threads:
Finished aligning 43 of 43 (100.00%) marker genes.

  Reading marker alignment files.
  Concatenating alignments.
  Placing 93 bins into the genome tree with pplacer (be patient).

The folder idba/maxbin2_bins.tmp/pymp-1Kqtcy/ named in the error message contains no files. I also tried the other available binning tools, but the same step failed.

The file pplacer.out looks like this (not from exactly the same run as above):

$ cat pplacer.out
Running pplacer v1.1.alpha19-0-g807f6f3 analysis on idba/maxbin2_bins.checkm/storage/tree/concatenated.fasta...
Didn't find any reference sequences in given alignment file. Using supplied reference alignment.
Pre-masking sequences... sequence length cut from 6988 to 6870.
Warning: pplacer results make the most sense when the given tree is multifurcating at the root. See manual for details.
Determining figs... figs disabled.
Allocating memory for internal nodes... done.
Caching likelihood information on reference tree... done.
Pulling exponents... done.
Preparing the edges for baseball... done.
working on bin.86 (1/79)...
working on bin.79 (2/79)...
working on bin.81 (3/79)...

Edit/Update:

When I run CheckM directly in the metawrap-env it works, but not within the pipeline:

$ checkm lineage_wf -x fa binsAB/ teste.checkm

  Reading marker alignment files.
  Concatenating alignments.
  Placing 36 bins into the genome tree with pplacer (be patient).

  { Current stage: 0:18:52.598 || Total: 0:18:52.598 }

*******************************************************************************
 [CheckM - lineage_set] Inferring lineage-specific marker sets.
*******************************************************************************

ursky commented 4 years ago

Sorry for the late reply. Does CheckM finish after that error? Sometimes I get strange warnings or errors and it still finishes fine. An external CheckM run works because it uses the default temporary directory, whereas the command within metaWRAP passes an option for a custom tmp dir (i.e. sample_checkm.tmp), which is where the error is coming from. If it's failing, you could go into binning.sh and bin_refinement.sh and remove the tmp option (I believe it's the --tmp option). Good luck!
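
A rough sketch of that edit, in case it helps. It assumes the option is spelled `--tmpdir <path>` and sits on the same line as the `checkm` call; neither is confirmed in this thread, so grep the scripts first and adjust the pattern to whatever you find. The helper name `strip_checkm_tmpdir` and the demo line are made up for illustration and are not the real script contents.

```shell
# Hypothetical helper: print a script with the temp-dir option removed
# from any line that calls checkm. The "--tmpdir <path>" spelling is an
# assumption; change the pattern if your copy of the scripts differs.
strip_checkm_tmpdir() {
    sed -E '/checkm/ s/[[:space:]]+--tmpdir[[:space:]]+[^[:space:]]+//' "$1"
}

# To see how the option is actually spelled, run this from the directory
# that holds the metaWRAP module scripts:
#   grep -n -- '--tmp' binning.sh bin_refinement.sh

# Demo on an invented line (not copied from metaWRAP):
demo=$(mktemp)
echo 'checkm lineage_wf -x fa bins bins.checkm -t 4 --tmpdir bins.tmp --pplacer_threads 4' > "$demo"
result=$(strip_checkm_tmpdir "$demo")
echo "$result"
rm -f "$demo"
```

The helper only prints the edited text, so you can diff it against the original script before overwriting anything. Keep a backup copy of binning.sh and bin_refinement.sh in any case.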