bxlab / metaWRAP

MetaWRAP - a flexible pipeline for genome-resolved metagenomic data analysis
MIT License

Bin_refinement stops if one of the folders is empty #64

Open alexmsalmeida opened 5 years ago

alexmsalmeida commented 5 years ago

Hi,

I have been running metaWRAP on a few sets of metagenomes, and when I reach the bin_refinement step the program stops if one of the bin folders is empty.

Here is the log:

########################################################################################################################
#####                                                BEGIN PIPELINE!                                               #####
########################################################################################################################

------------------------------------------------------------------------------------------------------------------------
-----                              setting up output folder and copything over bins...                             -----
------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------------------------------------
-----                                           there are 2 bins in binsA                                          -----
------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------------------------------------
-----                                           there are 5 bins in binsB                                          -----
------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------------------------------------
-----                                           there are 2 bins in binsC                                          -----
------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------------------------------------
-----                                             There are 3 bin sets!                                            -----
------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------------------------------------
-----                              Fix contig naming by removing special characters...                             -----
------------------------------------------------------------------------------------------------------------------------

ERR209114/bin_refinement/binsA/bin.1.fa
ERR209114/bin_refinement/binsA/bin.unbinned.fa
ERR209114/bin_refinement/binsB/bin.1.fa
ERR209114/bin_refinement/binsB/bin.2.fa
ERR209114/bin_refinement/binsB/bin.3.fa
ERR209114/bin_refinement/binsB/bin.4.fa
ERR209114/bin_refinement/binsB/bin.unbinned.fa
ERR209114/bin_refinement/binsC/bin.0.fa
ERR209114/bin_refinement/binsC/bin.1.fa

########################################################################################################################
#####                                             BEGIN BIN REFINEMENT                                             #####
########################################################################################################################

------------------------------------------------------------------------------------------------------------------------
-----              There are three bin folders, so there 4 ways we can refine the bins (A+B, B+C, A+C,             -----
-----                                    A+B+C). Will try all four in parallel!                                    -----
------------------------------------------------------------------------------------------------------------------------

Specified 2 input bin sets: -1 binsC -2 binsB
Specified 2 input bin sets: -1 binsA -2 binsB
Specified 2 input bin sets: -1 binsA -2 binsC
Specified 3 input bin sets: -1 binsA -2 binsB -3 binsC
Add folder/bin name to contig name for binsA bins
Add folder/bin name to contig name for binsA bins
Add folder/bin name to contig name for binsC binsAdd folder/bin name to contig name for binsA bins

Add folder/bin name to contig name for binsB bins
Add folder/bin name to contig name for binsB bins
Add folder/bin name to contig name for binsB bins
Add folder/bin name to contig name for binsC bins
Combine all bins together
Add folder/bin name to contig name for binsC bins
Combine all bins together
Combine all bins together
The number of refined bins: 3
Combine all bins together
The number of refined bins: 1
The number of refined bins: 0
Exporting refined bins...
Extracting refined bin: Refined_3.fastaExporting refined bins...
Extracting refined bin: Refined_1.fastaExporting refined bins...
The number of refined bins: 0

Deleting temporary files

Deleting temporary files
Exporting refined bins...

Deleting temporary files

All done!

Deleting temporary files

All done!

All done!

All done!

------------------------------------------------------------------------------------------------------------------------
-----                                      there are 0 refined bins in binsAB                                      -----
------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------------------------------------
-----                                      there are 1 refined bins in binsBC                                      -----
------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------------------------------------
-----                                      there are 3 refined bins in binsAC                                      -----
------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------------------------------------
-----                                      there are 0 refined bins in binsABC                                     -----
------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------------------------------------
-----                                     Bin refinement finished successfully!                                    -----
------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------------------------------------
-----                            fixing bin naming to .fa convention for consistancy...                            -----
------------------------------------------------------------------------------------------------------------------------

########################################################################################################################
#####                                      RUNNING CHECKM ON ALL SETS OF BINS                                      #####
########################################################################################################################

------------------------------------------------------------------------------------------------------------------------
-----                                         Running CheckM on binsA bins                                         -----
------------------------------------------------------------------------------------------------------------------------

*******************************************************************************
 [CheckM - tree] Placing bins in reference genome tree.
*******************************************************************************

----------------------------------------------------------------------------------------------------------------------------------------------------------------
  Bin Id            Marker lineage      # genomes   # markers   # marker sets   0    1    2   3   4   5+   Completeness   Contamination   Strain heterogeneity  
----------------------------------------------------------------------------------------------------------------------------------------------------------------
  bin.unbinned   k__Bacteria (UID203)      5449        104            58        79   17   8   0   0   0       30.96           10.66              87.50          
  bin.1              root (UID1)           5656         56            24        56   0    0   0   0   0        0.00            0.00               0.00          
----------------------------------------------------------------------------------------------------------------------------------------------------------------

  { Current stage: 0:00:00.241 || Total: 0:00:58.660 }

------------------------------------------------------------------------------------------------------------------------
-----                There are 0 'good' bins found in binsA! (>50% completion and <5% contamination)               -----
------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------------------------------------
-----                                         Running CheckM on binsAB bins                                        -----
------------------------------------------------------------------------------------------------------------------------

*******************************************************************************
 [CheckM - tree] Placing bins in reference genome tree.
*******************************************************************************

  [Error] No bins found. Check the extension (-x) used to identify bins.

  Controlled exit resulting from an unrecoverable error or warning.

************************************************************************************************************************
*****                             Something went wrong with running CheckM. Exiting...                             *****
************************************************************************************************************************

real    1m19.968s
user    1m9.192s
sys 0m12.071s

------------------------------------------------------------

Although binsAB did not have any refined bins, I assume the program should still run CheckM on the rest of the folders? Is there any way to fix this?

Many thanks in advance, Alex

ursky commented 5 years ago

Sorry, metaWRAP isn't really equipped to deal with such a sparse data set. Judging by the CheckM results for binsA, bin.1.fa and bin.unbinned.fa have a cumulative completion of about 30%. If these two bins really contain all the contigs from your assembly, then your assembly may not be worth binning at all. How big is your whole assembly? And how many reads are you working with? Something seems off.

alexmsalmeida commented 5 years ago

Yeah, the assembly is not of good quality (685 contigs > 2 kb, 3 Mb in total). I will likely just discard this one, but I was more worried about the general issue of the program stopping if one of the folders is empty. I imagine there could be situations where some bins in the other folders are still salvageable, but maybe I am wrong.
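
One possible workaround for the empty-folder case, sketched under assumptions rather than taken from metaWRAP itself: before running CheckM, list the bin folders and skip any that contain no bin files, so that an empty set such as binsAB does not abort the whole run. The function name `non_empty_bin_folders`, its `extension` parameter, and the idea of pre-filtering folders are all hypothetical; the only detail taken from the log above is that bins are stored as `.fa` files in folders such as binsA and binsAB.

```python
from pathlib import Path


def non_empty_bin_folders(parent, extension="fa"):
    """Return names of subfolders of `parent` holding at least one *.<extension> file.

    Hypothetical helper, not part of metaWRAP: it only illustrates the idea of
    skipping empty bin sets instead of passing them on to CheckM.
    """
    parent = Path(parent)
    keep = []
    for folder in sorted(p for p in parent.iterdir() if p.is_dir()):
        # A folder counts as usable if any file matches the bin extension.
        if any(folder.glob(f"*.{extension}")):
            keep.append(folder.name)
        else:
            print(f"Skipping {folder.name}: no .{extension} bins found")
    return keep
```

The returned folder names could then be passed one at a time to whatever quality-check step follows, leaving the empty sets out of the comparison.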