bxlab / metaWRAP

MetaWRAP - a flexible pipeline for genome-resolved metagenomic data analysis
MIT License

empty bin set kills the program in refinement module #192

Open. ganiatgithub opened this issue 5 years ago

ganiatgithub commented 5 years ago

Hello,

I have run into this issue twice: whenever the refinement module produces an empty bin set, the program exits. Any tips on where I should look? An example of the error message:

there are 27 bins in binsA                           
there are 33 bins in binsB     
There are 2 bin sets! 
Fix contig naming by removing special characters...  
BEGIN BIN REFINEMENT 
There are two bin folders, so we can consolidate them into a third, more refined bin set.

Specified 2 input bin sets: -1 binsA -2 binsB
Add folder/bin name to contig name for binsA bins
Add folder/bin name to contig name for binsB bins
Combine all bins together
The number of refined bins: 0
Exporting refined bins...
Deleting temporary files
All done!

there are 0 refined bins in binsAB
Bin refinement finished successfully! 
fixing bin naming to .fa convention for consistancy...
RUNNING CHECKM ON ALL SETS OF BINS

Running CheckM on binsA bins 

----------------------------------------------------------------------------------------------------------------------------------------------------------------
  Bin Id            Marker lineage            # genomes   # markers   # marker sets   0     1    2    3   4   5+   Completeness   Contamination   Strain heterogeneity  
----------------------------------------------------------------------------------------------------------------------------------------------------------------
  bin.21          k__Archaea (UID2)              207         145           103        2    138   5    0   0   0       98.54            4.85              60.00          
  bin.12        k__Bacteria (UID1452)            924         163           110        2    159   2    0   0   0       98.18            1.21               0.00          
  bin.10        k__Bacteria (UID3187)            2258        187           116        5    177   5    0   0   0       97.14            4.31               0.00          
  bin.5         k__Bacteria (UID3187)            2258        181           110        5    172   4    0   0   0       96.82            3.18               0.00          
  bin.17        k__Bacteria (UID1452)            924         163           110        5    151   7    0   0   0       96.36            5.64              14.29          
  bin.18      p__Bacteroidetes (UID2591)         364         302           203        9    289   4    0   0   0       96.06            1.72               0.00          
  bin.4      f__Rhodocyclaceae (UID3972)          30         540           241        17   508   14   1   0   0       96.00            3.58               5.88          
  bin.8    c__Betaproteobacteria (UID3959)       235         413           209        27   371   14   1   0   0       95.64            3.87               0.00          
  bin.27        k__Bacteria (UID3187)            2258        188           117        8    172   8    0   0   0       95.09            1.99              37.50          
  bin.6         k__Bacteria (UID1453)            901         171           117        7    161   3    0   0   0       94.87            2.14               0.00          
  bin.16   c__Gammaproteobacteria (UID4274)      112         581           290        48   529   4    0   0   0       93.97            0.89               0.00          
  bin.7    c__Betaproteobacteria (UID3959)       235         419           211        28   388   3    0   0   0       90.82            1.18              66.67          
  bin.15        k__Bacteria (UID1452)            924         151           101        27   124   0    0   0   0       89.97            0.00               0.00          
  bin.22        k__Bacteria (UID3187)            2258        188           117        16   147   19   6   0   0       88.46            9.15               0.00          
  bin.24        k__Bacteria (UID2570)            433         270           179        29   236   5    0   0   0       86.09            2.25              20.00          
  bin.20        k__Bacteria (UID2982)             88         230           148        39   186   4    1   0   0       85.14            3.51               0.00          
  bin.1         k__Bacteria (UID1452)            924         151           101        24   124   3    0   0   0       84.65            1.98              33.33          
  bin.14        k__Bacteria (UID2495)            2993        147            91        31   111   5    0   0   0       84.52            4.40              20.00          
  bin.3         k__Bacteria (UID1452)            924         160           109        38   119   3    0   0   0       84.47            2.75              33.33          
  bin.23        k__Bacteria (UID3187)            2258        184           114        45   130   9    0   0   0       81.55            6.05              11.11          
  bin.9         k__Bacteria (UID3187)            2258        188           117        34   150   4    0   0   0       80.54            3.42               0.00          
  bin.25   c__Gammaproteobacteria (UID4202)       67         481           276        76   399   5    1   0   0       80.06            1.11              12.50          
  bin.11         k__Bacteria (UID203)            5449        104            58        49    52   3    0   0   0       76.90            5.17               0.00          
  bin.2          k__Bacteria (UID203)            5449        104            58        56    48   0    0   0   0       74.14            0.00               0.00          
  bin.13        k__Bacteria (UID2570)            433         270           179        57   213   0    0   0   0       73.74            0.00               0.00          
  bin.19   c__Deltaproteobacteria (UID3216)       83         247           155        85   158   4    0   0   0       73.67            2.26              25.00          
  bin.26        k__Bacteria (UID1452)            924         151           101        58    89   3    1   0   0       71.47            3.19              16.67          
----------------------------------------------------------------------------------------------------------------------------------------------------------------
There are 27 'good' bins found in binsA! (>70% completion and <10% contamination)

Running CheckM on binsAB bins
Controlled exit resulting from an unrecoverable error or warning.
Something went wrong with running CheckM. Exiting...

Many thanks!

ursky commented 5 years ago

If there are 0 refined bins, then you are probably not using the module correctly. Do the two bin sets come from the exact same assembly? In other words, do the contig names between the two sets match?
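One way to answer this question before running the refinement module is to compare the contig names in the two bin folders directly. Below is a minimal Python sketch, assuming each bin is a FASTA file ending in .fa, .fasta or .fna and that a contig's name is the header text up to the first whitespace. The script and its function names (contig_names, overlap_report, main) are hypothetical helpers, not part of metaWRAP. It compares raw header names, so it does not reproduce the "removing special characters" renaming that metaWRAP applies during refinement. Zero shared names suggests the two bin sets were built from different assemblies.

```python
"""Hypothetical sanity check (not part of metaWRAP): do two bin folders share contig names?

Assumes each bin is a FASTA file (.fa/.fasta/.fna) whose header lines start with '>'
followed by the contig name; the name is taken up to the first whitespace.
"""
import sys
from pathlib import Path

FASTA_SUFFIXES = {".fa", ".fasta", ".fna"}


def contig_names(bin_dir):
    """Collect contig names from every FASTA bin file in bin_dir."""
    names = set()
    for path in Path(bin_dir).iterdir():
        # Skip non-FASTA files such as logs or stats tables.
        if path.suffix not in FASTA_SUFFIXES:
            continue
        with path.open() as handle:
            for line in handle:
                if line.startswith(">"):
                    parts = line[1:].split()
                    if parts:
                        # Keep only the identifier before any whitespace.
                        names.add(parts[0])
    return names


def overlap_report(dir_a, dir_b):
    """Return (shared, only_in_a, only_in_b) counts of contig names."""
    a = contig_names(dir_a)
    b = contig_names(dir_b)
    return len(a & b), len(a - b), len(b - a)


def main(dir_a, dir_b):
    shared, only_a, only_b = overlap_report(dir_a, dir_b)
    print(f"shared contigs: {shared}, only in A: {only_a}, only in B: {only_b}")
    if shared == 0:
        print("No contig names in common: the bin sets probably come from different assemblies.")


if __name__ == "__main__":
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
    else:
        print(f"usage: {sys.argv[0]} BIN_DIR_A BIN_DIR_B")
```

Usage (file name hypothetical): python check_contig_overlap.py binsA binsB. Bins from one assembly should share many names; bins from metaSPAdes (NODE_..._length_..._cov_...) and MEGAHIT (k141_...) assemblies will share none.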

ursky commented 5 years ago

Seems like a duplicate issue with https://github.com/bxlab/metaWRAP/issues/144. My question applies.

ganiatgithub commented 5 years ago

> Seems like a duplicate issue with #144. My question applies.

Thank you for your patience; I hadn't seen your reply to #144 before. Now it makes sense: my sets A and B were obtained with metaSPAdes, with headers like >NODE_490_length_41621_cov_9.786749, while set C came from MEGAHIT, with headers like >k141_168.

I'll give dRep a go.

ursky commented 5 years ago

The refinement module is meant to consolidate different binning strategies of the same assembly, not different assemblies of the same data. Keeping open for visibility.