BinPro / CONCOCT

Clustering cONtigs with COverage and ComposiTion

CheckM evaluation of CONCOCT-test-data-0.3.2/ #240

Closed by franciscozorrilla 4 years ago

franciscozorrilla commented 5 years ago

Merge clustering & extract fasta bins:

merge_cutup_clustering.py concoct-output/clustering_gt1000.csv > clustering_gt1000_merged.csv
mkdir -p BINS  # Note: extract_fasta_bins.py throws an error if the output folder does not already exist
extract_fasta_bins.py --output_path BINS contigs/velvet_71.fa clustering_gt1000_merged.csv

Results in four bins:

[image: file listing of the four bins]

Run CheckM on the cluster:

#!/usr/bin/env bash
#SBATCH -A SNIC2018-3-18
#SBATCH -N 1
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 20
#SBATCH -C MEM64|MEM128
#SBATCH -t 2-00:00:00

checkm lineage_wf -x fa -t 20 --pplacer_threads 20 BINS CHECKM_OUT

Output summary:

-----------------------------------------------------------------------------------------------------------------------------------------------------------------
  Bin Id         Marker lineage         # genomes   # markers   # marker sets   0     1    2   3   4   5+   Completeness   Contamination   Strain heterogeneity  
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
  0        p__Bacteroidetes (UID2605)      350         314           208        5    309   0   0   0   0       98.46            0.00               0.00          
  2        o__Clostridiales (UID1212)      172         261           147        5    256   0   0   0   0       98.05            0.00               0.00          
  3               root (UID1)              5656         56            24        56    0    0   0   0   0        0.00            0.00               0.00          
  1               root (UID1)              5656         56            24        56    0    0   0   0   0        0.00            0.00               0.00          
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
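
As a rough sketch of how a summary table like the one above could be screened, the helper below prints the bins that pass a completeness and contamination cutoff. The function name `filter_checkm_bins`, the file name `checkm_summary.txt`, and the default thresholds (90% completeness, 5% contamination) are assumptions for illustration and are not part of CheckM or CONCOCT. It relies only on the layout shown above, where each data row ends in three numeric columns.

```shell
# Hypothetical helper: list bins from a saved CheckM summary table that pass
# completeness/contamination thresholds (defaults are assumptions, not CheckM's).
# Usage: filter_checkm_bins TABLE [MIN_COMPLETENESS] [MAX_CONTAMINATION]
filter_checkm_bins() {
    awk -v min_comp="${2:-90}" -v max_cont="${3:-5}" '
        # Data rows end in three numeric columns:
        # completeness, contamination, strain heterogeneity.
        # Separator lines and the header row do not match and are skipped.
        NF >= 3 && $(NF-2) ~ /^[0-9.]+$/ && $(NF-1) ~ /^[0-9.]+$/ {
            if ($(NF-2) + 0 >= min_comp + 0 && $(NF-1) + 0 <= max_cont + 0)
                print $1 "\t" $(NF-2) "\t" $(NF-1)
        }
    ' "$1"
}

# Example call, assuming the table was saved to a file with this name:
#   filter_checkm_bins checkm_summary.txt
```

Each output line holds the bin ID, completeness, and contamination, separated by tabs. On the table above, only bins 0 and 2 would pass the default cutoffs.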

Complete output:

slurm-3836984.txt

The results show only 2 good bins. Perhaps something went wrong when merging the clustering or extracting the bins?

franciscozorrilla commented 5 years ago

CheckM output data folder:

CHECKM_OUT.zip

franciscozorrilla commented 5 years ago

CONCOCT bins:

BINS.zip

alneberg commented 4 years ago

Sorry for not responding to this issue. I think you've already figured out that this is perfectly normal. Bins 1 and 3 are tiny, as can be seen from the size column in your file list, and do not contain any of the marker genes, so they do not come out as "good" bins.
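
One way to check bin sizes directly is to count contigs and total bases per FASTA file. This is a minimal sketch: the helper name `fasta_stats` is hypothetical, and it assumes the bins sit in `BINS/` with a `.fa` extension, as in the commands earlier in the thread.

```shell
# Hypothetical helper: print contig count and total length (bp) of a FASTA file,
# separated by a tab.
fasta_stats() {
    awk '
        /^>/ { n++; next }                          # header line: one more contig
        { gsub(/[ \t\r]/, ""); bp += length($0) }   # sequence line: add its length
        END { printf "%d\t%d\n", n, bp }
    ' "$1"
}

# Report stats for every bin in BINS/ (assumed location and extension).
for f in BINS/*.fa; do
    [ -e "$f" ] || continue                         # skip if the glob matched nothing
    printf '%s\t%s\n' "$f" "$(fasta_stats "$f")"
done
```

Bins with very few bases are unlikely to carry any of the marker genes, which is consistent with a completeness of 0.00 in the CheckM summary.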