Ecogenomics / CheckM

Assess the quality of microbial genomes recovered from isolates, single cells, and metagenomes
https://ecogenomics.github.io/CheckM/
GNU General Public License v3.0

ssu_finder not showing whether ssu in a bin or not #211

Closed nixon444 closed 5 years ago

nixon444 commented 5 years ago

Hi,

I have binned genomes from 6 metagenomes using metabat2 and maxbin2 and chosen the latter based on better completion / contamination results. I want to use ssu_finder to identify who these genomes belong to (and who wasn't binned). I have had success using this on maxbin2 bins before, but for some reason the results are not reporting each ssu as 'binX' or 'unbinned'. Instead I just get 'G6_scaffolds' which is the name of the assembled metagenome.

Any idea why this is happening? I checked the assembly file and each contig has a unique name. I tried ssu_finder on the metabat2 bins recovered from this same assembly and the results table gave me 'binX' or 'unbinned' in the first column, so I don't think it's an issue with the assembly file. I guess I could BLAST these SSUs back to my assembly and bins to figure that out, but it's a lot of extra work.

Any help with this is much appreciated!

Cheers, Sophie

donovan-h-parks commented 5 years ago

Hello. If you can send me a few bins that have this issue I can look into what might be causing the problem.

nixon444 commented 5 years ago

Thanks - the attached archive should contain three high-quality (>99% complete <5% contam) genome bins. Let me know if you have trouble retrieving the files (it wouldn't let me attach fasta format)

nixon444 commented 5 years ago

Dropbox link to same attachment in case upload failed: https://www.dropbox.com/s/no3b3gmgmcx2827/Archive.zip?dl=0

donovan-h-parks commented 5 years ago

Hello. Seems to be working for me. The ssu_finder method expects the contig assembly file for your metagenome and the directory containing the bins/MAGs recovered from that assembly. I extracted your 3 bins to a directory called genomes. To create a fake contig assembly file, I first concatenated all your bins: cat ./genomes/*.fasta > all_seqs.fasta. I then ran: checkm ssu_finder all_seqs.fasta ./genomes/ ./ssu_finder_output -x fasta. The resulting ssu_summary.tsv file indicates a single SSU sequence:

Bin Id  Seq. Id                                 HMM       i-Evalue  Start hit  End hit  16S/18S gene length  Rev. Complement  Sequence length
G6.003  NODE_121652_length_1156_cov_424.143506  bacteria  8.7e-96   769        1156     387                  True             11
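
For readers who want to check the first column of ssu_summary.tsv programmatically, here is a minimal Python sketch that groups SSU hits by Bin Id. It assumes the file is tab-separated (as the .tsv extension suggests) and uses the column names from the header above. The function name ssu_hits_by_bin is hypothetical, and the example row omits the last column because its value is cut off in this thread.

```python
import csv
import io
from collections import defaultdict


def ssu_hits_by_bin(tsv_text):
    """Group SSU hits by the value in the 'Bin Id' column."""
    hits = defaultdict(list)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        hits[row["Bin Id"]].append(
            {
                "seq_id": row["Seq. Id"],
                "hmm": row["HMM"],
                "gene_length": int(row["16S/18S gene length"]),
            }
        )
    return dict(hits)


# Example modelled on the row above. Assumption: fields are tab-separated.
# The truncated 'Sequence length' column is left out on purpose.
example = (
    "Bin Id\tSeq. Id\tHMM\ti-Evalue\tStart hit\tEnd hit\t"
    "16S/18S gene length\tRev. Complement\n"
    "G6.003\tNODE_121652_length_1156_cov_424.143506\tbacteria\t"
    "8.7e-96\t769\t1156\t387\tTrue\n"
)

for bin_id, bin_hits in ssu_hits_by_bin(example).items():
    print(bin_id, [hit["seq_id"] for hit in bin_hits])
```

If every key comes back as the assembly name rather than a bin name or 'unbinned', that reproduces the symptom described at the top of this issue.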

nixon444 commented 5 years ago

OK, so it must be the assembly file somehow. I only need to know who my genomes belong to, so I can concatenate all bins per sample and do as you did. Thanks for trying this out, much appreciated!
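
For anyone applying the same workaround to several samples, a rough Python sketch of the two steps follows: concatenate a directory of bins into a stand-in assembly file, then build the ssu_finder command quoted earlier in the thread. The helper names and the bins/<sample>/ directory layout are assumptions for illustration only. The command arguments mirror the ones shown above, and the sketch only prints the command rather than running CheckM.

```python
import shlex
from pathlib import Path


def concatenate_bins(bin_dir, out_fasta, extension="fasta"):
    """Write every bin in bin_dir to one FASTA file, like `cat bin_dir/*.fasta`."""
    bin_paths = sorted(Path(bin_dir).glob(f"*.{extension}"))
    with open(out_fasta, "w") as out:
        for path in bin_paths:
            text = path.read_text()
            out.write(text)
            # Keep records on separate lines if a bin lacks a trailing newline.
            if text and not text.endswith("\n"):
                out.write("\n")
    return bin_paths


def ssu_finder_command(assembly, bin_dir, out_dir, extension="fasta"):
    """Build the argument list for the ssu_finder call shown in this thread."""
    return ["checkm", "ssu_finder", str(assembly), str(bin_dir),
            str(out_dir), "-x", extension]


if __name__ == "__main__":
    # Hypothetical layout: one sub-directory of bins per sample under ./bins.
    for sample_dir in sorted(Path("bins").iterdir()):
        if not sample_dir.is_dir():
            continue
        # Write the combined file outside the bin directory so it is not
        # picked up as a bin on a later run.
        combined = Path(f"{sample_dir.name}_all_seqs.fasta")
        concatenate_bins(sample_dir, combined)
        cmd = ssu_finder_command(combined, sample_dir,
                                 f"{sample_dir.name}_ssu_finder_output")
        print(shlex.join(cmd))
```

Note that with a concatenated file as the assembly, every contig belongs to some bin, so this identifies the SSU sequences inside bins but cannot report unbinned ones.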