KwanLab / Autometa

Autometa: Automated Extraction of Genomes from Shotgun Metagenomes
https://autometa.readthedocs.io

General Query on improving Metagenome binning #268

Closed: srisvs33 closed this issue 2 years ago

srisvs33 commented 2 years ago

Dear Developers, Greetings from India,

Thank you for providing such detailed documentation for users. As a novice in bioinformatics, I was still able to install the software and run the Bash workflow with ease. I have some basic general questions on which I am hoping to get some insights.

We recently procured an Ion Torrent GeneStudio S5 for our lab and did our first sequencing run using a mock DNA mix containing 8 bacteria and 2 yeasts (ZymoBIOMICS, cat. no. D6306). We got around 45 million good-quality reads. I assembled the metagenome using SPAdes, followed by read mapping (alignment rate: 95%, N50 = 32,088). I then performed binning in Autometa using the Bash workflow. With default settings, the analysis reconstructed 12 genome bins (please see attachments). However, the completeness of those genome bins looks very low (except for one), and a few bacterial genomes were not recovered through binning. Is it possible to recover more complete genomes by changing the advanced settings (and if so, which ones do you think I should change), or are 45 million reads simply not sufficient to bin more complete genomes?

Any recommendations or advice would be very much helpful in this regard.

Many thanks,
Venkat

Mock_contigs_bacteria_metabin_stats.txt Mock_contigs_bacteria_metabin_taxonomy.txt

Sidduppal commented 2 years ago

Hey @srisvs33, glad to hear that you found the documentation useful. The default settings work well for the majority of cases. After looking at Mock_contigs_bacteria_metabin_taxonomy.txt, it seems that almost all of your bins belong to the same genus, and sometimes even the same species. Autometa clusters contigs based on k-mer frequencies (which are quite similar for genomes with similar taxonomy), taxonomy, and coverage. When genomes belong to the same genus or species, k-mer frequencies and taxonomy don't help much in separating them, since there is no significant difference between the genomes except for coverage. This is one of the limitations of Autometa, and we are actively working on improving it.
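To make the k-mer point concrete, here is a minimal Python sketch of a k-mer frequency profile for a contig. It is an illustration only, assuming plain 5-mer counts normalized to frequencies; it does not reproduce Autometa's own normalization, dimensionality reduction, or embedding steps, and the function names are hypothetical. Contigs from closely related genomes tend to give profiles that are close together, which is why composition alone struggles to separate them.

```python
import math
from collections import Counter
from itertools import product

VALID_BASES = set("ACGT")


def kmer_profile(seq: str, k: int = 5) -> list[float]:
    """Normalized k-mer frequencies over all 4**k ACGT k-mers, in a fixed order."""
    seq = seq.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= VALID_BASES  # skip windows containing N or other symbols
    )
    total = sum(counts.values()) or 1  # avoid division by zero for very short contigs
    return [counts["".join(p)] / total for p in product("ACGT", repeat=k)]


def profile_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two k-mer profiles (smaller = more similar composition)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

For example, profile_distance(kmer_profile(contig_a), kmer_profile(contig_b)) compares two contigs by composition only; coverage and taxonomy would be separate signals.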

If you want more complete bins, you can try increasing the completeness threshold. You can also try manually curating your genomes using Automappa, a tool developed by one of the students in our lab (@WiscEvan). Feel free to let us know in case of further questions 😄
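The completeness (and purity) thresholds act as acceptance criteria for candidate clusters. The sketch below is a deliberately simplified illustration, assuming each bin already has marker-gene-based completeness and purity estimates; the Bin class, the bin names, and the threshold values used with it are hypothetical. As we understand it, in Autometa itself contigs from rejected clusters go back to the unclustered pool for further rounds of clustering rather than simply being discarded.

```python
from dataclasses import dataclass


@dataclass
class Bin:
    name: str
    completeness: float  # percent, e.g. estimated from single-copy marker genes
    purity: float        # percent


def accepted_bins(bins: list[Bin], min_completeness: float, min_purity: float) -> list[Bin]:
    """Keep only bins meeting both thresholds (simplified acceptance check)."""
    return [
        b for b in bins
        if b.completeness >= min_completeness and b.purity >= min_purity
    ]
```

Raising min_completeness means fewer, more complete clusters pass the check, which is the trade-off behind the suggestion above.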

srisvs33 commented 2 years ago

Hi @Sidduppal, thank you for your response. The mock community DNA mix we sequenced consists of 8 bacterial species (Pseudomonas aeruginosa, Escherichia coli, Salmonella enterica, Lactobacillus fermentum, Enterococcus faecalis, Staphylococcus aureus, Listeria monocytogenes, Bacillus subtilis) and 2 yeasts (Saccharomyces cerevisiae, Cryptococcus neoformans). It looks like Autometa is splitting genomes into smaller bins even though they actually come from the same species. I am not sure why this happens. Is there any way to avoid it? Many thanks, Venkat

Sidduppal commented 2 years ago

Hey @srisvs33, thanks for providing the additional information. Several of these species are closely related; for example, Escherichia coli and Salmonella enterica both belong to the same family, Enterobacteriaceae. Closely related genomes are harder to bin correctly and are more likely to end up as split bins. If you look at Mock_contigs_bacteria_metabin_stats.txt, you'll see that the only bin with good completeness (bin1) belongs to a completely different phylum from all the other bins (i.e., Proteobacteria). This is what I meant earlier: genomes from different taxonomic groups bin out well. Unfortunately, Autometa is designed only for bacteria and archaea, so it won't bin the yeast genomes. A few things you can try to improve the binning:

  1. Since your assembly quality is good, you can increase the length_cutoff to 10,000 bp. This has been shown to reduce noise and improve binning.
  2. If you know that your genomes are not reduced (reduced genomes have lost genes, including some marker genes, so their completeness estimates come out low), you can try increasing the completeness threshold.
  3. I encourage you to use Automappa to combine the bins that Autometa split.
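Conceptually, length_cutoff drops short contigs before binning. Below is a minimal Python sketch of that filter, assuming a plain (uncompressed) FASTA assembly and assuming contigs at least as long as the cutoff are kept (whether Autometa uses >= or > here is an assumption). The function names are hypothetical and this is not Autometa's own code.

```python
from typing import Iterable, Iterator


def read_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records, length_cutoff: int = 10_000) -> list[tuple[str, str]]:
    """Keep contigs whose length is at least length_cutoff bp (assumed >= comparison)."""
    return [(h, s) for h, s in records if len(s) >= length_cutoff]
```

Usage would look like `with open("assembly.fasta") as fh: kept = filter_by_length(read_fasta(fh), 10_000)`, where the file name is a placeholder. Raising the cutoff from a lower value to 10,000 bp leaves fewer but longer contigs, whose k-mer and coverage estimates are less noisy.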

Feel free to let us know in case of further questions 😄

srisvs33 commented 2 years ago

Dear @Sidduppal, thank you for your suggestions. Increasing the length cutoff certainly helped me get more complete genomes (> 90% completeness) for up to 4 bacterial species. Cheers

srisvs33 commented 2 years ago

Dear @Sidduppal, sorry to bother you again. Another quick question: is it possible to provide multiple assemblies for binning in the Bash workflow? Regards

Sidduppal commented 2 years ago

As of now, it's not possible to provide multiple assemblies to the Bash workflow. However, you can write a simple Bash loop with some string manipulation to achieve that. Another option is to use the Nextflow workflow, which not only accepts multiple assemblies but can also run them in parallel.
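Here is one way to do the loop-and-string-manipulation approach, written in Python for illustration: derive a sample name from each assembly filename, then produce a per-assembly copy of the workflow script with its variables rewritten. The variable names assembly, simple_name, and outdir are assumptions (hypothetical here); check them against the actual variables in your copy of the Bash workflow script. Other per-sample inputs, such as the alignment/coverage file, would also need to be set the same way.

```python
import re
from pathlib import Path


def derive_sample_name(assembly_path: str) -> str:
    """'runs/S1.scaffolds.fasta.gz' -> 'S1.scaffolds' (strips directories and common FASTA suffixes)."""
    name = Path(assembly_path).name
    for suffix in (".gz", ".fasta", ".fna", ".fa"):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
    return name


def set_variables(script_text: str, values: dict[str, str]) -> str:
    """Rewrite top-level shell assignments such as assembly="..." in a script template."""
    for var, value in values.items():
        pattern = re.compile(rf"^{re.escape(var)}=.*$", re.MULTILINE)
        replacement = f'{var}="{value}"'
        # Use a function so backslashes in paths are not treated as regex escapes.
        script_text, n = pattern.subn(lambda _m, r=replacement: r, script_text)
        if n == 0:
            raise KeyError(f"variable {var!r} not found in template")
    return script_text


def per_assembly_scripts(template: str, assemblies: list[str], outdir_root: str) -> dict[str, str]:
    """Build one workflow script per assembly (hypothetical variable names)."""
    scripts = {}
    for assembly in assemblies:
        sample = derive_sample_name(assembly)
        scripts[sample] = set_variables(template, {
            "assembly": assembly,
            "simple_name": sample,
            "outdir": f"{outdir_root}/{sample}",
        })
    return scripts
```

You could then write each generated script to its own file and run them one after another with bash. If you need the assemblies to run in parallel, the Nextflow workflow is the better fit.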