AnantharamanLab / vRhyme

Binning Virus Genomes from Metagenomes
GNU General Public License v3.0

No bins created-- did we do something wrong or are there really no bins? #37

Open rikander opened 1 month ago

rikander commented 1 month ago

Hi Kris (and Karthik),

My student is trying to bin contigs identified by VirSorter using vRhyme. As input, he is using the contigs that VirSorter identified as viral, plus sorted BAM files that were mapped to the entire set of contigs (which includes the ones that VirSorter did not flag as viral). (For reference, we had approximately 2-3 million contigs for each sample, but VirSorter identified about 10,000-15,000 contigs as viral per sample. These samples were microbial metagenomes.) He ran vRhyme for 48 samples and it ran without errors, but it tells us we have no bins for any of the samples. Is this realistic, or did something likely go wrong? If so, do you have any thoughts?

Thanks! -Rika

KrisKieft commented 1 month ago

Hi,

That sounds a bit off. Did you take the set of 10-15k contigs and do one binning run using 48 samples, or did you do 48 binning runs with 1 sample each? The former (1 run, 48 samples, set of dereplicated contigs) is the correct usage. Were any parameters changed from their default settings?

Kris

rikander commented 1 month ago

Hi Kris,

OK, so to clarify: we didn't do a co-assembly (that would break our server), so we have 48 separate sets of contigs. Should we combine those contigs into one dereplicated combined fasta file (which would contain something like 500k contigs) and do a vRhyme run on that? For the bam files then, should we map the reads of each sample against that combined fasta file?

We used default settings for all the vRhyme runs.

Thanks! Rika

KrisKieft commented 1 month ago

It's possible that using 1 sample (1 coverage value) per contig didn't give vRhyme enough information to bin. It weighs coverage and sequence features roughly equally. I've gotten 1 sample to work before, but certainly not with the same quality of results. My suggestion is to dereplicate your 500k viral contigs and use the dereplicated set as the contig input. Yes, then map the reads of each sample to that set. There are a couple of ways to do that. vRhyme can handle the dereplication itself; it uses a general method similar to what dRep uses. Then you can either have vRhyme do the mapping by just inputting the fastq files (select either BWA or Bowtie2), or you can map yourself and input the bam files.
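As a rough sketch of the "1 run, many samples" layout described above, the snippet below assembles a single vRhyme call from one dereplicated FASTA and one BAM per sample. It only uses the flags that appear in the logs in this thread (`-i`, `-b`, `-o`, `-t`). The function name `build_vrhyme_command` and the file names in the usage note are hypothetical, and the "at least two BAMs" check is just a guard reflecting the advice here, not a vRhyme requirement.

```python
from typing import List, Sequence


def build_vrhyme_command(
    contigs: str, bams: Sequence[str], outdir: str, threads: int = 20
) -> List[str]:
    """Assemble ONE vRhyme call: one dereplicated FASTA, one BAM per sample.

    Hypothetical helper for illustration; only the flags -i, -b, -o and -t
    (all visible in the logs in this thread) are used.
    """
    if len(bams) < 2:
        # Guard based on the advice in this thread, not on vRhyme itself.
        raise ValueError("expected BAM files from more than one sample")
    # Sorting only makes the command reproducible; the order of the BAM
    # files is not assumed to matter.
    return ["vRhyme", "-i", contigs, "-b", *sorted(bams), "-o", outdir, "-t", str(threads)]
```

For example, `build_vrhyme_command("dereplicated.fa", ["s2_vs_combined.bam", "s1_vs_combined.bam"], "bins")` returns an argument list that could be passed to `subprocess.run`. Every BAM in the list is assumed to have been mapped against the same dereplicated FASTA.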

This complicates things if you wanted vMAGs per sample to compare, because at the end of binning you'd have combined vMAGs based on the dereplicated/combined set. For this, vRhyme will generate a coverage file, and you can assess coverage per contig per sample. However, as you know, each of your samples individually won't have the whole picture anyway due to variance in metagenome sequencing/assembly.
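To illustrate the "coverage per contig per sample" idea, here is a minimal sketch that reads a coverage table and lists, for each contig, the samples where it is covered. The table layout is an assumption made for this example (tab-separated, first column the contig name, then one mean-coverage column per sample). The actual vRhyme coverage file may be laid out differently, so the column handling would need adjusting. The `min_cov` cutoff is likewise an arbitrary illustrative value.

```python
import csv
import io
from typing import Dict, List


def samples_per_contig(table_text: str, min_cov: float = 1.0) -> Dict[str, List[str]]:
    """Map each contig to the samples where its coverage is >= min_cov.

    ASSUMED layout (not taken from the vRhyme docs): a header row
    'contig<TAB>sampleA<TAB>sampleB...' followed by one row per contig.
    """
    reader = csv.reader(io.StringIO(table_text), delimiter="\t")
    header = next(reader)
    samples = header[1:]
    present: Dict[str, List[str]] = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        contig, values = row[0], row[1:]
        present[contig] = [s for s, v in zip(samples, values) if float(v) >= min_cov]
    return present
```

Grouping the contigs of each combined vMAG by the samples returned here would give a per-sample view of the bins, under the assumptions stated above.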

I hope that answers your question. The main takeaway is that vRhyme and other coverage-based tools often rely on >1 sample to bin accurately even though they tend to let you input 1.

rikander commented 1 month ago

Hi Kris,

OK, thanks! The first time we did it, we did have multiple coverage values for each sample (i.e. bam files for sample 1 mapped to sample 2, and sample 3, and sample 4...) but still found no bins in any of the samples. We'll still give this a try, so we'll have more contigs to work with in the single binning run-- so we'll combine all the assembled contigs together and make new bam files. We'll see how it goes.

Thanks, Rika

rikander commented 4 days ago

Hi again Kris et al.,

We tried what you suggested (combine all fasta files together, dereplicate, map reads from each sample to that combined set) and tried to bin with vRhyme-- and this time, it didn't even try to bin. Any ideas for what might be going on this time? For reference, here is the log file:

Command:  /usr/local/miniconda3/bin/vRhyme -i ../VirSorterRuns2/renamed_fastas/Combined_stuff/combined-ports-dereplicated.fa/vRhyme_dereplication/combined_ports.vRhyme-unique.fa -b ../Combined_mapping/Port10_2018_vs_combined.bam ../Combined_mapping/Port10_2020_vs_combined.bam ../Combined_mapping/Port11_2018_vs_combined.bam ../Combined_mapping/Port11_2020_vs_combined.bam ../Combined_mapping/Port1_2018_vs_combined.bam ../Combined_mapping/Port1_2020_vs_combined.bam ../Combined_mapping/Port12_2018_vs_combined.bam ../Combined_mapping/Port12_2020_vs_combined.bam ../Combined_mapping/Port13_2018_vs_combined.bam ../Combined_mapping/Port13_2020_vs_combined.bam ../Combined_mapping/Port14_2018_vs_combined.bam ../Combined_mapping/Port14_2020_vs_combined.bam ../Combined_mapping/Port15_2018_vs_combined.bam ../Combined_mapping/Port15_2020_vs_combined.bam ../Combined_mapping/Port17_2018_vs_combined.bam ../Combined_mapping/Port17_2020_vs_combined.bam ../Combined_mapping/Port18_2018_vs_combined.bam ../Combined_mapping/Port18_2020_vs_combined.bam ../Combined_mapping/Port19_2018_vs_combined.bam ../Combined_mapping/Port20_2018_vs_combined.bam ../Combined_mapping/Port20_2020_vs_combined.bam ../Combined_mapping/Port21_2018_vs_combined.bam ../Combined_mapping/Port21_2020_vs_combined.bam ../Combined_mapping/Port2_2018_vs_combined.bam ../Combined_mapping/Port2_2020_vs_combined.bam ../Combined_mapping/Port22_2018_vs_combined.bam ../Combined_mapping/Port22_2020_vs_combined.bam ../Combined_mapping/Port23_2018_vs_combined.bam ../Combined_mapping/Port23_2020_vs_combined.bam ../Combined_mapping/Port24_2020_vs_combined.bam ../Combined_mapping/Port3_2018_vs_combined.bam ../Combined_mapping/Port4_2018_vs_combined.bam ../Combined_mapping/Port5_2018_vs_combined.bam ../Combined_mapping/Port5_2020_vs_combined.bam ../Combined_mapping/Port6_2018_vs_combined.bam ../Combined_mapping/Port6_2020_vs_combined.bam ../Combined_mapping/Port7_2018_vs_combined.bam ../Combined_mapping/Port8_2018_vs_combined.bam 
../Combined_mapping/Port8_2020_vs_combined.bam ../Combined_mapping/Port9_2018_vs_combined.bam ../Combined_mapping/Port9_2020_vs_combined.bam -o Ports_vs_combined_bins -t 20

Date:     2024-11-17 (y-m-d)
Start:    11:38:42   (h:m:s)
Program:  vRhyme v1.1.0

Time (min) |  Log                                                   
--------------------------------------------------------------------
0.0           Initializing and validating vRhyme parameters
0.57          Extracting coverage information from BAM files
379.77        Coverage extraction complete. Generating coverage table
379.79        Performing pairwise coverage comparisons
398.62        Running Prodigal on filtered sequences
400.89        Generating codon usage features
400.89        Generating nucleotide features
401.46        Performing pairwise distance calculations
401.65        Performing machine learning classification

Here is a sample log from the first time we tried it, where we tried to bin each individual sample and got zero bins:

Command:  /usr/local/miniconda3/bin/vRhyme -i ../VirSorterRuns2/PPS_megahit_Port24_2020.virsorter/final-viral-combined.fa -b ../../data/NSF_CAREER_Axial_data/PPS_metagenomes_fall2022/PPS_mapping_2018_2020/2020_Port24/PPS2020_Port24_vs_2020Port10.bam ../../data/NSF_CAREER_Axial_data/PPS_metagenomes_fall2022/PPS_mapping_2018_2020/2020_Port24/PPS2020_Port24_vs_2020Port11.bam ../../data/NSF_CAREER_Axial_data/PPS_metagenomes_fall2022/PPS_mapping_2018_2020/2020_Port24/PPS2020_Port24_vs_2020Port12.bam ../../data/NSF_CAREER_Axial_data/PPS_metagenomes_fall2022/PPS_mapping_2018_2020/2020_Port24/PPS2020_Port24_vs_2020Port13.bam ../../data/NSF_CAREER_Axial_data/PPS_metagenomes_fall2022/PPS_mapping_2018_2020/2020_Port24/PPS2020_Port24_vs_2020Port14.bam ../../data/NSF_CAREER_Axial_data/PPS_metagenomes_fall2022/PPS_mapping_2018_2020/2020_Port24/PPS2020_Port24_vs_2020Port15.bam ../../data/NSF_CAREER_Axial_data/PPS_metagenomes_fall2022/PPS_mapping_2018_2020/2020_Port24/PPS2020_Port24_vs_2020Port17.bam ../../data/NSF_CAREER_Axial_data/PPS_metagenomes_fall2022/PPS_mapping_2018_2020/2020_Port24/PPS2020_Port24_vs_2020Port18.bam ../../data/NSF_CAREER_Axial_data/PPS_metagenomes_fall2022/PPS_mapping_2018_2020/2020_Port24/PPS2020_Port24_vs_2020Port1.bam ../../data/NSF_CAREER_Axial_data/PPS_metagenomes_fall2022/PPS_mapping_2018_2020/2020_Port24/PPS2020_Port24_vs_2020Port20.bam ../../data/NSF_CAREER_Axial_data/PPS_metagenomes_fall2022/PPS_mapping_2018_2020/2020_Port24/PPS2020_Port24_vs_2020Port22.bam ../../data/NSF_CAREER_Axial_data/PPS_metagenomes_fall2022/PPS_mapping_2018_2020/2020_Port24/PPS2020_Port24_vs_2020Port23.bam ../../data/NSF_CAREER_Axial_data/PPS_metagenomes_fall2022/PPS_mapping_2018_2020/2020_Port24/PPS2020_Port24_vs_2020Port2.bam ../../data/NSF_CAREER_Axial_data/PPS_metagenomes_fall2022/PPS_mapping_2018_2020/2020_Port24/PPS2020_Port24_vs_2020Port5.bam ../../data/NSF_CAREER_Axial_data/PPS_metagenomes_fall2022/PPS_mapping_2018_2020/2020_Port24/PPS2020_Port24_vs_2020Port6.bam 
../../data/NSF_CAREER_Axial_data/PPS_metagenomes_fall2022/PPS_mapping_2018_2020/2020_Port24/PPS2020_Port24_vs_2020Port8.bam ../../data/NSF_CAREER_Axial_data/PPS_metagenomes_fall2022/PPS_mapping_2018_2020/2020_Port24/PPS2020_Port24_vs_2020Port9.bam -o vRhyme_Port24_2020 -t 10 --iter 10

Date:     2024-10-16 (y-m-d)
Start:    15:50:05   (h:m:s)
Program:  vRhyme v1.1.0

Time (min) |  Log                                                   
--------------------------------------------------------------------
0.0           Initializing and validating vRhyme parameters
0.11          Extracting coverage information from BAM files
299.6         Coverage extraction complete. Generating coverage table
299.6         Performing pairwise coverage comparisons
299.6         vRhyme binning complete

Memory usage:       1.25
Runtime (min):      299.6
Bins generated:     0
Binned sequences:   0 (0%)
Input sequences:    2090
Binned proteins:    0
Redundant proteins: 0 (0%)
Best iteration:     none
vRhyme score:       none

______________________________________________________________________

             ## ## ## ##                                              
             ##       ##  ##      ##     ##    ## ## ##     # ## ##   
##       ##  ##       ##  ##       ##    ##  ##   ##   ##  ##      #  
 ##     ##   ##     ##    ##         ## ##   ##   ##   ##  ## ## ##   
  ##   ##    ## ####      ## ## ##     ##    ##   ##   ##  ##         
   ## ##     ##   ##      ##    ##    ##     ##   ##   ##  ##         
    ###      ##     ##    ##    ##   ##      ##   ##   ##   ## ## ##  
______________________________________________________________________
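With 48 per-sample logs, it may help to tally the summaries automatically. The sketch below parses the summary fields shown in the log above (`Bins generated`, `Binned sequences`, `Input sequences`) and records whether the line `vRhyme binning complete` appears at all, which separates a run that finished with zero bins from one whose log stops earlier. The function name `summarize_log` is hypothetical, and the parsing is based only on the two logs pasted in this thread.

```python
import re
from typing import Dict, Optional, Union

# Matches summary lines such as "Bins generated:     0" or
# "Binned sequences:   0 (0%)" and captures the leading integer.
SUMMARY = re.compile(
    r"^(Bins generated|Binned sequences|Input sequences):\s+(\d+)", re.MULTILINE
)


def summarize_log(log_text: str) -> Dict[str, Union[bool, Optional[int]]]:
    """Extract run status and counts from the text of one vRhyme log."""
    counts = {key: int(value) for key, value in SUMMARY.findall(log_text)}
    return {
        "finished": "vRhyme binning complete" in log_text,
        "bins": counts.get("Bins generated"),       # None if no summary block
        "binned": counts.get("Binned sequences"),
        "input": counts.get("Input sequences"),
    }
```

Running this over each log file's text and collecting the dictionaries would show at a glance which runs finished and how many input sequences each one reported.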