dyxstat / ViralCC

ViralCC: leveraging metagenomic proximity-ligation to retrieve complete viral genomes
GNU Affero General Public License v3.0

Using multiple bam files for viral genome binning #2

Closed: mshamash closed this issue 10 months ago

mshamash commented 1 year ago

Thank you for making this tool, it's been a great help in analyzing our Hi-C datasets.

I was wondering if there is any way to use multiple BAM files for viral genome binning. In our case, we ran a coassembly with MEGAHIT on multiple longitudinal samples from the same environment. I ran ViralCC once per sample, using each sample's BAM file together with the contigs from the coassembly. However, the viral bins have different names across runs, which makes it difficult to dereplicate or merge bins.

dyxstat commented 1 year ago

Thanks for trying our method! Unfortunately, ViralCC does not currently support viral contig binning with multiple BAM files.

How about merging all the BAM files into one and using that as the input? I don't think it would hurt the binning.
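
As a rough sketch of the merge step, assuming samtools is installed and every BAM file was aligned against the same coassembly contigs (so the headers agree), something like the following could work. The helper name `build_merge_command` and the file names are hypothetical. The `-n` flag tells `samtools merge` that the inputs are sorted by read name rather than by coordinate; drop it if your files are coordinate-sorted.

```python
import shutil
import subprocess
from typing import List, Sequence


def build_merge_command(output_bam: str,
                        input_bams: Sequence[str],
                        name_sorted: bool = True,
                        extra_threads: int = 0) -> List[str]:
    """Build a `samtools merge` command line for per-sample BAM files.

    Hypothetical helper for illustration; it only assembles the arguments.
    """
    if len(input_bams) < 2:
        raise ValueError("need at least two BAM files to merge")
    cmd = ["samtools", "merge"]
    if name_sorted:
        # Inputs are sorted by read name, not by coordinate.
        cmd.append("-n")
    if extra_threads > 0:
        # -@ sets the number of additional compression threads.
        cmd += ["-@", str(extra_threads)]
    cmd.append(output_bam)      # output file comes first
    cmd.extend(input_bams)      # followed by all input files
    return cmd


if __name__ == "__main__":
    # Placeholder file names; replace with your own per-sample BAM files.
    cmd = build_merge_command("merged.bam", ["sample1.bam", "sample2.bam"])
    print(" ".join(cmd))
    if shutil.which("samtools") is not None:
        subprocess.run(cmd, check=True)
```

The merged file can then be passed to the binning run in place of a single-sample BAM file.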

Best

mshamash commented 1 year ago

Thanks for the quick reply! That makes sense, I will give it a try and see how it goes.

Would a similar approach work for your other tool, HiCBin, as well? I can also open a new issue on that repo... I see you use coverage information (from the jgi_summarize_bam_contig_depths script) as part of the binning process, and I am not sure whether merging the reads from all the samples in my coassembly could skew or otherwise affect this...

Cheers.

dyxstat commented 1 year ago

Yes, your concern makes sense. When we designed HiCBin, we focused on binning contigs from a single sample, since almost all metagenomic Hi-C datasets contained only one sample at that time. My gut feeling is that HiCBin cannot be applied to multiple samples, though you can still give it a try.

You might also try bin3C instead, which does not require coverage information for binning.

Best