dyxstat / ViralCC

ViralCC: leveraging metagenomic proximity-ligation to retrieve complete viral genomes
GNU Affero General Public License v3.0

Final Bins and representative sequences #1

Closed poursalavati closed 10 months ago

poursalavati commented 1 year ago

Thank you very much for publishing this tool. I'm testing it and would like to know what you think about the final bins. I used the side script (concatenation.py), and it seems that it only concatenates sequences, as its name suggests.

Do you have a way to get a representative sequence for each bin? For the next steps of analysis, such as taxonomy assignment and estimating the abundance of each virus, do you have another approach in mind?

Best, NP

dyxstat commented 1 year ago

Thanks for trying our software! This is an excellent question.

Briefly speaking, you can still annotate viral contigs; ideally, the vast majority of contigs from the same vMAG should be assigned the same taxonomy. Since this is a Hi-C-based viral genome binner, one of the most important downstream analyses is to link vMAGs to their hosts as we did in our paper.
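As a rough illustration of the taxonomy check described above, the sketch below takes per-contig annotations and reports, for each vMAG, the most common taxon and the fraction of annotated contigs that agree with it. The function name and the two input dictionaries (`contig_to_bin`, `contig_to_taxon`) are hypothetical; they are not part of ViralCC and would have to be built from the binning output and from whatever annotation tool you use.

```python
from collections import Counter


def bin_consensus_taxonomy(contig_to_bin, contig_to_taxon):
    """Return {bin_id: (most_common_taxon, agreement_fraction)}.

    contig_to_bin   : dict mapping contig name -> bin (vMAG) identifier
    contig_to_taxon : dict mapping contig name -> taxon label
                      (contigs without an annotation are simply absent)
    Both inputs are assumed formats, not ViralCC outputs.
    """
    counts_per_bin = {}
    for contig, bin_id in contig_to_bin.items():
        taxon = contig_to_taxon.get(contig)
        if taxon is None:
            continue  # unannotated contigs do not vote
        counts_per_bin.setdefault(bin_id, Counter())[taxon] += 1

    result = {}
    for bin_id, counts in counts_per_bin.items():
        taxon, votes = counts.most_common(1)[0]
        result[bin_id] = (taxon, votes / sum(counts.values()))
    return result
```

A low agreement fraction for a bin would suggest that its contigs were annotated inconsistently, which may be worth inspecting before assigning a single label to the whole vMAG.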

Best, Yancey

poursalavati commented 1 year ago

Thanks for your reply. Yes, that makes sense, and I agree with you about taxonomy assignment. But I'm curious about calculating abundance after taxonomy assignment and linking the two together. Is there an alternative way to build consensus or representative sequences from each bin?

Thanks again for sharing, NP

dyxstat commented 1 year ago

Although I have not calculated virus abundance before, I think the following passage might help you:

'Some genome clusters were excluded on the basis of promiscuous interactions with other clusters, quantified as their vertex entropy in the Hi-C graph connecting genome clusters, calculated using the R entropy package [25]. We excluded clusters with vertex entropy higher than 3. We assessed abundance of different organisms as the median of the abundance of constituent contigs >20 kb in size, estimated as kallisto “transcripts per million” '

From "Linking the resistome and plasmidome to the microbiome", Stalder et al., ISME J, 2019.
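The abundance rule in the quoted passage (median abundance of a cluster's contigs longer than 20 kb) can be sketched as follows. This is a minimal illustration, not code from ViralCC or from Stalder et al.: the function name and the input dictionaries are hypothetical, and the per-contig TPM values are assumed to have been computed beforehand (the quoted paper used kallisto). The 20 kb cutoff comes from the quote; it is exposed as a parameter here because it may need adjusting for viral bins, which is an assumption to check against your own data.

```python
from statistics import median


def bin_abundance(bin_contigs, contig_length, contig_tpm, min_length=20_000):
    """Median TPM of the contigs in one bin that are longer than min_length.

    bin_contigs   : iterable of contig names belonging to the bin
    contig_length : dict mapping contig name -> length in bp
    contig_tpm    : dict mapping contig name -> TPM estimate
    Returns None when no contig in the bin passes the length filter.
    All input formats are assumptions made for this sketch.
    """
    values = [
        contig_tpm[c]
        for c in bin_contigs
        if c in contig_tpm and contig_length.get(c, 0) > min_length
    ]
    return median(values) if values else None
```

Using the median rather than the mean keeps a single contig with unusually high coverage from dominating the estimate for the bin.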

Best