apcamargo / genomad

geNomad: Identification of mobile genetic elements
https://portal.nersc.gov/genomad/

Number of contigs in the assembly #117

Open coreyholt opened 2 months ago

coreyholt commented 2 months ago

Hello!

Thank you for developing geNomad!

Following on from #62 ...

I'm trying to understand how the number of contigs binned to each MAG affects the number of plasmid and virus classifications made by geNomad. I had presumed that a more fragmented MAG would not lead to an overestimation of MGEs, since the search is based on ORFs. Is this not the case?

Thanks for your help!

apcamargo commented 2 months ago

I'm not sure I understood the question.

Assuming the binning is good (that is, with little contamination), having more sequence information (and, therefore, ORFs) would allow better classification. See the figure in the paper showing how classification performance increases as sequences get longer.
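One way to check whether fragmentation inflates the counts is to normalize the number of virus/plasmid calls in each MAG by the number of contigs in that MAG. The snippet below is only a rough sketch and not part of geNomad: the function name, the contig-to-MAG mapping, and the toy sequence names are all hypothetical. It also assumes that provirus names carry a `|provirus_...` suffix after the contig name, so the suffix is stripped before looking up the MAG.

```python
from collections import Counter


def calls_per_mag(contig_to_mag, classified_seqs):
    """Tally contigs and classified sequences per MAG (hypothetical helper)."""
    n_contigs = Counter(contig_to_mag.values())
    n_calls = Counter()
    for seq_name in classified_seqs:
        # Assumption: proviruses are named "<contig>|provirus_<start>_<end>".
        contig = seq_name.split("|", 1)[0]
        mag = contig_to_mag.get(contig)
        if mag is not None:  # skip sequences that were never binned
            n_calls[mag] += 1
    return {
        mag: {
            "contigs": n,
            "calls": n_calls[mag],
            "calls_per_contig": n_calls[mag] / n,
        }
        for mag, n in n_contigs.items()
    }


# Toy data: MAG_A is split into four contigs, MAG_B is a single contig.
contig_to_mag = {
    "c1": "MAG_A", "c2": "MAG_A", "c3": "MAG_A", "c4": "MAG_A",
    "c5": "MAG_B",
}
# Names of sequences classified as virus or plasmid (made up for illustration).
classified = ["c1", "c3|provirus_100_5000", "c5"]

summary = calls_per_mag(contig_to_mag, classified)
for mag, stats in sorted(summary.items()):
    print(mag, stats)
```

With real data, the classified names would come from the classification output and the mapping from the binner. Comparing `calls_per_contig` (or calls per unit of sequence length) across MAGs gives a fairer picture than raw counts.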