apcamargo / genomad

geNomad: Identification of mobile genetic elements
https://portal.nersc.gov/genomad/

Recommendation for taxonomic classification of ssDNA virus #85

Closed ChenTianYi99 closed 3 months ago

ChenTianYi99 commented 3 months ago

Hi, I have used several virus identification tools to obtain a set of viral sequences and validated it with CheckV. In the following analyses, I want to focus only on ssDNA viruses. Are there any recommended geNomad parameters for the taxonomic classification of ssDNA viruses (ideally more conservative ones)?

Looking forward to your reply. Thanks in advance.

apcamargo commented 3 months ago

If geNomad was one of the tools you used for identification, you can get the assigned taxonomy from the {prefix}_annotate/{prefix}_taxonomy.tsv file. If not, you can run genomad annotate to generate this file.
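Once the taxonomy file exists, the ssDNA subset can be pulled out with a short script. The following is a minimal sketch, not part of geNomad itself. It assumes the TSV has a header with a seq_name column and a semicolon-separated lineage column (check the header of your own file), and it uses membership in the realm Monodnaviria as an approximate proxy for ssDNA viruses, excluding the class Papovaviricetes, whose members have dsDNA genomes. The function name ssdna_rows and the example rows are hypothetical.

```python
import csv
import io


def ssdna_rows(tsv_text, lineage_column="lineage",
               realm="Monodnaviria", exclude=("Papovaviricetes",)):
    """Return rows whose lineage contains `realm` and none of `exclude`.

    Assumption: the lineage is a semicolon-separated string such as
    "Viruses;Monodnaviria;Shotokuvirae;Cressdnaviricota".
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    kept = []
    for row in reader:
        taxa = row[lineage_column].split(";")
        if realm in taxa and not any(name in taxa for name in exclude):
            kept.append(row)
    return kept


# Made-up rows for illustration only; real files contain more columns.
example = (
    "seq_name\tlineage\n"
    "contig_1\tViruses;Monodnaviria;Shotokuvirae;Cressdnaviricota\n"
    "contig_2\tViruses;Duplodnaviria;Heunggongvirae;Uroviricota;Caudoviricetes\n"
    "contig_3\tViruses;Monodnaviria;Shotokuvirae;Cossaviricota;Papovaviricetes\n"
)
names = [row["seq_name"] for row in ssdna_rows(example)]
print(names)
```

For a real run, read the file contents (for example with open(path).read()) and pass them to ssdna_rows. Sequences with an empty or shallow lineage are simply skipped, which keeps the selection conservative.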

My only suggestion is to increase the sensitivity of the marker search, if possible, to try to increase the fraction of genomes that receive a taxonomic assignment. You can do this via the -s parameter (e.g.: -s 7.2). Please note that increasing the search sensitivity will also increase memory usage.
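The higher-sensitivity run suggested above could be launched from a script along these lines. This is a sketch under assumptions: the -s 7.2 value comes from the comment above, while the positional order (input FASTA, output directory, database directory) and the file names viruses.fna, genomad_output, and genomad_db are illustrative placeholders that should be checked against genomad annotate --help. The helper build_annotate_command is hypothetical.

```python
import shlex


def build_annotate_command(fasta, out_dir, db_dir, sensitivity=7.2):
    """Build the argument list for a `genomad annotate` call.

    Assumption: the positional arguments are input, output directory,
    and database directory, in that order.
    """
    return ["genomad", "annotate", "-s", str(sensitivity),
            fasta, out_dir, db_dir]


cmd = build_annotate_command("viruses.fna", "genomad_output", "genomad_db")
print(shlex.join(cmd))

# To actually execute it (requires geNomad and its database to be installed):
# import subprocess
# subprocess.run(cmd, check=True)
```

Building the command as a list and passing it to subprocess.run avoids shell quoting problems when paths contain spaces.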

ChenTianYi99 commented 3 months ago

Thanks for your rapid reply. Besides the sensitivity parameter, should I also increase the minimum number of virus hallmarks required for my virus set?

apcamargo commented 3 months ago

If you're confident that those sequences are viruses, I don't think that's necessary. The number of hallmarks is important for classifying a sequence as viral, but it doesn't really improve taxonomic assignment.

ChenTianYi99 commented 3 months ago

Thank you very much. It really helps a lot.

apcamargo commented 3 months ago

No problem!