apcamargo / genomad

geNomad: Identification of mobile genetic elements
https://portal.nersc.gov/genomad/
Other

Inquiry on threshold #61

Closed: songmj86 closed this issue 6 months ago

songmj86 commented 6 months ago

Hi

I would like to ask what the proper thresholds are for classifying a sequence as a plasmid, virus, or chromosome.

Thank you!

apcamargo commented 6 months ago

Hi @songmj86,

There's no straightforward answer to your question, unfortunately. It really depends on your goal (i.e., how conservative you want or need to be).

If you use the score calibration feature, scores will be very close to probabilities, so you can set interpretable cutoffs (e.g., at least 90% probability of being a plasmid or virus).
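As a rough illustration of using calibrated scores as a probability cutoff, here is a minimal Python sketch. It assumes a per-sequence table with one score column per class; the column names (`seq_name`, `chromosome_score`, `plasmid_score`, `virus_score`) and the example values are assumptions for illustration, so check them against your own output files. The `classify` helper is hypothetical, not part of geNomad.

```python
import csv
import io

# Hypothetical per-sequence score table. Column names are assumptions
# modeled on a chromosome/plasmid/virus score layout; verify against your files.
EXAMPLE_TSV = """seq_name\tchromosome_score\tplasmid_score\tvirus_score
contig_1\t0.02\t0.03\t0.95
contig_2\t0.05\t0.91\t0.04
contig_3\t0.40\t0.35\t0.25
"""

CLASSES = ("chromosome", "plasmid", "virus")


def classify(row: dict, min_probability: float = 0.9) -> str:
    """Return the top-scoring class if it clears the cutoff, else 'uncertain'."""
    scores = {c: float(row[f"{c}_score"]) for c in CLASSES}
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_probability else "uncertain"


def classify_table(tsv_text: str, min_probability: float = 0.9) -> dict:
    """Apply the cutoff to every row of a tab-separated score table."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row["seq_name"]: classify(row, min_probability) for row in reader}


if __name__ == "__main__":
    for name, label in classify_table(EXAMPLE_TSV).items():
        print(name, label)
```

With a 0.9 cutoff, `contig_1` is labeled a virus and `contig_2` a plasmid, while `contig_3` stays "uncertain" because no class reaches 90%. Raising `min_probability` makes the calls more conservative.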

Otherwise, the default cutoffs work pretty well. geNomad already applies some filters to try to avoid some obvious mispredictions. If you are interested in viruses, you can also couple geNomad with CheckV to retain only medium/high-quality genomes.
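Here is a minimal Python sketch of the geNomad + CheckV combination. It assumes CheckV was run on geNomad's predicted virus sequences, so the sequence names match between the two tables. The column names (`seq_name`/`virus_score` for the geNomad side, `contig_id`/`checkv_quality` for CheckV) and the quality tier labels are assumptions for illustration; `filter_viruses` is a hypothetical helper.

```python
import csv
import io

# Hypothetical inputs; column names and values are assumptions.
GENOMAD_VIRUS_TSV = """seq_name\tvirus_score
contig_1\t0.97
contig_4\t0.88
contig_5\t0.93
"""

CHECKV_QUALITY_TSV = """contig_id\tcheckv_quality
contig_1\tHigh-quality
contig_4\tLow-quality
contig_5\tMedium-quality
"""

# Assumed CheckV tiers to retain: medium quality or better.
KEEP = {"Complete", "High-quality", "Medium-quality"}


def read_tsv(text: str) -> list[dict]:
    """Parse a tab-separated table into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def filter_viruses(genomad_tsv: str, checkv_tsv: str) -> list[str]:
    """Keep geNomad virus calls whose CheckV quality tier is in KEEP."""
    quality = {r["contig_id"]: r["checkv_quality"] for r in read_tsv(checkv_tsv)}
    return [
        r["seq_name"]
        for r in read_tsv(genomad_tsv)
        # Sequences missing from the CheckV table are dropped.
        if quality.get(r["seq_name"]) in KEEP
    ]


if __name__ == "__main__":
    print(filter_viruses(GENOMAD_VIRUS_TSV, CHECKV_QUALITY_TSV))
```

Here `contig_4` is discarded because CheckV rates it low quality, leaving `contig_1` and `contig_5`.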

songmj86 commented 6 months ago

Thank you for your kindness!