apcamargo / genomad

geNomad: Identification of mobile genetic elements
https://portal.nersc.gov/genomad/

How is the IMG/PR database used in geNomad? #72

Closed Dx-wmc closed 5 months ago

Dx-wmc commented 5 months ago

Hi,

I recently explored the IMG/PR database, published in NAR, and noticed that its download page says it is compatible with geNomad. How should I use the IMG/PR database? How does it differ from the built-in geNomad database?

apcamargo commented 5 months ago

Hi @Dx-wmc,

The IMG/PR database was generated using geNomad to identify plasmids in genomes and metagenomes from the IMG database. It is independent from the geNomad database.

Dx-wmc commented 5 months ago

Thank you for your explanation. I also have a question about bacteriophages. Have you compared your tool with PHASTER? I ran both geNomad and PHASTER on a complete genome. The results are similar, but the predicted bacteriophage boundaries differ somewhat. For incomplete genomes (assemblies with multiple contigs), the difference between the two is slightly larger, so I am unsure which one is more accurate. Can you provide some advice on this?

apcamargo commented 5 months ago

I won't be able to give you an answer supported by data, since I didn't benchmark PHASTER. My guess, based on how PHASTER works and the database it uses, is that geNomad will be significantly more sensitive and that its predicted boundaries might be a bit better. But this is all gut feeling, so you should be skeptical.

I recommend that you look at the data and base your decision on your own results (e.g., number of predicted proviruses, average contamination and completeness, etc.).
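
As a rough illustration of that kind of comparison, here is a minimal sketch that matches the provirus regions called by two tools and reports how well their boundaries agree. It assumes you have already parsed each tool's output into `(contig, start, end)` tuples with 1-based inclusive coordinates. The function name `overlap_report` and the coordinates are hypothetical placeholders, not real output from geNomad or PHASTER.

```python
from collections import defaultdict


def overlap_report(calls_a, calls_b):
    """For each region in calls_a, find the best-overlapping region in
    calls_b on the same contig and report their Jaccard index."""
    by_contig = defaultdict(list)
    for contig, start, end in calls_b:
        by_contig[contig].append((start, end))

    report = []
    for contig, start, end in calls_a:
        best, best_jaccard = None, 0.0
        for b_start, b_end in by_contig[contig]:
            # Length of the shared interval (1-based inclusive coordinates).
            inter = min(end, b_end) - max(start, b_start) + 1
            if inter <= 0:
                continue  # no overlap between the two regions
            union = max(end, b_end) - min(start, b_start) + 1
            jaccard = inter / union
            if jaccard > best_jaccard:
                best, best_jaccard = (b_start, b_end), jaccard
        report.append((contig, (start, end), best, round(best_jaccard, 3)))
    return report


# Made-up coordinates, for illustration only.
tool_a = [("contig_1", 1000, 41000), ("contig_2", 500, 20500)]
tool_b = [("contig_1", 3000, 39000)]

report = overlap_report(tool_a, tool_b)
for row in report:
    print(row)
```

Regions with no match (`None`) are calls made by only one tool. Low Jaccard values point to boundary disagreements worth inspecting by hand, alongside the completeness and contamination estimates.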

Dx-wmc commented 5 months ago

Thanks for your advice!