cultivarium / MicrobeMod

A toolkit for exploring prokaryotic methylation and base modifications in nanopore sequencing
MIT License

HMM gene coverage #4

Closed by Ge0rges 10 months ago

Ge0rges commented 10 months ago

Hello,

You identify RM (restriction-modification) genes by running HMMER with HMMs from DefenseFinder (for Type I, II, III, and IV systems) and a set of 12 HMMs from Pfam. You then assign REBASE homology to the significant hits using BLAST.

I was wondering whether your selection of HMMs covers as many prokaryotic genes as are identified in REBASE. I ask because I notice that you do not use an HMM built from the REBASE database itself, which you state is the most comprehensive.

alexcritschristoph commented 10 months ago

I think the HMMs from DefenseFinder are pretty sensitive for this. As a quick check, they hit 93% of all proteins in REBASE. I actually have no idea how REBASE does its enzyme annotation, but it's not necessarily a ground truth in itself in that regard.
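
For anyone who wants to run a similar coverage check, here is a rough sketch. It is not the script behind the 93% figure. It assumes you have run `hmmsearch --tblout` with the HMM set against a FASTA file of REBASE proteins, and it counts the fraction of sequences with at least one hit below an E-value cutoff. The function names and the `1e-5` cutoff are hypothetical choices for illustration, not MicrobeMod settings.

```python
def fasta_ids(fasta_lines):
    """Collect sequence IDs (first token after '>') from FASTA lines."""
    ids = set()
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def hit_targets(tblout_lines, max_evalue=1e-5):
    """Return target sequence names with a full-sequence E-value <= max_evalue.

    Expects HMMER --tblout rows: whitespace-separated, with the target name
    in column 1 and the full-sequence E-value in column 5. Lines starting
    with '#' are comments. The 1e-5 default is an assumed cutoff.
    """
    hits = set()
    for line in tblout_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        target, evalue = fields[0], float(fields[4])
        if evalue <= max_evalue:
            hits.add(target)
    return hits


def coverage(all_ids, hit_ids):
    """Fraction of sequences in all_ids that received at least one hit."""
    if not all_ids:
        return 0.0
    return len(all_ids & hit_ids) / len(all_ids)
```

To use it on real files, pass open file handles, for example `coverage(fasta_ids(open("rebase.faa")), hit_targets(open("hits.tbl")))`, where both file names are placeholders. The result depends heavily on the E-value cutoff and on which REBASE protein set is used, so treat any single percentage as approximate.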