Arkadiy-Garber / FeGenie

HMM-based identification and categorization of iron genes and iron gene operons in genomes and metagenomes
GNU Affero General Public License v3.0

Annotation collisions and confusion about HMM model names #14

Closed karoraw1 closed 4 years ago

karoraw1 commented 4 years ago

Hello Arkadiy,

I used FeGenie and I think it works very well and is simple to use. I am not sure where I can find more information about some of the HMM model calls. I checked the publication text and supplementary files, but can't find what the names DFE_0465 and DFE_0448 refer to.

This came up because I'm trying to reconcile results between different annotation software. I've attached the few instances where KEGG and FeGenie annotations disagree (FeGenieCollisions.txt).

I was able to find closely related sequences in NCBI refseq_protein using BLASTp, which I summarize below.

  1. BLASTp calls the proteins identified as FmnB FAD:protein FMN transferases.
  2. BLASTp calls the proteins identified as DmkB and DmkA a polyprenyl synthetase family protein and a UbiA prenyltransferase protein, which I think is what DmkB and DmkA are.
  3. BLASTp and KEGG agree that the proteins identified as DFE_0465 and DFE_0448 are the cytochrome c proteins shown in the attachment.

Any help on understanding how to reconcile these annotations and on where to find more information for the HMMs included in the software would be great.
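As a rough illustration of the comparison described above, the disagreements can be collected by lining up the two tools' calls for each ORF. This is a minimal sketch, not part of FeGenie: the ORF IDs, the KEGG placeholder label, and the `annotation_collisions` helper are all hypothetical, and both inputs are assumed to have already been parsed into dictionaries keyed by ORF ID.

```python
def annotation_collisions(fegenie_calls, kegg_calls):
    """Return (orf, fegenie_label, kegg_label) for ORFs annotated by both tools.

    Both arguments are assumed to be dicts mapping an ORF ID to a label;
    how those dicts are built from each tool's output is left out here.
    """
    shared = sorted(set(fegenie_calls) & set(kegg_calls))
    return [(orf, fegenie_calls[orf], kegg_calls[orf]) for orf in shared]


# Hypothetical example data (ORF IDs and the placeholder label are made up).
fegenie_calls = {"orf_001": "FmnB", "orf_002": "DFE_0465", "orf_007": "DmkA"}
kegg_calls = {
    "orf_001": "placeholder KEGG label",
    "orf_002": "cytochrome c",
    "orf_010": "hypothetical protein",
}

collisions = annotation_collisions(fegenie_calls, kegg_calls)
for orf, fegenie_label, kegg_label in collisions:
    print(f"{orf}\tFeGenie={fegenie_label}\tKEGG={kegg_label}")
```

Only ORFs present in both dictionaries are reported; whether two labels actually conflict still needs manual review, since the tools use different naming schemes.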

Thanks, Keith

Arkadiy-Garber commented 4 years ago

Hi Keith,

Thanks for your comment, and for using FeGenie! Apologies if something is unclear in the program or publication; I am happy to help clear up the confusion.

For the FmnB, DmkA, and DmkB HMMs, you can find more information about these genes in this article: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6221200/pdf/nihms-1502928.pdf. Based on what I have read in this article, your BLASTp annotations look correct. On their own, these genes are not necessarily indicative of iron reduction. However, finding them together, along with other components of this flavin-based electron transport system, does indicate potential for extracellular electron transport or iron reduction. Does this make sense? What is the genomic context in which FeGenie identified these genes? Refer to Extended Data Figure 9 in the linked article for more info on potential operon structures.
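The genomic-context question can be checked with a small script. The sketch below is only an illustration and is not FeGenie's own logic: it assumes hits are already available as `(contig, gene_index, hmm_name)` tuples, and the `max_gap` window, the contig names, and the `clustered_hits` helper are hypothetical. It looks only for the three genes named here, not the other components of the system.

```python
from collections import defaultdict

# The three genes discussed in this thread; other components are not checked.
FLAVIN_EET_GENES = frozenset({"FmnB", "DmkA", "DmkB"})


def clustered_hits(hits, required=FLAVIN_EET_GENES, max_gap=5):
    """Find contigs where all required genes occur close together.

    hits: iterable of (contig, gene_index, hmm_name) tuples (assumed format).
    max_gap: largest allowed distance, in gene positions, between
             neighbouring hits in one cluster (an arbitrary assumption).
    Returns a list of (contig, [hmm_name, ...]) ordered by gene position.
    """
    by_contig = defaultdict(list)
    for contig, index, gene in hits:
        if gene in required:
            by_contig[contig].append((index, gene))

    clusters = []
    for contig, members in by_contig.items():
        members.sort()
        group = [members[0]]
        # Sentinel None flushes the final group at the end of the contig.
        for item in members[1:] + [None]:
            if item is not None and item[0] - group[-1][0] <= max_gap:
                group.append(item)
                continue
            if {gene for _, gene in group} >= required:
                clusters.append((contig, [gene for _, gene in group]))
            group = [item]
    return clusters


# Hypothetical hits: contig_1 has all three genes adjacent, contig_2 does not.
hits = [
    ("contig_1", 3, "FmnB"), ("contig_1", 4, "DmkA"), ("contig_1", 6, "DmkB"),
    ("contig_2", 1, "FmnB"), ("contig_2", 40, "DmkA"), ("contig_2", 41, "DmkB"),
]
found = clustered_hits(hits)
print(found)
```

A contig that carries all three genes but scattered far apart, like `contig_2` here, is not reported, which mirrors the point that isolated hits are weaker evidence than a co-located set.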

For the DFE genes, these are taken from this article: https://advances.sciencemag.org/content/4/2/eaao5682. These DFE HMMs refer to the multi-heme cytochrome loci that are discussed in the linked article.

Hope this helps! Don't hesitate to reach out with further questions/comments. Arkadiy

karoraw1 commented 4 years ago

Thanks! Much appreciated.