fmalmeida / bacannot

Generic but comprehensive pipeline for prokaryotic genome annotation and interrogation with interactive reports and shiny app.
https://bacannot.readthedocs.io/en/latest/
GNU General Public License v3.0

problem with ResFinder annotation "merge" #64

Closed fmalmeida closed 1 year ago

fmalmeida commented 1 year ago

Since ResFinder is run on the contigs rather than on the CDS sequences, its annotations are added to the final GFFs by intersecting the annotation GFF with the ResFinder GFF.

The problem is that a ResFinder hit sometimes intersects two CDS features, only one of which is the real match; the other is just a spurious intersection, as in the following:

contig_2        Prodigal:002006,Resfinder       CDS,Resistance  1885477 1886385 .       +       0       ID=GDOAGAFO_01938;Name=syrM1;gene=syrM1;inference=ab initio prediction:Prodigal:002006,similar to AA sequence:UniProtKB:P55619;locus_tag=GDOAGAFO_01938;product=HTH-type transcriptional regulator SyrM 1;Resfinder_ID=Resfinder_1;Additional_database=Resfinder;Resfinder_gene=fosA;Resfinder_phenotype=Fosfomycin_resistance;Resfinder_reference=ACWO01000079

contig_2        Prodigal:002006,CARD,AMRFinderPlus,Resfinder    CDS,Resistance  1886379 1886798 .       +       0       ID=GDOAGAFO_01939;eC_number=2.5.1.18;Name=fosA;gene=fosA;inference=ab initio prediction:Prodigal:002006,similar to AA sequence:UniProtKB:Q56415;locus_tag=GDOAGAFO_01939;product=Glutathione transferase FosA;Additional_database=KEGG;KO=K03321;Method=KOfamscan;Additional_database=CARD;CARD:Name=FosA6;CARD:Inference=protein_homolog_model;CARD:Product=fosfomycin_thiol_transferase;CARD:Targeted_drug_class=fosfomycin;Additional_database=NDARO;NDARO:Gene_Name=fosA;NDARO:Gene_Product=FosA5_family_fosfomycin_resistance_glutathione_transferase;NDARO:Resistance_Category=AMR;NDARO:Resistance_Target=FOSFOMYCIN;NDARO:Method=BLASTP;NDARO:Closest_Sequence=FosA5_family_fosfomycin_resistance_glutathione_transferase;Resfinder_ID=Resfinder_1;Additional_database=Resfinder;Resfinder_gene=fosA;Resfinder_phenotype=Fosfomycin_resistance;Resfinder_reference=ACWO01000079

Here, the Resfinder_1 annotation was intersected with two CDS regions and therefore appears (wrongly) twice in the final GFF.
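One possible way to avoid the duplicate, sketched here in Python, is to assign each ResFinder hit only to the CDS it overlaps the most, instead of to every CDS it touches. This is an illustration under stated assumptions, not what the pipeline actually does. The two CDS coordinates are taken from the GFF lines above; the ResFinder hit coordinates are not shown in this issue, so the interval used for `hit` is hypothetical, as are the names `Feature`, `overlap_length`, `best_cds_for_hit` and the `min_fraction` threshold.

```python
from typing import List, NamedTuple, Optional


class Feature(NamedTuple):
    seqid: str
    start: int        # 1-based, inclusive (GFF convention)
    end: int          # 1-based, inclusive
    feature_id: str


def overlap_length(a: Feature, b: Feature) -> int:
    """Number of bases shared by two features (0 if none or different contigs)."""
    if a.seqid != b.seqid:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def best_cds_for_hit(hit: Feature, cds_features: List[Feature],
                     min_fraction: float = 0.5) -> Optional[Feature]:
    """Return the CDS with the largest overlap with `hit`.

    Returns None when even the best overlap covers less than `min_fraction`
    of the hit. The 0.5 default is an arbitrary, illustrative threshold.
    """
    best, best_len = None, 0
    for cds in cds_features:
        n = overlap_length(hit, cds)
        if n > best_len:
            best, best_len = cds, n
    hit_len = hit.end - hit.start + 1
    if best is None or best_len / hit_len < min_fraction:
        return None
    return best


# CDS coordinates copied from the two GFF lines in this issue.
cds_features = [
    Feature("contig_2", 1885477, 1886385, "GDOAGAFO_01938"),
    Feature("contig_2", 1886379, 1886798, "GDOAGAFO_01939"),
]

# HYPOTHETICAL hit coordinates: the real Resfinder_1 interval is not shown above.
hit = Feature("contig_2", 1886379, 1886798, "Resfinder_1")

# "Any overlap" intersection reproduces the problem: both CDSs match.
naive = [cds for cds in cds_features if overlap_length(hit, cds) > 0]

# Keeping only the best overlap leaves a single CDS.
chosen = best_cds_for_hit(hit, cds_features)
print([c.feature_id for c in naive], chosen.feature_id if chosen else None)
```

With these assumed hit coordinates, the first CDS shares only the few bases where the two CDSs themselves overlap (1886379 to 1886385), while the second covers the whole hit, so only the fosA CDS would keep the Resfinder_1 attributes. A minimum overlap fraction applied at the intersection step would be an alternative design, but a fixed cutoff can still let two CDSs through or drop a legitimate partial hit, which is why this sketch ranks by overlap first.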