arpcard / rgi

Resistance Gene Identifier (RGI). Software to predict resistomes from protein or nucleotide data, including metagenomics data, based on homology and SNP models.

[BUG] hit reported at 100% but is completely different #289

Open bifidoman opened 1 month ago

bifidoman commented 1 month ago

Describe the bug
Hi, I run RGI v5.2.1 locally (Ubuntu 22.04). Database version: CARD, card_canonical: 3.2.5

I ran it with standard settings: rgi main -i file -o ./rgi_out/file --local

Input: assembled genome

I got a hit with 100% identity:

Best_Hit_ARO Best_Identities ARO Model_type SNPs_in_Best_Hit_ARO Other_SNPs
AMR Gene GRD33-1 100 3006926 protein homolog model n/a n/a carbapenem antibiotic inactivation
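For anyone who wants to pull rows like this out of the report programmatically, here is a minimal sketch. It assumes the RGI .txt report is tab-delimited with a header row that includes the column names quoted above. The `SAMPLE` text and the `perfect_hits` helper are hypothetical and only for illustration; a real report has many more columns.

```python
import csv
import io

# Hypothetical excerpt with only four of the columns quoted in this issue.
# Assumption: the real RGI .txt report is tab-delimited with a header row.
SAMPLE = (
    "Best_Hit_ARO\tBest_Identities\tARO\tModel_type\n"
    "GRD33-1\t100\t3006926\tprotein homolog model\n"
)


def perfect_hits(handle, threshold=100.0):
    """Return the rows whose Best_Identities value is at or above threshold."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row for row in reader if float(row["Best_Identities"]) >= threshold]


hits = perfect_hits(io.StringIO(SAMPLE))
for row in hits:
    print(row["Best_Hit_ARO"], row["ARO"], row["Best_Identities"])
```

To run it on a real report, pass an open file handle instead of the `io.StringIO` wrapper.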

These are the proteins:

Predicted_Protein
MSEAKNSWVTASDVARLAGVSRSAVSRTFTPGASVSEKTRQRVQAAATELGYQVNIIARSMITGSSNFIGLVTAGFDNPFRSKLLAPLAHNLAIQGFMPLLMNADDPKQLEPQLRELLSYHVAGVILTSGAPPLSLAEEYLARKIPVTLINRQTELDGADQVCSDNAQGATLAAHHLLAQGVTVAGFIGENAHNFSTRQRHQGFEQALTDHGQPLASIFCERGGYEAGWDAAAALVAQCPDLDGLFCATDMLAMGAMDYLHRHQPQQPVRIIGFDDIPQATYAAYQLTTIRQDTDCLAQTAVNLLVNRIRRFEQPSVQKTIPVELVVRQSA

CARD_Protein_Sequence
MAAMAAVAAVLLGVFAFAHAQDQPALWTQPQQPVRIIGNAWYVGTRGLSAILITSPTGAVLIDGAMRESADDIAKNITSLGVRLEDVKLIVNSHAHNDHAGGIAELQRRTGATVAALPWSAEALRSGRKHQGDPQFDTQTPPPDRVPKVKTIRDGEALHAGGVTITAHKTGGHTPGSTSWTWRSCEENRCVDIVYADSITAVSADGFRFTDNKTYPQAIDDFNKGYAFLRSASCDILVTPHPEASDFWGRIAKRDAGERDALIDRSQCARYADRADAQLQKRLATERAK

Completely different!
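A quick way to check how much two unaligned protein strings really share is to look for their longest identical stretch. The sketch below uses Python's standard `difflib` for that. The `longest_shared_stretch` helper and the toy sequences are hypothetical and are not taken from the RGI output; paste the two sequences from the report in their place to run the check on real data.

```python
from difflib import SequenceMatcher


def longest_shared_stretch(seq_a, seq_b):
    """Return the longest substring that occurs identically in both sequences."""
    # autojunk=False keeps difflib from ignoring frequent residues in long inputs.
    matcher = SequenceMatcher(None, seq_a, seq_b, autojunk=False)
    match = matcher.find_longest_match(0, len(seq_a), 0, len(seq_b))
    return seq_a[match.a:match.a + match.size]


# Toy sequences for illustration only.
toy_a = "MSEAKNSW"
toy_b = "XXEAKNYY"
shared = longest_shared_stretch(toy_a, toy_b)
print(shared, len(shared))  # prints: EAKN 4
```

This only reports exact, ungapped matches, so it is a sanity check and not a replacement for a proper alignment.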

Aligned it looks like this:

Predicted_Protein          MSEAKNSWVTASDVARLAGVSRSAVSRTFTPGASVSEKTRQRVQAAATELGYQVNIIARS
CARD_Protein_Sequence      ------MAAMAAVAAVLLGVFAFAHAQD---QPALWTQPQQPVR----------------
                                   . *: .* * **   * :.     .::  :..* *.                

Predicted_Protein          MITGSSNFIGLVTAGFDNPFRSKLLAPLAHNLAIQGFMPLLMNADDPKQLEPQLRELLSY
CARD_Protein_Sequence      -IIGNAWYVG--TRGLS----AILITSPTGAVLIDGAMR--ESADD--------------
                            * *.:.::*  * *:.    : *::. :  : *:* *    .***              

Predicted_Protein          HVAGVILTSGAPPLSLAEEYLARKIPVTLINRQTELDGADQVCS-DNAQGATLAAHHLLA
CARD_Protein_Sequence      -IAKNITSLG---VRLEDVKL-------IVNSHAHNDHAGGIAELQRRTGATVAALPWSA
                            :*  * : *   : * :  *       ::* ::  * *. :.. :.  ***:**    *

Predicted_Protein          QGVTVAGFIGENAHNFSTRQ-------RHQGFEQALTDHGQPLASIFCERGGYEAG---W
CARD_Protein_Sequence      EALR-SGRKHQGDPQFDTQTPPPDRVPKVKTIRDGEALHAGGVTITAHKTGGHTPGSTSW
                           :.:  :*   :.  :*.*.        . : : :. : *.  ::    : **: .*   *

Predicted_Protein          DAAAALVAQCPDLDGLFCATDMLAMGAMDYLHRHQPQQPVRIIGFDD---IPQATYAAYQ
CARD_Protein_Sequence      TWRSCEENRCVD---IVYADSITAVSADGFRFTDNKTYPQAIDDFNKGYAFLRSASCDIL
                              :.   .* *   :. * .: *:.* .: .  :   *  * .*:.   : .:: .   

Predicted_Protein          LTTIRQDTDCLAQTAV------NLLVNRIR--RFEQPSVQKTIPVELVVRQSA
CARD_Protein_Sequence      VTPHPEASDFWGRIAKRDAGERDALIDRSQCARYADRA-DAQLQKRLATERAK
                           :*.  : :*  .. *       : *::* .  *: : : :  :   *.. .: 

There is no 100% identity here, not even over a stretch of more than 3 amino acids. Why is this reported as a hit at 100%?
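To put a number on an alignment like the one above, percent identity can be computed directly from the two gapped rows. This is only a sketch: it counts identical residues over all alignment columns (gap columns included), which is one common convention, and I am assuming rather than asserting that it is comparable to how Best_Identities is derived. The `percent_identity` helper and the toy rows are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns where both rows carry the same residue.

    Columns where both rows are gaps are skipped; every other column,
    including single-gap columns, counts toward the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned rows must have the same length")
    matches = 0
    columns = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap and res_b == gap:
            continue  # empty column, carries no information
        columns += 1
        if res_a == res_b:
            matches += 1  # identical residues (neither can be a gap here)
    return 100.0 * matches / columns if columns else 0.0


# Toy alignment for illustration only: 6 identical columns out of 9.
row_a = "MSEAK-NSW"
row_b = "MAEAKQNTW"
print(round(percent_identity(row_a, row_b), 1))  # prints: 66.7
```

For a multi-block alignment like the one in this report, concatenate the blocks of each sequence into one row before calling the function.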

Original output: SK65-2.txt