arpcard / rgi

Resistance Gene Identifier (RGI). Software to predict resistomes from protein or nucleotide data, including metagenomics data, based on homology and SNP models.

How to interpret duplicate output rows and negative percent identity (rRNA gene variant model) #237

Closed danrlu closed 12 months ago

danrlu commented 1 year ago

We can't tell whether this is a bug or intended behavior that we are simply misinterpreting.

Describe the bug/confusion

Input: rgi main -i contigs.fasta -o contig_amr_report -t contig -a BLAST --include_nudge --clean

Input file: https://github.com/danrlu/debug_data/tree/main/rgi

Error log: No error.

CARD version / RGI version: latest version of RGI, CARD 3.2.6 (same result when uploading contigs.fasta to the CARD website)

Thanks very much for this great tool and database and continuous improvements!!!

Originally noticed by @lvreynoso.

raphenya commented 1 year ago

Correct, 8 of the contigs hit model Neisseria gonorrhoeae 23S rRNA with mutation conferring resistance to azithromycin.

For the rRNA gene variant model, the ORF_ID is built from the model details and the query identifier, e.g.: 3646_5918 | model_type_id: 40295 | pass_bit_score: 5300 | SNP: C2600T,A2145G,C2597T,A2045G,A2059G,C2611T | Neisseria gonorrhoeae 23S rRNA with mutation conferring resistance to azithromycin | QUERY: NODE_26_length_1251_cov_840.666388 NODE_26_length_1251_cov_840.666388
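
To make the rows easier to compare, the pipe-separated ORF_ID above can be split into named fields. The following is only a rough sketch, not part of RGI: the function name `parse_rrna_orf_id` and the output keys `prefix` and `model_name` are hypothetical, and it assumes the layout always matches the single example shown here. The example string keeps just one copy of the query name after `QUERY:`.

```python
# Hypothetical helper, not an RGI function: splits an rRNA gene variant
# ORF_ID of the form shown in the example above into a dict.
KNOWN_KEYS = {"model_type_id", "pass_bit_score", "SNP", "QUERY"}


def parse_rrna_orf_id(orf_id):
    fields = [part.strip() for part in orf_id.split("|")]
    # The first field has no label; its meaning is not stated in the thread.
    parsed = {"prefix": fields[0], "model_name": None}
    for field in fields[1:]:
        key, sep, value = field.partition(":")
        if sep and key.strip() in KNOWN_KEYS:
            parsed[key.strip()] = value.strip()
        else:
            # The only unlabeled field in the example is the model name.
            parsed["model_name"] = field
    if "SNP" in parsed:
        parsed["SNP"] = parsed["SNP"].split(",")
    return parsed


example = (
    "3646_5918 | model_type_id: 40295 | pass_bit_score: 5300 | "
    "SNP: C2600T,A2145G,C2597T,A2045G,A2059G,C2611T | "
    "Neisseria gonorrhoeae 23S rRNA with mutation conferring resistance "
    "to azithromycin | QUERY: NODE_26_length_1251_cov_840.666388"
)
parsed = parse_rrna_orf_id(example)
print(parsed["QUERY"], parsed["SNP"])
```

Grouping report rows by the parsed `QUERY` value is one way to see which contigs hit the same model.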

The Percentage Length of Reference Sequence calculation is incorrect, which we will fix. Cheers.