arpcard / rgi

Resistance Gene Identifier (RGI). Software to predict resistomes from protein or nucleotide data, including metagenomics data, based on homology and SNP models.

Question: technical definition of perfect, strict, and loose #140

Status: Closed (alimayy closed this issue 2 years ago)

alimayy commented 3 years ago

Hi there,

Thanks for developing RGI.

Regarding the perfect, strict, and loose hits, Jia et al. 2017 mentions:

The RGI currently supports two detection model types (Protein Homolog and Protein Variant) and analyzes sequences under three paradigms—Perfect, Strict, and Loose (a.k.a. Discovery). The Perfect algorithm is most often applied to clinical surveillance as it detects perfect matches to the curated reference sequences and mutations in CARD.

Could you provide a more technical definition of these terms? For instance, for 'perfect', I understand from https://github.com/arpcard/rgi/issues/101 that the self-mapping bit-scores determine the 'perfect' hits. Could you also explain how the 'strict' and 'loose' thresholds are chosen?

Thanks in advance, Ali

raphenya commented 2 years ago

@alimayy As an example, running BLAST with AXC-1 against the CARD database produces the results shown below. The bit-score cut-off chosen is 500, as it captures genes similar to AXC-1.

[Image: Bitscore-AXC-1]
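To make the role of the cut-off concrete, here is a minimal sketch of how a hit might be sorted into the three paradigms for a Protein Homolog model. It is not RGI's actual code. It assumes that a Perfect hit is an exact match to the curated reference, a Strict hit scores at or above the curated bit-score cut-off, and a Loose hit falls below it. The `Hit` class, the `classify` function, and the example scores are hypothetical; only the cut-off of 500 comes from the AXC-1 example above.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """A single alignment hit (hypothetical structure, not RGI's own)."""
    query_id: str
    bitscore: float
    identical_to_reference: bool  # assumed: exact match to the curated reference


def classify(hit: Hit, cutoff: float) -> str:
    """Assign a paradigm to a hit, given the model's curated bit-score cut-off.

    Simplified assumption: Perfect = exact match to the reference,
    Strict = bit-score at or above the cut-off, Loose = below the cut-off.
    """
    if hit.identical_to_reference:
        return "Perfect"
    if hit.bitscore >= cutoff:
        return "Strict"
    return "Loose"


AXC_1_CUTOFF = 500  # bit-score cut-off from the AXC-1 example in this thread

# Hypothetical hits, for illustration only
hits = [
    Hit("query_a", 620.0, True),
    Hit("query_b", 540.0, False),
    Hit("query_c", 210.0, False),
]

for hit in hits:
    print(hit.query_id, classify(hit, AXC_1_CUTOFF))
```

Protein Variant models additionally screen for curated mutations, which this sketch leaves out.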

parwa28 commented 9 months ago

Can you please tell me how exactly I can use one of the detection models?

agmcarthur commented 5 months ago

@alimayy these detection models are used by CARD's Resistance Gene Identifier software: https://github.com/arpcard/rgi (contains full documentation). There is an online portal at https://card.mcmaster.ca/analyze/rgi. Also described in CARD's latest publication: https://pubmed.ncbi.nlm.nih.gov/36263822/
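For anyone looking for a starting point, the following is a rough sketch of assembling an `rgi main` call from Python. It assumes RGI is installed and that the CARD reference data has already been loaded as described in the documentation. The file names `contigs.fasta` and `rgi_out` and the helper `build_rgi_main_command` are hypothetical. The flags are the ones I understand `rgi main` to accept; please confirm them against `rgi main --help` for your installed version.

```python
import shlex


def build_rgi_main_command(input_fasta, output_prefix,
                           input_type="contig", include_loose=False):
    """Assemble an `rgi main` argument list (hypothetical helper, not part of RGI)."""
    if input_type not in ("contig", "protein"):
        raise ValueError("input_type must be 'contig' or 'protein'")
    cmd = [
        "rgi", "main",
        "--input_sequence", input_fasta,
        "--output_file", output_prefix,
        "--input_type", input_type,
        "--clean",  # remove temporary files after the run
    ]
    if include_loose:
        # Loose hits are not reported unless explicitly requested
        cmd.append("--include_loose")
    return cmd


# Placeholder file names; replace with your own paths
command = build_rgi_main_command("contigs.fasta", "rgi_out", include_loose=True)
print(shlex.join(command))

# To actually run it (requires RGI on the PATH):
# import subprocess
# subprocess.run(command, check=True)
```

Building the command as a list rather than a single string avoids shell-quoting problems when paths contain spaces.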