arpcard / rgi

Resistance Gene Identifier (RGI). Software to predict resistomes from protein or nucleotide data, including metagenomics data, based on homology and SNP models.

Can/should WildCARD be used for protein FASTA sequence searches? #233

Closed csmiller closed 11 months ago

csmiller commented 1 year ago

First, thanks for a great tool and a great resource.

We are annotating already inferred protein sequences from metagenome-assembled genomes. I notice that, at least in 2021 (issue #156), rgi main run in protein mode could only search the canonical CARD database, and not the Resistomes & Variants (WildCARD) data.

With the most recent versions of RGI and CARD/WildCARD, is it possible and/or advisable to also search protein sequences against the WildCARD database? I am wondering whether this would increase sensitivity, particularly in a metagenomic protein dataset, without increasing false positives too much.

If possible, what would the appropriate rgi load command look like before running rgi main -t protein ... ?

If it is not advisable or not possible, could you briefly explain why, if that's not too much trouble?

Thanks!

raphenya commented 11 months ago

@csmiller Currently, we only use WildCARD (the CARD variants) with the rgi bwt program. Yes, the variants do improve predictions of AMR genes in metagenomes; at least, that was our conclusion in this poster: https://github.com/arpcard/asm_microbe_2019_rgi/blob/master/ASM_MICROBE_2019_POSTER_A_R_RAPHENYA.pdf
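
For readers who want to try the read-based route mentioned above, here is a minimal sketch (not part of the original reply) that only assembles an `rgi bwt` command line and prints it, without running anything. The flag names (`--read_one`, `--read_two`, `--output_file`, `--include_wildcard`, `--local`) are given as recalled from the RGI documentation and should be checked against `rgi bwt --help` for your installed version. The file names and the helper `build_rgi_bwt_command` are hypothetical. It also assumes the CARD and WildCARD reference data have already been loaded as described in the RGI documentation.

```python
import shlex


def build_rgi_bwt_command(read_one, read_two, output_prefix,
                          include_wildcard=True, local=True):
    """Assemble an `rgi bwt` argument list; nothing is executed here.

    Flag names are assumptions based on the RGI documentation;
    confirm them with `rgi bwt --help` before use.
    """
    cmd = [
        "rgi", "bwt",
        "--read_one", read_one,          # forward reads (FASTQ)
        "--read_two", read_two,          # reverse reads (FASTQ)
        "--output_file", output_prefix,  # prefix for the result files
    ]
    if include_wildcard:
        # Ask rgi bwt to also align reads against the WildCARD variants.
        cmd.append("--include_wildcard")
    if local:
        # Use a reference database loaded into the working directory.
        cmd.append("--local")
    return cmd


# Hypothetical input and output names, for illustration only.
cmd = build_rgi_bwt_command("reads_R1.fastq.gz", "reads_R2.fastq.gz", "sample_bwt")
print(shlex.join(cmd))
```

The printed line can be pasted into a shell once the flags have been verified. Note that this path works on sequencing reads rather than predicted proteins, so it is an alternative to, not a variant of, `rgi main -t protein`.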