arpcard / rgi

Resistance Gene Identifier (RGI). Software to predict resistomes from protein or nucleotide data, including metagenomics data, based on homology and SNP models.

Customised Database for Specific Drugs #240

Closed pratibha-kadam closed 1 year ago

pratibha-kadam commented 1 year ago

Hello there, I have two questions:

a. I am working with wastewater sample data and want to detect ARGs in the metagenome samples. Which method should I use, RGI main or RGI bwt? If RGI bwt, do I also need to add the wildcard data?

b. Can I create a customised database for 5 specific drugs from the CARD database? If yes, could you please describe the steps for making a local database?

Thanks in advance :) Pratibha

pratibha-kadam commented 1 year ago

Hello, I know you guys are super busy. I have found the RGI tool extremely helpful, and it stood out for my research. As far as I can see, the CARD database covers all antibiotics except bedaquiline (BDQ).

I would like to add BDQ sequences to the database as well and create a local database.

Is this possible with RGI? If yes, please let me know.

raphenya commented 1 year ago

Hi @pratibha-kadam,

a. I am working with wastewater sample data and want to detect ARGs in the metagenome samples. Which method should I use, RGI main or RGI bwt? If RGI bwt, do I also need to add the wildcard data?

I used rgi bwt to predict AMR genes from metagenome samples. See #233 and https://github.com/arpcard/asm_microbe_2019_rgi/blob/master/ASM_MICROBE_2019_POSTER_A_R_RAPHENYA.pdf
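
As a rough illustration of the rgi bwt route for metagenomic reads, the Python sketch below only assembles the command line and prints it; it does not run RGI. The flag names follow the RGI documentation as best recalled and should be checked against `rgi bwt --help` for the installed version. The FASTQ file names and the output prefix are hypothetical. `--include_wildcard` is left optional because this thread does not settle whether the wildcard data are needed, and using it presumably requires the wildcard data to have been loaded beforehand.

```python
import shlex


def build_rgi_bwt_command(read_one, read_two, output_prefix,
                          aligner="kma", threads=4, include_wildcard=False):
    """Return the argument list for an `rgi bwt` run on paired-end reads.

    Flag names are assumptions based on the RGI documentation; verify them
    with `rgi bwt --help` before use.
    """
    cmd = [
        "rgi", "bwt",
        "--read_one", read_one,
        "--read_two", read_two,
        "--aligner", aligner,
        "--threads", str(threads),
        "--output_file", output_prefix,
    ]
    if include_wildcard:
        # Only meaningful if the wildcard data were loaded beforehand.
        cmd.append("--include_wildcard")
    return cmd


# Hypothetical file names, for illustration only.
cmd = build_rgi_bwt_command(
    "sample_R1.fastq.gz", "sample_R2.fastq.gz", "sample_rgi_bwt",
    include_wildcard=True,
)
print(shlex.join(cmd))
```

Returning a list rather than a single string means the result can be passed to `subprocess.run(cmd, check=True)` without shell quoting issues once RGI is installed and the reference data are loaded.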

b. Can I create a customised database for 5 specific drugs from the CARD database? If yes, could you please describe the steps for making a local database?

We strive to add all AMR genes (ARGs) and drugs reported in the literature. Please send us any papers that we might have missed. I will ask my colleagues to check on bedaquiline (BDQ). Cheers.
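
The thread does not describe steps for building a drug-specific local database. One possible workaround, offered here only as an assumption and not as the maintainers' recommendation, is to run RGI against the full CARD data and then keep only the rows for the drug classes of interest. The sketch below assumes a tab-delimited results table with a `Drug Class` column whose values are separated by semicolons; the column name should be checked against the actual output file, and the rows are invented toy data.

```python
import csv
import io


def filter_by_drug_class(tsv_text, wanted, column="Drug Class"):
    """Keep rows whose drug-class column lists any of the wanted classes.

    `column` is an assumed header name; adjust it to the real output file.
    """
    wanted_lower = {w.strip().lower() for w in wanted}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    kept = []
    for row in reader:
        # A row may list several classes separated by semicolons.
        classes = {c.strip().lower() for c in (row.get(column) or "").split(";")}
        if classes & wanted_lower:
            kept.append(row)
    return kept


# Toy table with invented rows, for illustration only.
toy = (
    "ARO Term\tDrug Class\n"
    "geneA\tmacrolide antibiotic; tetracycline antibiotic\n"
    "geneB\taminoglycoside antibiotic\n"
)
hits = filter_by_drug_class(toy, ["macrolide antibiotic"])
print([row["ARO Term"] for row in hits])
```

Filtering after the run keeps the reference data unmodified, so results stay comparable with other analyses made against the same CARD release.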

springwang24 commented 1 year ago

Hello @pratibha-kadam, thank you for your comment. We have curated the Rv0678 gene with mutations that confer resistance to BDQ, as well as the atpE gene, in which mutations prevent BDQ from binding to the ATP synthase c subunit. Both will be available in CARD after our next release, which we hope to have out around October. If you have found any more BDQ ARGs that we've missed, please send them our way and we would be happy to study and curate them if necessary!

Thanks!

raphenya commented 1 year ago

Thanks @springwang24

pratibha-kadam commented 1 year ago

Thank you so much @springwang24 @raphenya

It is very helpful for me.

For BDQ resistance, I came across some more genes associated with resistance:

pepQ, mmpS5, mmpL5, Rv1979c

Also, the mutations can be taken from the TB-Profiler database JSON file, which lists mutations cited from papers and the WHO catalogue.

Papers for your reference:
https://www.thelancet.com/journals/lanmic/article/PIIS2666-5247(23)00002-2/fulltext - Rv0678 and atpE
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8876534/ - for all genes
https://www.frontiersin.org/articles/10.3389/fcimb.2022.807095/full

pratibha-kadam commented 1 year ago

Hello There,

I have a question about the sequences in the database, for example the Rv0678 gene for BDQ resistance. When I aligned it against the reference genome, it matched 100% with no mutations. So I do not understand how you add the mutations to the reference sequences. Are they applied later, during the alignment process?

Also, when I submit the same sequence from the database to the online RGI main tool, it does not show a result for the gene. For example, when I submit NR_076151.1, a gene associated with azithromycin resistance, the results show zero matches found. Why is that?

Knowing this will help me understand the CARD and RGI data and use them better.

Please reply :)

Thanks

springwang24 commented 1 year ago

Hello Pratibha,

Thanks for your questions; I totally understand the confusion. Yes, all sequences in CARD are reference sequences with no mutations included. This is explained in the README for CARD (https://github.com/arpcard/FAQ#card-faqs):

"The CARD does not contain complete sequences of resistant mutants, due to the fact the individual mutations are often reported in the literature without the complete mutant gene sequence being deposited in GenBank. Instead, the CARD maintains a complete list of all resistance SNPs relative to a reference sequence, which may either be a reported mutant sequence or a wild-type sequence. As such, it is important that SNP mapping be included in analysis of any genes that require mutation to confer resistance. This step is included in the Resistance Gene Identifier but not naive BLAST analyses. Computational predicted sequence variants are available in the Resistomes, Variants, & Prevalence section."

In short, if your sequence matches the reference 100% in a naive BLAST, your gene is sensitive because it carries no mutations. If your sequence includes mutations, you will need to use RGI, which maps them against the curated resistance SNPs that a naive BLAST does not check.
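
To make the distinction concrete, here is a toy sketch of a single-site check against a curated SNP written in the usual wild-type residue, 1-based position, mutant residue form. It assumes the query is already aligned to the reference with no gaps, which is a simplification RGI itself does not rely on. The sequences and the `A5C` SNP are invented, and the function names are hypothetical rather than part of RGI.

```python
import re


def parse_snp(snp):
    """Split a SNP label such as 'A5C' into (wild_type, position, mutant)."""
    match = re.fullmatch(r"([A-Za-z])(\d+)([A-Za-z])", snp)
    if match is None:
        raise ValueError(f"unrecognised SNP label: {snp!r}")
    return match.group(1).upper(), int(match.group(2)), match.group(3).upper()


def classify_site(query, snp):
    """Report what the query carries at the SNP position (1-based).

    Assumes the query is aligned to the reference without gaps, so that
    position N in the query corresponds to position N in the reference.
    """
    wild_type, position, mutant = parse_snp(snp)
    if position > len(query):
        raise ValueError("SNP position lies beyond the end of the query")
    residue = query[position - 1].upper()
    if residue == mutant:
        return "mutant"
    if residue == wild_type:
        return "wild-type"
    return "other"


# Invented toy data: a 10-base reference and a hypothetical SNP at position 5.
reference = "ACGTACGTAC"
snp = "A5C"
query_identical = reference                       # 100% identical to the reference
query_mutated = reference[:4] + "C" + reference[5:]

print(classify_site(query_identical, snp))
print(classify_site(query_mutated, snp))
```

A query identical to the reference necessarily carries the wild-type residue at every curated position, which is why a 100% match is read as sensitive.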

After troubleshooting with RGI, we've determined that CARD does not currently have the information to support your NR_076151.1 query. The only M. tuberculosis 23S entry we have is "Mycobacterium tuberculosis 23S rRNA mutation conferring resistance to capreomycin". If you turn on loose hits in RGI, some other Mycobacterium results are available, but none is an M. tuberculosis 23S entry conferring resistance to azithromycin. Another close hit is "Mycobacterium intracellulare 23S rRNA with mutation conferring resistance to azithromycin". If you have a paper that supports M. tuberculosis 23S mutations conferring resistance to azithromycin, please feel free to send it here so we can take a look and potentially curate it.

Hope this helps!

raphenya commented 1 year ago

@pratibha-kadam @springwang24 The NR_076151.1 query is indeed sensitive: it doesn't contain the mutation curated in CARD (i.e. A2268C in 23S rRNA, which confers resistance to azithromycin in Mycobacterium intracellulare). See https://card.mcmaster.ca/ontology/41316