caozhichongchong / arg_ranker

MIT License

Enhancing Antimicrobial Resistance Prediction: Integrating Mobile Genetic Element Data with ARG Ranker #15

Closed mathavanpu closed 5 months ago

mathavanpu commented 1 year ago

Hi there,

My project aims to predict both Antibiotic Resistance Genes (ARGs) and Mobile Genetic Elements (MGEs). Currently, ARG Ranker uses the SARG database to rank ARGs. I would like to incorporate the Mobile Genetic Element database (available at https://mobileogdb.flsi.cloud.vt.edu/entries/database_download) into the existing SARG.db.fasta database. Is there a way to build a customized database for this purpose? I would greatly appreciate your insights on this matter.

Thanks, Mathavan M

caozhichongchong commented 1 year ago

Hi Mathavan M,

Good question!

Depending on your sequencing data, I suggest annotating MGEs and ARGs independently in assembled contigs, and then searching for MGEs adjacent to ARGs.
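As a rough illustration of the "search for MGEs adjacent to ARGs" step, here is a minimal Python sketch. It assumes the ARG and MGE annotations have already been reduced to (contig, start, end, name) records. The `Hit` type, the function names, and the 5000 bp default distance are all hypothetical choices for the example, not part of arg_ranker.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    """One annotated gene on an assembled contig (hypothetical record layout)."""
    contig: str
    start: int
    end: int
    name: str


def gap(a: Hit, b: Hit) -> int:
    """Distance in bp between two hits on the same contig (0 if they overlap)."""
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def mges_near_args(arg_hits, mge_hits, max_distance=5000):
    """Return (arg_name, mge_name, distance) for each MGE within
    max_distance bp of an ARG on the same contig.

    max_distance is an assumed threshold; choose one that suits your data.
    """
    # Group MGE hits by contig so each ARG is only compared with
    # MGEs found on the same contig.
    mges_by_contig = defaultdict(list)
    for mge in mge_hits:
        mges_by_contig[mge.contig].append(mge)

    pairs = []
    for arg in arg_hits:
        for mge in mges_by_contig.get(arg.contig, []):
            distance = gap(arg, mge)
            if distance <= max_distance:
                pairs.append((arg.name, mge.name, distance))
    return pairs
```

The inputs would come from whatever annotation tools you run on the contigs; only the coordinate comparison is shown here.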

I'm available for further discussion.

Best regards, Anni

mathavanpu commented 1 year ago

Dear Anni, Thank you for your response. I'm looking for further clarification on the next steps. The MobileGeneticElementDatabase (MGE) provides MGE data in protein sequence format, which can be found at https://github.com/KatariinaParnanen/MobileGeneticElementDatabase/blob/master/MGEs_FINAL_99perc_trim.fasta.tar.gz. My goal is to merge this data with the SARG database to obtain abundance information for both Mobile Genetic Elements (MGEs) and Antimicrobial Resistance Genes (ARGs).

I would greatly appreciate it if you could provide detailed guidance on how to proceed with these next steps.

caozhichongchong commented 1 year ago

I see!

To utilize arg_ranker with another database, it might work if you:

1) Merge the MGE data (protein sequence) with SARG.db.fasta.

2) Supplement the necessary details for each sequence in the files found in the data folder, including SARG.db.fasta.length, SARG.structure.txt, and ARG_rank.txt, while maintaining the current format.

For instance, in SARG.structure.txt use "MGE_protein_1 MGE MGE"; in SARG.db.fasta.length use "MGE_protein_1 391" (number of amino acids in MGE_protein_1); and in ARG_rank.txt use "MGE_protein_1 MGE MGE MGE MGE MGE MGE notassessed".
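The per-sequence rows described above could be generated with a small script rather than typed by hand. The following is a minimal Python sketch that only builds the row strings shown in the examples. The function names are hypothetical, and the tab separator is an assumption: check the existing files in the data folder and pass the separator they actually use.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (sequence_id, sequence) pairs.

    The sequence ID is the first whitespace-delimited token of the header.
    """
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def mge_metadata_rows(records, sep="\t"):
    """Build the rows to append to each data file, following the
    example rows given in this thread.

    sep is an assumption; match the delimiter of the existing files.
    """
    structure, lengths, ranks = [], [], []
    for seq_id, seq in records:
        # SARG.structure.txt: "<id> MGE MGE"
        structure.append(sep.join([seq_id, "MGE", "MGE"]))
        # SARG.db.fasta.length: "<id> <number of amino acids>"
        lengths.append(sep.join([seq_id, str(len(seq))]))
        # ARG_rank.txt: "<id>", six "MGE" fields, then "notassessed"
        ranks.append(sep.join([seq_id] + ["MGE"] * 6 + ["notassessed"]))
    return {
        "SARG.structure.txt": structure,
        "SARG.db.fasta.length": lengths,
        "ARG_rank.txt": ranks,
    }
```

With the rows in hand, you would append them to the corresponding files and append the MGE protein sequences to SARG.db.fasta, keeping the sequence IDs identical across all four files.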