Cateline closed this 1 month ago
Assigned this to @AbhirupaGhosh (primary) and @epbrenner @the-mayer (secondary).
Abhirupa/Evan/David, along with writing the script, could you also check whether this is the file format we want to use? Thanks!
Refer to this comment for guidance.
Title: Process CARD Data, Map Short Names, and Run MolEvolvR
- Download CARD Data: Retrieve the latest CARD dataset. (DOWNLOAD)
- Open ARO_index.tsv: Parse the file (in R).
- Map CARD Short Name: Map the CARD Short Name column to shortname_antibiotics.tsv and shortname_pathogens.tsv. The CARD Short Name values follow the format pathogen_gene or pathogen_gene_drug.
- Sort and Group: Sort and group the data by pathogen and antibiotic.
- Filter: Keep the bug-drug combinations (or individual bugs) of interest for further analysis.
- Download FASTA Sequences: Fetch the FASTA sequences for the filtered list of protein accessions (use Entrez).
- Run MolEvolvR: Run the protein sequences through the MolEvolvR tool for evolutionary analysis.
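
The parse, map, group, and filter steps above could be sketched in base R roughly as follows. This is only a sketch under stated assumptions: the three data frames are small in-memory stand-ins for ARO_index.tsv, shortname_pathogens.tsv, and shortname_antibiotics.tsv, and every column name and row value in them is illustrative rather than taken from a CARD release. It also assumes that short names which do not split into two or three underscore-separated pieces should be left unmapped.

```r
# Toy stand-ins for the three CARD tables. Column names and values are
# illustrative assumptions, not copied from a CARD release.
aro <- data.frame(
  short_name        = c("Mtub_katG_INH", "Ecol_acrA", "Mtub_rpoB_RIF", "CTX-M-15"),
  protein_accession = c("ACC_0001", "ACC_0002", "ACC_0003", "ACC_0004"),
  stringsAsFactors  = FALSE
)
pathogens <- data.frame(
  abbreviation     = c("Mtub", "Ecol"),
  pathogen         = c("Mycobacterium tuberculosis", "Escherichia coli"),
  stringsAsFactors = FALSE
)
antibiotics <- data.frame(
  abbreviation     = c("INH", "RIF"),
  molecule         = c("isoniazid", "rifampicin"),
  stringsAsFactors = FALSE
)

# Split each short name on underscores: pathogen_gene or pathogen_gene_drug.
parts   <- strsplit(aro$short_name, "_", fixed = TRUE)
n_parts <- lengths(parts)

# Return the i-th piece of every short name, or NA when it is missing.
piece <- function(i) {
  vapply(parts, function(p) if (length(p) >= i) p[i] else NA_character_,
         character(1))
}

# Only names with two or three pieces are treated as structured.
structured        <- n_parts %in% c(2L, 3L)
aro$pathogen_abbr <- ifelse(structured, piece(1), NA_character_)
aro$gene          <- ifelse(structured, piece(2), NA_character_)
aro$drug_abbr     <- ifelse(n_parts == 3L, piece(3), NA_character_)

# Map abbreviations to full names; match() yields NA for unknown abbreviations.
aro$pathogen   <- pathogens$pathogen[match(aro$pathogen_abbr, pathogens$abbreviation)]
aro$antibiotic <- antibiotics$molecule[match(aro$drug_abbr, antibiotics$abbreviation)]

# Sort by pathogen, then antibiotic, and count entries per bug-drug pair.
aro    <- aro[order(aro$pathogen, aro$antibiotic, na.last = TRUE), ]
counts <- table(aro$pathogen, aro$antibiotic, useNA = "ifany")

# Keep one bug-drug pair of interest (a hypothetical choice) and collect
# its protein accessions for the download step.
favourite  <- subset(aro, pathogen == "Mycobacterium tuberculosis" &
                          antibiotic == "isoniazid")
accessions <- favourite$protein_accession
```

With the real files, the toy data frames would presumably be replaced by `read.delim()` calls (with `check.names = FALSE` to keep column names that contain spaces), and the column names adjusted to match the downloaded release. The resulting `accessions` vector could then be passed to something like `rentrez::entrez_fetch(db = "protein", id = accessions, rettype = "fasta")` for the Entrez step. The MolEvolvR call is not sketched here because its interface is not described in this issue.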
Description
What kind of change(s) are included?
Checklist
Please ensure that all boxes are checked before indicating that this pull request is ready for review.