Open erinyoung opened 1 year ago
Hi,
I have added a Python script that:
- Downloads and parses emm sequences from CDC's SFTP server.
- Generates a multi-FASTA file containing all emm sequences.
- Optionally creates a BLAST database from the multi-FASTA file, which can be used as input for emmtyper.
It can be accessed here: https://github.com/Daniel-VM/cdc-utilities
@Daniel-VM , thank you for your script! Forgive me for taking so long to try it out.
Hi @erinyoung,
I recently discovered that the CDC has uploaded a multifasta file containing all emm sequences, which simplifies things considerably. Now, we just need to periodically download the CDC multifasta and build the BLAST database. I recommend using their blastdb version included in the Singularity image available here: emmtyper:0.2.0--py_0 in the Galaxy repository.
I hope this helps!
Hi @Daniel-VM. Could I just confirm that the right multifasta to use is alltrimmed.tfa from https://ftp.cdc.gov/pub/infectious_diseases/biotech/tsemm/ rather than the untrimmed version that the CDC also offers?
Thanks in advance!
Hi @JamesZlosnik,
I recently noticed the file you mentioned. In my opinion, alltrimmed.tfa is the file we should use to build the BLAST database for emmtyper.
I'm planning to update the script I mentioned above with alltrimmed.tfa and run a few tests.
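For anyone following along, here is a minimal sketch of the download-and-rebuild step discussed in this thread. It is not the script linked above, and it has not been validated against emmtyper. Several things are assumptions: the full file URL (the directory quoted above with alltrimmed.tfa appended), the helper names `count_fasta_records`, `makeblastdb_command` and `update_database`, the output prefix `emm_db`, and that `makeblastdb` from BLAST+ is available on your PATH.

```python
import shutil
import subprocess
import urllib.request
from pathlib import Path

# Assumed location: the directory URL quoted in this thread plus the file name.
CDC_URL = "https://ftp.cdc.gov/pub/infectious_diseases/biotech/tsemm/alltrimmed.tfa"


def count_fasta_records(fasta_path):
    """Count header lines (those starting with '>') as a basic sanity check."""
    with open(fasta_path, "r", encoding="utf-8", errors="replace") as handle:
        return sum(1 for line in handle if line.startswith(">"))


def makeblastdb_command(fasta_path, db_prefix):
    """Build the makeblastdb call for a nucleotide database."""
    return [
        "makeblastdb",
        "-in", str(fasta_path),
        "-dbtype", "nucl",
        "-out", str(db_prefix),
    ]


def update_database(workdir, url=CDC_URL):
    """Download the multi-FASTA and rebuild the BLAST database in workdir."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    fasta_path = workdir / "alltrimmed.tfa"

    # Stream the download to disk instead of holding it in memory.
    with urllib.request.urlopen(url) as response, open(fasta_path, "wb") as out:
        shutil.copyfileobj(response, out)

    # Refuse to build a database from an empty or non-FASTA download.
    if count_fasta_records(fasta_path) == 0:
        raise ValueError(f"No FASTA records found in {fasta_path}")

    db_prefix = workdir / "emm_db"  # hypothetical database name
    subprocess.run(makeblastdb_command(fasta_path, db_prefix), check=True)
    return db_prefix


# Example (requires network access and makeblastdb):
# update_database("emm_blast_db")
```

Running `update_database` on a schedule (for example from cron) would cover the "periodically download" part. How to point emmtyper at the resulting database is left out here; check `emmtyper --help` for the relevant option.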
Hi! I'd like to use emmtyper on some group A strep, but I'm foggy as to how often the database is updated.
Is there a way to update it on my end?