cmmr / EsViritu

Read mapping pipeline for detection and measurement of virus pathogens from metagenomic or clinical data
MIT License

Add instructions for custom virus database generation #1

Open mtisza1 opened 1 year ago

mtisza1 commented 1 year ago

Instructions should describe the expected format of the .fasta files, .mmi files, and metadata files.

mherold1 commented 2 months ago

Hi, thanks for providing the software. I was wondering whether there are any updates on the instructions for custom database generation.

From looking at DB v2.0.2, I would simply try to modify the files accordingly (the files are described in the Zenodo archive: https://zenodo.org/records/7876309). One question: is the list of curated viruses in the database simply the list of viruses whose sequences are not dereplicated, either when the database was initially constructed or while running the pipeline?

In the output I got from running pipeline v0.2.3 with DB v2.0.2, reads appear to be assigned across many different Mamastrovirus and Rotavirus strains and segments. Prior dereplication might have given better results for my samples, so I would like to test a different database.
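To make "prior dereplication" concrete, here is a minimal Python sketch that collapses exact duplicate sequences in a FASTA file before it is used as a reference. It is only an illustration under stated assumptions: the function names `parse_fasta` and `dereplicate_exact` and the example records are made up, it removes only identical sequences (real dereplication of near-identical strains would need similarity-based clustering), and it is not how the EsViritu database is built.

```python
from typing import Iterable, Iterator, List, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def dereplicate_exact(records: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep the first record for each distinct sequence (case-insensitive).

    Hypothetical helper: exact-match only, no similarity clustering.
    """
    seen = set()
    kept = []
    for header, seq in records:
        key = seq.upper()
        if key in seen:
            continue  # identical sequence already kept under another header
        seen.add(key)
        kept.append((header, seq))
    return kept


# Made-up example records, not taken from the EsViritu database.
example = """>strain_A segment 1
ACGTACGT
>strain_B segment 1
acgtacgt
>strain_C segment 1
ACGTTTTT
""".splitlines()

unique = dereplicate_exact(parse_fasta(example))
for header, seq in unique:
    print(f">{header}\n{seq}")
```

The metadata files would have to be filtered to the same set of retained records, and the format they need is exactly what this issue asks to have documented.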