broadinstitute / viral-ngs

Viral genomics analysis pipelines

Add convenience functions for creating kraken, diamond, bwa databases #519

Open tomkinsc opened 8 years ago

tomkinsc commented 8 years ago

We should have convenience functions for building kraken, diamond, and metagenomic bwa databases, accessible as CLI subcommands of metagenomics.py, similar to the taxon_filter.py commands lastal_build_db, blastn_build_db, and bmtagger_build_db.

The tools/kraken.py file already provides build() as a method of the Kraken class, but it is not exposed as a script command. The argument parser for the new command should specify the inputs needed for a custom DB and handle any pre-processing, such as creating the library/ and taxonomy/ directories.
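As a rough sketch of what such a subcommand could look like, the following uses plain argparse and only assembles the kraken-build invocations instead of running them. The subcommand name kraken_build_db, the argument names, and the helper functions are hypothetical and are not taken from the viral-ngs code base; the sketch also assumes the taxonomy files have already been placed under taxonomy/. A real implementation would presumably call the existing Kraken.build() wrapper rather than building the command lines itself.

```python
import argparse
import os


def prepare_db_dir(db_dir):
    """Create the library/ and taxonomy/ subdirectories if they are missing."""
    for sub in ('library', 'taxonomy'):
        os.makedirs(os.path.join(db_dir, sub), exist_ok=True)


def kraken_build_commands(db_dir, fasta_files, threads=1):
    """Return the kraken-build invocations (as argv lists) for a custom DB.

    Assumption: taxonomy files are already present in db_dir/taxonomy.
    """
    cmds = []
    # Add each reference FASTA to the database library.
    for fasta in fasta_files:
        cmds.append(['kraken-build', '--add-to-library', fasta, '--db', db_dir])
    # Build the database from the library and taxonomy.
    cmds.append(['kraken-build', '--build', '--threads', str(threads), '--db', db_dir])
    return cmds


def build_parser():
    """Hypothetical parser with a kraken_build_db subcommand."""
    parser = argparse.ArgumentParser(prog='metagenomics.py')
    subparsers = parser.add_subparsers(dest='command')
    p = subparsers.add_parser('kraken_build_db', help='Build a custom kraken database.')
    p.add_argument('db_dir', help='Output directory for the kraken database.')
    p.add_argument('fasta_files', nargs='+', help='Reference FASTA files to add to library/.')
    p.add_argument('--threads', type=int, default=1, help='Threads passed to kraken-build.')
    return parser


def main(argv=None):
    """Dry run: prepare the directory layout and print the commands."""
    args = build_parser().parse_args(argv)
    if args.command == 'kraken_build_db':
        prepare_db_dir(args.db_dir)
        for cmd in kraken_build_commands(args.db_dir, args.fasta_files, args.threads):
            print(' '.join(cmd))
```

Calling main(['kraken_build_db', 'my_db', 'viruses.fasta', '--threads', '4']) would create my_db/library and my_db/taxonomy and print the commands without executing them. Analogous subcommands for diamond and bwa could follow the same pattern.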

tomkinsc commented 7 years ago

@yesimon: Would you have time to add these functions? They would support later additions that automate DB rebuilds when new accessions become available. I like how easy VirMet makes it to rebuild their metagenomics DB with the latest full set of viral genomes: https://virmet.readthedocs.io/en/latest/updating/

yesimon commented 7 years ago

Yup, these are already partly there for the integration tests. Just need to add the CLI.