BakRep is a searchable large-scale web repository for bacterial genomes enriched with standardized genome characterizations and related metadata. Researchers can query this repository via a flexible search engine thus combining genomic information and metadata. We gratefully acknowledge the initial assembly of all genomes, which was conducted by Blackwell et al. in 2021.
Public databases are brimming with bacterial WGS data posing a genetic treasure for many different applications. However, most analyses focus on particular tiny subgroups that are hard to establish from raw sequencing data or metadata, only. To this end, BakRep (Denglish blend of Bakterien & Repository) provides access to both comprehensive standardized genome characterizations and metadata via a flexible search engine. BakRep comprises assembly QC metrics, robust taxonomic classifications, MLST typing, genome annotations and original metadata for a precious collection of bacterial genomes assembled by Blackwell et al..
The workflow is written in Nextflow and all tools are installed via Conda. Python3 is also required to create the result json files.
Clone the repository:
git clone https://github.com/ag-computational-bio/bakrep
Run the nextflow-script:
nextflow run /nextflow/661k.nf
--samples <metadata>
--setupdir <setupdir>
-with-conda
All optional parameters:
--samples Metadata file which contains sample-id, genus, species and strain.
--setupdir Path to folder which contains all data.
--db Path to database folder which contains GTDBtk, CheckM2 and Bakta database (default: $setupdir/database)
--gtdb Path to GTDBtk database (no need, when set --db)
--checkm2db Path to CheckM2 database (no need, when set --db)
--baktadb Path to Bakta database (no need, when set --db)
--data Path to input assembly files (default: $setuptdir/assemblies)
--results Path to output folder (default: $setupdir/results)
Fenske L, Jelonek L, Goesmann A, Schwengers O. BakRep - a searchable large-scale web repository for bacterial genomes, characterizations and metadata. Microb Genom. 2024 Oct;10(10):001305. doi: 10.1099/mgen.0.001305. PMID: 39475723; PMCID: PMC11524574.
Blackwell GA, Hunt M, Malone KM, Lima L, Horesh G, Alako BTF, Thomson NR, Iqbal Z. Exploring bacterial diversity via a curated and searchable snapshot of >archived DNA sequences. PLoS Biol. 2021 Nov 9;19(11):e3001421. doi: 10.1371/journal.pbio.3001421. PMID: 34752446; PMCID: PMC8577725.