contigtax is a tool that assigns taxonomy to metagenomic contigs by querying the contig nucleotide sequences against a protein database with diamond blastx and then parsing the hits using rank-specific thresholds. Rank-specific thresholds were first introduced by Luo et al. 2014 and are used here with some modifications, as explained in Alneberg et al. 2018.
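The basic idea behind rank-specific thresholds is that a hit's percent identity limits how deep in the taxonomy an assignment may go. A weak hit may only support a phylum-level call, while a near-identical hit can support a species-level call. The shell function below is a minimal sketch of that idea only, not contigtax's implementation. The function name rank_for_identity is hypothetical, and the threshold values are illustrative assumptions rather than contigtax's defaults (check the help output of contigtax assign for the actual settings).

```shell
# Sketch of rank-specific thresholds: print the deepest rank whose minimum
# percent identity is met by a hit. The function name and the threshold
# values are hypothetical and for illustration only.
rank_for_identity() {
  awk -v id="$1" 'BEGIN {
    # Ranks from shallow to deep, paired with illustrative minimum identities (%).
    n = split("phylum class order family genus species", ranks, " ")
    split("45 50 55 60 85 90", min_id, " ")
    deepest = "unassigned"
    for (i = 1; i <= n; i++) {
      if (id + 0 >= min_id[i] + 0) deepest = ranks[i]
    }
    print deepest
  }'
}
```

With these illustrative thresholds, rank_for_identity 72.5 prints family, because 72.5% clears the family threshold but not the genus one.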
The simplest way to install contigtax is via conda:
conda install -c bioconda contigtax
Alternatively, pull the docker image:
docker pull nbisweden/contigtax
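If a script has to work with either installation method, it can first check which one is available. The helper below is a convenience sketch, not part of contigtax. Its name, contigtax_runner, is hypothetical, and it only checks whether a contigtax or docker command can be found in the current shell.

```shell
# Hypothetical helper: report which way of running contigtax is available.
# Prints "native" if a contigtax command is found, otherwise "docker" if the
# docker CLI is found, otherwise "none" (and returns a non-zero status).
contigtax_runner() {
  if command -v contigtax >/dev/null 2>&1; then
    echo native
  elif command -v docker >/dev/null 2>&1; then
    echo docker
  else
    echo none
    return 1
  fi
}
```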
Download fasta file
contigtax download uniref100
Download NCBI taxonomy
contigtax download taxonomy
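The taxonomy download provides NCBI's nodes.dmp, which is later passed to the build step. Each line of nodes.dmp links a taxid to its parent taxid and its rank, with fields separated by a tab, a pipe and a tab. This makes it possible to walk the lineage of any taxid up to the root, which is the lineage that a rank-specific assignment is cut at. The sketch below prints that path. The function name lineage_ranks is hypothetical, and loading the whole file into awk arrays is a simple but memory-hungry choice.

```shell
# Hypothetical helper: print "rank<TAB>taxid" for each node on the path from a
# taxid up to the root, reading an NCBI nodes.dmp (fields separated by "\t|\t").
# Usage: lineage_ranks taxonomy/nodes.dmp <taxid>
lineage_ranks() {
  awk -F '\t[|]\t' -v start="$2" '
    { parent[$1] = $2; rank[$1] = $3 }   # field 1 taxid, 2 parent, 3 rank
    END {
      id = start
      while (id in parent) {
        print rank[id] "\t" id
        if (parent[id] == id) break      # the root (taxid 1) is its own parent
        id = parent[id]
      }
    }' "$1"
}
```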
Reformat fasta file and create taxonmap
contigtax format uniref100/uniref100.fasta.gz uniref100/uniref100.reformat.fasta.gz
Build diamond database
contigtax build uniref100/uniref100.reformat.fasta.gz uniref100/prot.accession2taxid.gz taxonomy/nodes.dmp
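The build step reads three files from earlier steps: the reformatted fasta file, uniref100/prot.accession2taxid.gz (which appears to be the taxonmap created by the format step), and taxonomy/nodes.dmp from the taxonomy download. A small pre-flight check can catch a missing input before a potentially long build starts. The function name check_build_inputs is hypothetical.

```shell
# Hypothetical pre-flight check: confirm that the inputs named in the build
# command exist and are non-empty before running "contigtax build".
check_build_inputs() {
  local missing=0 f
  for f in uniref100/uniref100.reformat.fasta.gz \
           uniref100/prot.accession2taxid.gz \
           taxonomy/nodes.dmp; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, check_build_inputs && contigtax build uniref100/uniref100.reformat.fasta.gz uniref100/prot.accession2taxid.gz taxonomy/nodes.dmp only starts the build when all three inputs are present.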
Search (here the assembled contigs are in the file assembly.fa)
contigtax search -p 4 assembly.fa uniref100/diamond.dmnd assembly.tsv.gz
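Before running the assign step, it can be useful to check how many contigs received at least one hit. The sketch below assumes that assembly.tsv.gz is gzip-compressed, tab-separated text with no header and with the contig (query) identifier in the first column, as in diamond's tabular output. This layout is an assumption, so compare it against your own file first (for example with gzip -dc assembly.tsv.gz | head). The function name is hypothetical.

```shell
# Hypothetical helper: count distinct query (contig) IDs in a gzipped,
# tab-separated hit table, assuming the query ID is in column 1.
count_contigs_with_hits() {
  gzip -dc "$1" | cut -f1 | sort -u | wc -l | tr -d ' '
}
```

For example, count_contigs_with_hits assembly.tsv.gz prints a single number that can be compared with the number of sequences in assembly.fa.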
Assign (here the output from the contigtax search step is in the file assembly.tsv.gz)
contigtax assign -p 4 assembly.tsv.gz assembly.taxonomy.tsv
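The six steps above can be chained into one function. This sketch uses only the commands shown above. The function name run_contigtax_pipeline, its arguments (assembly file and an optional thread count) and the rule of naming outputs after the assembly file are hypothetical choices for illustration. Each download is skipped when the file it provides to later commands already exists, and the function stops at the first failing step.

```shell
# Hypothetical wrapper chaining the contigtax steps shown above.
# Usage: run_contigtax_pipeline <assembly.fa> [threads]
run_contigtax_pipeline() {
  local assembly=$1 threads=${2:-4}
  [ -n "$assembly" ] || { echo "usage: run_contigtax_pipeline <assembly.fa> [threads]" >&2; return 2; }
  local base=${assembly%.*}   # e.g. assembly.fa -> assembly

  # Skip downloads whose outputs (as named in later commands) already exist.
  [ -e uniref100/uniref100.fasta.gz ] || contigtax download uniref100 || return 1
  [ -e taxonomy/nodes.dmp ] || contigtax download taxonomy || return 1

  contigtax format uniref100/uniref100.fasta.gz uniref100/uniref100.reformat.fasta.gz || return 1
  contigtax build uniref100/uniref100.reformat.fasta.gz uniref100/prot.accession2taxid.gz taxonomy/nodes.dmp || return 1
  contigtax search -p "$threads" "$assembly" uniref100/diamond.dmnd "$base.tsv.gz" || return 1
  contigtax assign -p "$threads" "$base.tsv.gz" "$base.taxonomy.tsv" || return 1
}
```

Calling run_contigtax_pipeline assembly.fa 4 issues the same commands as above. Every call re-runs format and build. If that is too slow, similar existence checks can be added for their outputs.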
To run contigtax with docker, substitute contigtax in the commands above with
docker run --rm -v $(pwd):/work nbisweden/contigtax
which mounts the current directory at /work inside the container. For example:
Download fasta file
docker run --rm -v $(pwd):/work nbisweden/contigtax download uniref100
Download NCBI taxonomy
docker run --rm -v $(pwd):/work nbisweden/contigtax download taxonomy
Reformat fasta file and create taxonmap
docker run --rm -v $(pwd):/work nbisweden/contigtax format uniref100/uniref100.fasta.gz uniref100/uniref100.reformat.fasta.gz
Build diamond database
docker run --rm -v $(pwd):/work nbisweden/contigtax build uniref100/uniref100.reformat.fasta.gz uniref100/prot.accession2taxid.gz taxonomy/nodes.dmp
Search (here the assembled contigs are in the file assembly.fa)
docker run --rm -v $(pwd):/work nbisweden/contigtax search -p 4 assembly.fa uniref100/diamond.dmnd assembly.tsv.gz
Assign (here the output from the contigtax search step is in the file assembly.tsv.gz)
docker run --rm -v $(pwd):/work nbisweden/contigtax assign -p 4 assembly.tsv.gz assembly.taxonomy.tsv
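Instead of editing every command by hand, a shell function can perform the substitution described above. This is a convenience sketch, not something shipped with contigtax. It defines a function named contigtax that forwards all of its arguments to the docker image, so the native commands can be used unchanged. It also quotes $(pwd) so that directories whose paths contain spaces are mounted correctly.

```shell
# Convenience sketch: make "contigtax ..." run inside the nbisweden/contigtax
# image, mounting the current directory at /work as in the commands above.
contigtax() {
  docker run --rm -v "$(pwd)":/work nbisweden/contigtax "$@"
}
```

After defining it, a command such as contigtax download taxonomy runs inside the container. Only the current directory is mounted, so input files must be located under it.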