NBISweden / contigtax

Taxonomic classification of metagenomic contigs
MIT License
metagenomics metatranscriptomics sequence-analysis taxonomy-assignment

contigtax

contigtax assigns taxonomy to metagenomic contigs by querying contig nucleotide sequences against a protein database with diamond blastx and parsing the hits using rank-specific thresholds. Rank-specific thresholds were first introduced by Luo et al. (2014) and are used here with some modifications, as explained in Alneberg et al. (2018).
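
To make the idea of a rank-specific threshold concrete, the sketch below maps the percent identity of a hit to the most specific rank that hit would be allowed to support. The function name lowest_rank and the cutoffs (45, 60 and 85 percent for phylum, genus and species) are assumptions chosen for illustration. They are not taken from this README, and the way contigtax itself parses hits may differ.

```shell
# Illustrative only: lowest_rank is a hypothetical helper, not part of contigtax.
# The cutoffs below are assumed example values, not confirmed contigtax defaults.
lowest_rank() {
    # $1 is the percent identity of a single hit (e.g. 72.4)
    awk -v pid="$1" 'BEGIN {
        if      (pid >= 85) print "species"       # close match: allow species-level use
        else if (pid >= 60) print "genus"         # moderate match: stop at genus
        else if (pid >= 45) print "phylum"        # distant match: stop at phylum
        else                print "unclassified"  # too distant to use at any rank
    }'
}
```

Under these assumed cutoffs, a hit at 70 percent identity could contribute to a genus-level or higher assignment but not to a species-level one.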

Install

The simplest way to install contigtax is via conda:

conda install -c bioconda contigtax

Alternatively, pull the Docker image:

docker pull nbisweden/contigtax
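
After installing, it can be handy to check which of the two routes is available on a given machine. The following is a minimal sketch; contigtax_runner is a hypothetical helper name, and the check only looks for the executables on PATH. It does not verify that the Docker image has been pulled.

```shell
# Hypothetical helper: report how contigtax could be run on this machine.
contigtax_runner() {
    if command -v contigtax >/dev/null 2>&1; then
        echo "native"    # contigtax executable found (e.g. from the conda install)
    elif command -v docker >/dev/null 2>&1; then
        echo "docker"    # docker client found; the image may still need pulling
    else
        echo "none"      # neither route is available
    fi
}
```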

Usage

  1. Download fasta file

    contigtax download uniref100
  2. Download NCBI taxonomy

    contigtax download taxonomy
  3. Reformat fasta file and create taxonmap

    contigtax format uniref100/uniref100.fasta.gz uniref100/uniref100.reformat.fasta.gz
  4. Build diamond database

    contigtax build uniref100/uniref100.reformat.fasta.gz uniref100/prot.accession2taxid.gz taxonomy/nodes.dmp
  5. Search (here assembled contigs are in file assembly.fa)

    contigtax search -p 4 assembly.fa uniref100/diamond.dmnd assembly.tsv.gz
  6. Assign taxonomy (here the output from the contigtax search step is in file assembly.tsv.gz)

    contigtax assign -p 4 assembly.tsv.gz assembly.taxonomy.tsv
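
The six steps above can be chained so that a failure in one step stops the rest. The sketch below is one way to do that; the function name run_contigtax_pipeline and the CONTIGTAX_CMD variable are hypothetical and not part of contigtax. The commands and file paths are the ones listed above, with the output names derived from the assembly file name.

```shell
# Hypothetical wrapper around the six steps; the function body runs in a
# subshell, so its variables do not leak into the calling shell.
run_contigtax_pipeline() (
    assembly="$1"                      # assembled contigs, e.g. assembly.fa
    threads="${2:-4}"                  # value passed to -p (defaults to 4)
    prefix="${assembly%.*}"            # assembly.fa -> assembly
    cmd="${CONTIGTAX_CMD:-contigtax}"  # how to invoke contigtax

    # $cmd is left unquoted on purpose so a multi-word command is split into
    # words; this breaks if the command contains paths with spaces.
    $cmd download uniref100 &&
    $cmd download taxonomy &&
    $cmd format uniref100/uniref100.fasta.gz uniref100/uniref100.reformat.fasta.gz &&
    $cmd build uniref100/uniref100.reformat.fasta.gz uniref100/prot.accession2taxid.gz taxonomy/nodes.dmp &&
    $cmd search -p "$threads" "$assembly" uniref100/diamond.dmnd "${prefix}.tsv.gz" &&
    $cmd assign -p "$threads" "${prefix}.tsv.gz" "${prefix}.taxonomy.tsv"
)
```

Calling run_contigtax_pipeline assembly.fa 4 would run the same commands as in the list. Setting CONTIGTAX_CMD to "echo contigtax" prints the commands instead of running them, which is a cheap way to check the file names before starting a long download.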

Running contigtax with Docker

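To run contigtax with Docker, replace contigtax in the commands above with docker run --rm -v $(pwd):/work nbisweden/contigtax. The -v $(pwd):/work option mounts the current directory into the container at /work, and --rm removes the container when the command finishes. For example: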

  1. Download fasta file

    docker run --rm -v $(pwd):/work nbisweden/contigtax download uniref100
  2. Download NCBI taxonomy

    docker run --rm -v $(pwd):/work nbisweden/contigtax download taxonomy
  3. Reformat fasta file and create taxonmap

    docker run --rm -v $(pwd):/work nbisweden/contigtax format uniref100/uniref100.fasta.gz uniref100/uniref100.reformat.fasta.gz
  4. Build diamond database

    docker run --rm -v $(pwd):/work nbisweden/contigtax build uniref100/uniref100.reformat.fasta.gz uniref100/prot.accession2taxid.gz taxonomy/nodes.dmp
  5. Search (here assembled contigs are in file assembly.fa)

    docker run --rm -v $(pwd):/work nbisweden/contigtax search -p 4 assembly.fa uniref100/diamond.dmnd assembly.tsv.gz
  6. Assign taxonomy (here the output from the contigtax search step is in file assembly.tsv.gz)

    docker run --rm -v $(pwd):/work nbisweden/contigtax assign -p 4 assembly.tsv.gz assembly.taxonomy.tsv
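
Typing the full docker run prefix for every step gets repetitive. One option, sketched below, is a small shell function that adds the prefix and forwards all remaining arguments. The names contigtax_docker and DOCKER_BIN are hypothetical; the docker run options are the ones used in the commands above.

```shell
# Hypothetical helper: run a contigtax subcommand inside the Docker image.
# DOCKER_BIN can be overridden (e.g. with podman or echo); it defaults to docker.
contigtax_docker() {
    # Mount the current directory at /work and pass every argument through.
    "${DOCKER_BIN:-docker}" run --rm -v "$(pwd)":/work nbisweden/contigtax "$@"
}
```

With this in place, contigtax_docker search -p 4 assembly.fa uniref100/diamond.dmnd assembly.tsv.gz is equivalent to step 5 above.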