azureycy / dbAPIS

dbAPIS is a database of anti-prokaryotic immune system proteins. This repository contains the code and scripts used to generate and maintain the database.

dbAPIS website: https://bcb.unl.edu/dbAPIS

Yan, Y., Zheng, J., Zhang, X., & Yin, Y. (2023). dbAPIS: a database of anti-prokaryotic immune system genes. Nucleic Acids Research. https://doi.org/10.1093/nar/gkad932

Tools and databases

Database content processing

Create APIS protein families and add newly curated proteins

Build APIS protein family HMMs

Searching homologous families using HHsearch

Protein function annotation

Protein structure prediction

Searching protein structure homologs using Foldseek

Genomic context visualization using JBrowse

Gene cluster comparison using clinker

Run APIS protein annotation with DIAMOND and HMMscan locally

Run HMMscan on your local server

Download the APIS protein family profile HMMs

wget https://bcb.unl.edu/dbAPIS/downloads/dbAPIS.hmm

Prepare a profile database by constructing binary compressed data files

hmmpress dbAPIS.hmm

Four files are created: dbAPIS.hmm.h3m, dbAPIS.hmm.h3i, dbAPIS.hmm.h3f, and dbAPIS.hmm.h3p.
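As a quick sanity check before running hmmscan, the sketch below confirms that all four index files listed above exist and are non-empty. The function name check_pressed_hmm is a hypothetical helper for illustration and is not part of HMMER or dbAPIS.

```shell
# Hypothetical helper (not part of HMMER or dbAPIS): verify that hmmpress
# produced all four binary index files next to the given HMM file.
check_pressed_hmm() {
    # $1: path to the HMM file that was passed to hmmpress
    for ext in h3m h3i h3f h3p; do
        if [ ! -s "$1.$ext" ]; then
            echo "missing or empty: $1.$ext" >&2
            return 1
        fi
    done
    echo "all hmmpress index files found for $1"
}
```

For example, `check_pressed_hmm dbAPIS.hmm` returns a non-zero status if any of the four files is absent, which makes it easy to stop a pipeline early.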

Run hmmscan for your amino acid sequences

hmmscan --domtblout hmmscan.out --noali dbAPIS.hmm your_sequence.faa

The --domtblout option writes a space-delimited table of per-domain hits, with one line per domain. The --noali option omits the alignment section, which reduces the output volume. For more information on hmmscan, see the HMMER user guide.
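To illustrate how the domain hits table can be inspected by hand, here is a minimal sketch that keeps only domains below an independent E-value cutoff and reports how much of the family HMM each hit covers. It relies on the standard domtblout column order (target name in column 1, HMM length in column 3, query name in column 4, i-Evalue in column 13, domain score in column 14, HMM start and end in columns 16 and 17). The function name filter_domtblout and the default cutoff of 1e-5 are assumptions chosen for illustration; this is not the logic of the dbAPIS parser script.

```shell
# Hypothetical helper: filter an hmmscan --domtblout table by i-Evalue.
# The 1e-5 default is an illustrative assumption, not a dbAPIS recommendation.
filter_domtblout() {
    # $1: domtblout file; $2: maximum i-Evalue to keep (optional)
    awk -v max_ie="${2:-1e-5}" '
        /^#/ { next }                        # skip header and footer comment lines
        ($13 + 0) <= (max_ie + 0) {
            hmm_cov = ($17 - $16 + 1) / $3   # fraction of the family HMM that is aligned
            # output: query, family, i-Evalue, domain score, HMM coverage
            printf "%s\t%s\t%s\t%s\t%.2f\n", $4, $1, $13, $14, hmm_cov
        }
    ' "$1"
}
```

For example, `filter_domtblout hmmscan.out 1e-10 > hmmscan.filtered.tsv` writes a five-column tab-separated table of the retained domain hits.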

Run DIAMOND on your local server

Download the APIS protein sequences

wget https://bcb.unl.edu/dbAPIS/downloads/anti_defense.pep

Build a DIAMOND database from the APIS protein sequences

diamond makedb --in anti_defense.pep -d APIS_db

Run diamond for your amino acid sequences

diamond blastp --db APIS_db -q your_sequence.faa -f 6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore qlen slen -o diamond.out --max-target-seqs 10000

The -f 6 option produces tab-separated output (the equivalent of BLAST's -outfmt 6), here made up of the custom fields listed after it. --max-target-seqs sets the maximum number of target sequences for which alignments are reported. For more details on DIAMOND, see the DIAMOND tutorial.
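Because --max-target-seqs 10000 can report many hits per query, it is often handy to reduce the table to one line per query. The sketch below keeps the hit with the highest bitscore for each query, assuming the 14-column field order used in the command above (qseqid in column 1, bitscore in column 12). The function name best_diamond_hits is hypothetical, and this is not the logic of the dbAPIS parser script.

```shell
# Hypothetical helper: keep the highest-bitscore hit for each query from
# DIAMOND tabular output with the 14 custom fields shown above.
best_diamond_hits() {
    # $1: DIAMOND output file (tab-separated, no header line)
    awk -F '\t' '
        # remember a line if its query is new or its bitscore beats the stored one
        NF >= 14 && (!($1 in best) || ($12 + 0) > (best[$1] + 0)) {
            best[$1] = $12
            line[$1] = $0
        }
        END { for (q in line) print line[q] }
    ' "$1" | sort    # awk array order is unspecified, so sort for stable output
}
```

For example, `best_diamond_hits diamond.out > diamond.best.tsv` keeps all 14 columns, so the result can be filtered further by identity or E-value.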

Parse annotation output

Download the family member mapping table and parser script

wget https://bcb.unl.edu/dbAPIS/downloads/seed_family_mapping.tsv
wget https://bcb.unl.edu/dbAPIS/downloads/parse_annotation_result.sh

Run script to parse annotation output files

bash parse_annotation_result.sh hmmscan.out diamond.out

This generates separate parsed output files for the hmmscan and DIAMOND results.