Yan, Y., Zheng, J., Zhang, X., & Yin, Y. (2023). dbAPIS: a database of anti-prokaryotic immune system genes. Nucleic Acids Research. https://doi.org/10.1093/nar/gkad932
Download the APIS protein family profile HMMs
wget https://bcb.unl.edu/dbAPIS/downloads/dbAPIS.hmm
Prepare the profile database by building binary compressed data files
hmmpress dbAPIS.hmm
Four files are created: dbAPIS.hmm.h3m, dbAPIS.hmm.h3i, dbAPIS.hmm.h3f, and dbAPIS.hmm.h3p.
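Before running hmmscan, it can help to confirm that all four index files are actually present. The following is a minimal sketch; the function name check_pressed is hypothetical and is not part of HMMER or dbAPIS.

```shell
# Hypothetical helper: succeed only if every hmmpress index file exists
# (and is non-empty) next to the given HMM file.
check_pressed() {
  hmm=$1
  for ext in h3m h3i h3f h3p; do
    if [ ! -s "$hmm.$ext" ]; then
      echo "missing or empty: $hmm.$ext" >&2
      return 1
    fi
  done
  echo "all index files present for $hmm"
}

# Usage: check_pressed dbAPIS.hmm
```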
Run hmmscan on your amino acid sequences
hmmscan --domtblout hmmscan.out --noali dbAPIS.hmm your_sequence.faa
The --domtblout option writes a space-delimited table of domain hits, with one line per domain. The --noali option omits the alignment section to reduce the output size. For more information on hmmscan, see the HMMER user guide.
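As an illustration of how the domain table can be read, the sketch below keeps only domain hits at or below an E-value cutoff. It assumes the standard 23-column --domtblout layout, in which column 13 holds the independent E-value (i-Evalue). The function name filter_domtbl and the default cutoff of 1e-5 are hypothetical choices for this example, not thresholds recommended by dbAPIS.

```shell
# Hypothetical helper: print query, profile, i-Evalue, and alignment
# coordinates for domain hits at or below an i-Evalue cutoff.
# Assumes the standard HMMER --domtblout column layout; the default
# cutoff of 1e-5 is illustrative only.
filter_domtbl() {
  infile=$1
  max_evalue=${2:-1e-5}
  LC_ALL=C awk -v max="$max_evalue" '
    /^#/ { next }                      # skip header and footer comments
    ($13 + 0) <= (max + 0) {           # column 13 is the i-Evalue
      # $4 = query sequence, $1 = profile, $18 and $19 = alignment
      # start and end on the query sequence
      print $4 "\t" $1 "\t" $13 "\t" $18 "\t" $19
    }
  ' "$infile"
}

# Usage: filter_domtbl hmmscan.out 1e-10 > hmmscan.filtered.tsv
```

The parser script provided by dbAPIS applies its own processing; this sketch is only meant for a quick manual look at the raw table.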
Download the APIS protein sequences
wget https://bcb.unl.edu/dbAPIS/downloads/anti_defense.pep
Build diamond database with APIS protein sequences
diamond makedb --in anti_defense.pep -d APIS_db
Run diamond on your amino acid sequences
diamond blastp --db APIS_db -q your_sequence.faa -f 6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore qlen slen -o diamond.out --max-target-seqs 10000
The -f 6 option produces tab-separated output (equivalent to BLAST's -outfmt 6) containing the custom fields listed after it. The --max-target-seqs option sets the maximum number of target sequences to report alignments for. For more details on diamond, see the diamond tutorial.
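Because --max-target-seqs 10000 can report many hits per query, a quick summary is often useful. The sketch below keeps the highest-bitscore hit for each query and appends the query coverage as an extra column. It assumes the 14 tab-separated fields requested in the command above (qstart and qend in columns 7 and 8, bitscore in column 12, qlen in column 13). The function name best_hits is hypothetical.

```shell
# Hypothetical helper: keep the highest-bitscore hit per query and append
# query coverage as a 15th column. Assumes the 14 tab-separated fields
# requested in the diamond blastp command above.
best_hits() {
  LC_ALL=C awk -F'\t' '
    !($1 in score) || ($12 + 0) > score[$1] {
      score[$1] = $12 + 0
      # coverage = aligned query span / query length
      cov = ($8 - $7 + 1) / $13
      row[$1] = sprintf("%s\t%.3f", $0, cov)
    }
    END { for (q in row) print row[q] }
  ' "$1" | LC_ALL=C sort
}

# Usage: best_hits diamond.out > diamond.best.tsv
```

Ties on bitscore keep the hit that appears first in the file, which is a simplification chosen for this example.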
Download the family member mapping table and parser script
wget https://bcb.unl.edu/dbAPIS/downloads/seed_family_mapping.tsv
wget https://bcb.unl.edu/dbAPIS/downloads/parse_annotation_result.sh
Run the script to parse the annotation output files
bash parse_annotation_result.sh hmmscan.out diamond.out
This generates one parsed output file for hmmscan and one for diamond.
hmmscan.out.parsed.tsv contains 13 columns:
diamond.out.parsed.tsv contains 12 columns: