HIT-ImmunologyLab / DBSCAN-SWA


DBSCAN-SWA: an integrated tool for rapid prophage detection and annotation

First, please install the following Python packages:

  1. numpy

  2. Biopython

  3. sklearn (distributed as the scikit-learn package)
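A quick way to confirm that the three packages are importable before running the tool is sketched below. The helper name `missing_modules` is hypothetical and not part of DBSCAN-SWA; the import names used are `Bio` for Biopython and `sklearn` for scikit-learn, which differ from the names the packages are installed under.

```python
import importlib.util

# Import names: Biopython is imported as "Bio", scikit-learn as "sklearn".
REQUIRED_MODULES = ("numpy", "Bio", "sklearn")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing Python packages (import names):", ", ".join(missing))
    else:
        print("All required Python packages are importable.")
```

The check only looks the modules up and does not import them, so it stays fast even for large packages.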

Second, please install the following tools:

  1. Prokka (https://github.com/tseemann/prokka)
    git clone https://github.com/tseemann/prokka.git
    # install the dependencies:
    sudo apt-get -y install bioperl libdatetime-perl libxml-simple-perl libdigest-md5-perl
    # install perl package
    sudo bash
    export PERL_MM_USE_DEFAULT=1
    export PERL_EXTUTILS_AUTOINSTALL="--defaultdeps"
    perl -MCPAN -e 'install "XML::Simple"'
    # install the prokka databases
    prokka --setupdb
    # test the installed prokka databases
    prokka --listdb

    Warning: Prokka needs blast+ 2.8 or higher, so a copy of blast+ is provided in the bin directory. You can either install the latest blast+ and add it to your environment, or use the blast+ provided by DBSCAN-SWA. Check which blast+ your environment uses with, e.g.:

    which makeblastdb
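The version requirement can also be checked programmatically. The sketch below is a minimal example, assuming that `makeblastdb -version` prints a line such as `makeblastdb: 2.9.0+`; the function names are hypothetical and not part of DBSCAN-SWA or Prokka.

```python
import re
import shutil
import subprocess


def parse_blast_version(text):
    """Extract (major, minor, patch) from the first x.y.z number in `text`."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no version number found in: %r" % text)
    return tuple(int(part) for part in match.groups())


def blast_is_recent_enough(text, minimum=(2, 8, 0)):
    """True if the version reported in `text` is at least `minimum`."""
    return parse_blast_version(text) >= minimum


def check_installed_blast(tool="makeblastdb"):
    """Return None if `tool` is not on PATH, else whether it is >= 2.8."""
    exe = shutil.which(tool)
    if exe is None:
        return None
    # Assumption: the tool accepts "-version" and prints the version to stdout.
    result = subprocess.run([exe, "-version"], capture_output=True, text=True)
    return blast_is_recent_enough(result.stdout)
```

Calling `check_installed_blast()` returns `None` when no `makeblastdb` is found, which is the case where the blast+ provided by DBSCAN-SWA should be added to the environment.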

Install

Linux

When DBSCAN-SWA is run for the first time, it downloads the required databases by default. Alternatively, you can download the databases manually by setting --download_db to manual. There are two ways to download the database manually: from the DBSCAN-SWA server or from Zenodo.

### Download the database from the DBSCAN-SWA server
wget -c -b http://www.microbiome-bigdata.com/PHISDetector/static/download/DBSCAN-SWA/db.tar.gz

### Or access the database on Zenodo
https://zenodo.org/records/10404224

### Unzip the database file
tar -zxvf path/to/db.tar.gz

### Put the unzipped database files in the DBSCAN-SWA directory
cp -r path/to/download/db path/to/DBSCAN-SWA

### Add the tools to the PATH
export PATH=$PATH:/path/to/DBSCAN-SWA/software/blast+/bin
export PATH=$PATH:/path/to/DBSCAN-SWA/bin
export PATH=$PATH:/path/to/DBSCAN-SWA/software/diamond
export PATH=$PATH:/path/to/prokka/bin
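
After the PATH has been extended, it can help to see which executable each tool name actually resolves to, since a system-wide blast+ may take precedence over the bundled one. The sketch below is illustrative only: the executable names in `TOOLS` are assumptions based on the directories added above, and `find_tools` is a hypothetical helper.

```python
import shutil

# Assumed executable names; adjust to the tools your installation provides.
TOOLS = ("makeblastdb", "blastn", "blastp", "diamond", "prokka")


def find_tools(names=TOOLS):
    """Map each tool name to the first matching path on PATH, or None."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, location in find_tools().items():
        print("%-12s %s" % (name, location or "NOT FOUND"))
```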

General:
--input <file name>        : Query bacterial genome file path: FASTA or GenBank file
--output <folder name>     : Output folder in which results will be stored
--prefix <prefix>          : Prefix of the output file names, default: bac

Phage Clustering:
--evalue <x>               : maximal E-value when searching for homologous viral proteins in the viral UniProt TrEMBL database, default: 1e-7
--min_protein_num <x>      : optional, the minimal number of proteins forming a phage cluster in DBSCAN, default: 6
--protein_number <x>       : optional, the number of proteins by which a region is extended when searching for prophage att sites, default: 10

Phage Annotation:
--add_annotation <options> : optional, default: PGPD
   1. PGPD: a phage genome and protein database
   2. phage_path: a specified phage genome in FASTA or GenBank format, to detect whether the phage infects the query bacterium
   3. none: no phage annotation
--per <x>                  : Minimal percentage of hit proteins in the hit prophage region (default: 30)
--idn <x>                  : Minimal % identity of the hit region on the hit prophage region by blastn (default: 70)
--cov <x>                  : Minimal % coverage of the hit region on the hit prophage region by blastn (default: 30)

Start DBSCAN-SWA

The python script is also provided for expert users.

  1. predict prophages of query bacterium with default parameters:
    python <path>/dbscan-swa.py --input <bac_path> --output <outdir> --prefix <prefix>
  2. predict prophages of query bacterium and no phage annotation:
    python <path>/dbscan-swa.py --input <bac_path> --output <outdir> --prefix <prefix> --add_annotation none
  3. predict prophages of query bacterium and detect the bacterium-phage interaction between the query bacterium and query phage:
    python <path>/dbscan-swa.py --input <bac_path> --output <outdir> --prefix <prefix> --add_annotation <phage_path>
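
When many genomes are processed, the three invocations above can be assembled from Python. The following is a minimal sketch that only uses the flags shown in the examples; `build_command` and `run_dbscan_swa` are hypothetical helpers, and the script path and file names are placeholders.

```python
import shlex
import subprocess
import sys


def build_command(script, bac_path, outdir, prefix, add_annotation=None):
    """Build the argument list for one dbscan-swa.py run.

    `add_annotation` may be None (tool default), "none", or a phage file path.
    """
    cmd = [sys.executable, script,
           "--input", bac_path,
           "--output", outdir,
           "--prefix", prefix]
    if add_annotation is not None:
        cmd += ["--add_annotation", add_annotation]
    return cmd


def run_dbscan_swa(*args, **kwargs):
    """Run the command and raise CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_command(*args, **kwargs), check=True)


if __name__ == "__main__":
    # Placeholder paths; this only prints the command, it does not run it.
    cmd = build_command("dbscan-swa.py", "bac.fna", "out", "bac",
                        add_annotation="none")
    print(shlex.join(cmd))
```

Passing the arguments as a list rather than a single string avoids shell quoting problems with paths that contain spaces.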

Outputs

| File Name | Description |
| --- | --- |
| \<prefix>_DBSCAN-SWA_prophage_summary.txt | tab-delimited table of predicted prophage information, including prophage location, specific phage-related key words, CDS number, infecting virus species by a majority vote, and att sites |
| \<prefix>_DBSCAN-SWA_prophage.txt | contains the information in \<prefix>_DBSCAN-SWA_prophage_summary.txt plus detailed information on the prophage proteins and the hit parameters between each prophage protein and its hit UniProt virus protein |
| \<prefix>_DBSCAN-SWA_prophage.fna | all predicted prophage nucleotide sequences in FASTA format |
| \<prefix>_DBSCAN-SWA_prophage.faa | all predicted prophage protein sequences in FASTA format |
| Phage Annotation | if add_annotation is not none, the following files are written to "prophage_annotation" |
| \<prefix>_prophage_annotate_phage.txt | tab-delimited table of prophage-phage pairs with prophage_homolog_percent, prophage_alignment_identity and prophage_alignment_coverage |
| \<prefix>_prophage_annotate_phage.txt | detailed information on bacterium-phage interactions, including blastp and blastn results |
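
The tab-delimited prophage-phage table can be post-processed with the standard library. The sketch below rests on several assumptions that are not confirmed by the description above: that the file has a header row, that the three score columns are named exactly `prophage_homolog_percent`, `prophage_alignment_identity` and `prophage_alignment_coverage`, and that their values are plain numbers in percent. The helper names are hypothetical.

```python
import csv


def read_table(path):
    """Read a tab-delimited file with a header row into a list of dicts."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def filter_pairs(rows, per=30.0, idn=70.0, cov=30.0):
    """Keep rows whose three scores all meet the given minimum percentages.

    The column names are assumed; the defaults mirror --per, --idn and --cov.
    """
    kept = []
    for row in rows:
        if (float(row["prophage_homolog_percent"]) >= per
                and float(row["prophage_alignment_identity"]) >= idn
                and float(row["prophage_alignment_coverage"]) >= cov):
            kept.append(row)
    return kept
```

For example, `filter_pairs(read_table(path), idn=90)` would keep only pairs with at least 90% alignment identity, which is stricter than the default threshold.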

Visualizations

You can find a directory named "test" in the DBSCAN-SWA package. The following visualizations come from the predicted results of Xylella fastidiosa Temecula1 (NC_004556):
(1) the genome viewer displaying all predicted prophages and att sites
(2) the detailed information of predicted prophages
(3) if add_annotation is set to PGPD or a phage file path, the detailed information of the bacterium-phage interaction

Contributors

This project exists thanks to all the people who contribute.

License

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.