BolognaBiocomp / deepsig

DeepSig - Predictor of signal peptides in proteins based on deep learning
https://deepsig.biocomp.unibo.it
GNU General Public License v3.0

Publication

Savojardo C., Martelli P.L., Fariselli P., Casadio R. DeepSig: deep learning improves signal peptide detection in proteins Bioinformatics (2017) 34(10): 1690-1696.

Installation using pip

First, install the deepsig-biocomp package using pip:

pip install deepsig-biocomp

Then clone the deepsig repository from GitHub and set the DEEPSIG_ROOT environment variable to the repository directory:

git clone git@github.com:BolognaBiocomp/deepsig.git
cd deepsig
export DEEPSIG_ROOT=$(pwd)

Installation using Bioconda

Install deepsig using conda:

conda install -c bioconda deepsig

Usage

$ deepsig -h

usage: deepsig.py [-h] -f FASTA -o OUTF -k {euk,gramp,gramn} [-a CPU]

DeepSig: Predictor of signal peptides in proteins

optional arguments:
  -h, --help            show this help message and exit
  -f FASTA, --fasta FASTA
                        The input multi-FASTA file name
  -o OUTF, --outf OUTF  The output tabular file
  -k {euk,gramp,gramn}, --organism {euk,gramp,gramn}
                        The organism the sequences belongs to

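The program accepts three mandatory arguments:

-f: the input multi-FASTA file containing the protein sequences to predict
-o: the output file where the predictions are written
-k: the organism the sequences belong to: euk (Eukaryotes), gramp (Gram-positive bacteria), or gramn (Gram-negative bacteria)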
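When DeepSig is called from a script, the three mandatory arguments can be assembled and checked before the program is launched. The following is a minimal Python sketch, assuming the deepsig executable is on the PATH; the helper names build_deepsig_command and run_deepsig are hypothetical and not part of DeepSig itself.

```python
import subprocess
from pathlib import Path

# Organism codes accepted by -k, taken from the help text above.
ORGANISMS = ("euk", "gramp", "gramn")


def build_deepsig_command(fasta, outf, organism):
    """Return the argument list for one DeepSig run (hypothetical helper)."""
    if organism not in ORGANISMS:
        raise ValueError(f"organism must be one of {ORGANISMS}, got {organism!r}")
    return ["deepsig", "-f", str(fasta), "-o", str(outf), "-k", organism]


def run_deepsig(fasta, outf, organism):
    """Run DeepSig; raises CalledProcessError on a non-zero exit status."""
    # Fail early with a clear error if the input file is missing.
    if not Path(fasta).is_file():
        raise FileNotFoundError(fasta)
    # Assumption: the `deepsig` executable is available on the PATH.
    subprocess.run(build_deepsig_command(fasta, outf, organism), check=True)
```

Calling run_deepsig("proteins.fasta", "proteins.out", "euk") would then be equivalent to typing the corresponding command in a shell; the file names here are placeholders.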

The DeepSig Docker image

The image is available on DockerHub: https://hub.docker.com/r/bolognabiocomp/deepsig

Usage of the image

The first step to run the DeepSig Docker container is to pull the container image. To do so, run:

$ docker pull bolognabiocomp/deepsig

The DeepSig Docker image is now installed in your local Docker environment and ready to be used. To show the DeepSig help page, run:

$ docker run bolognabiocomp/deepsig -h

Using TensorFlow backend.
usage: deepsig.py [-h] -f FASTA -o OUTF -k {euk,gramp,gramn} [-a CPU]

DeepSig: Predictor of signal peptides in proteins

optional arguments:
  -h, --help            show this help message and exit
  -f FASTA, --fasta FASTA
                        The input multi-FASTA file name
  -o OUTF, --outf OUTF  The output tabular file
  -k {euk,gramp,gramn}, --organism {euk,gramp,gramn}
                        The organism the sequences belongs to

As with the native command, the three mandatory arguments are -f (input multi-FASTA file), -o (output file), and -k (organism).

Let's now try a concrete example. First, download an example sequence from UniProtKB, e.g. the Transthyretin-like protein 52 from Caenorhabditis elegans with accession G5ED35:

$ wget https://www.uniprot.org/uniprot/G5ED35.fasta

Now, we are ready to predict the signal peptide of our input protein. Run:

$ docker run -v $(pwd):/data/ bolognabiocomp/deepsig -f G5ED35.fasta -o G5ED35.out -k euk

In the example above, the current working directory ($(pwd)) is mapped to the /data/ folder inside the container, which lets the container see the external FASTA file G5ED35.fasta. The file G5ED35.out now contains the DeepSig prediction in GFF3 format:
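The same volume mapping can be reproduced from a script. The sketch below builds the docker run argument list shown above, resolving the host directory to an absolute path as $(pwd) does in the shell. The helper names build_docker_command and run_in_docker are hypothetical; the image name and the /data/ mount point are taken from the commands above.

```python
import subprocess
from pathlib import Path

IMAGE = "bolognabiocomp/deepsig"  # image name from the pull command above


def build_docker_command(workdir, fasta_name, out_name, organism):
    """Return the `docker run` argument list (hypothetical helper).

    `workdir` is the host directory holding the FASTA file; it is mounted
    at /data/ in the container, mirroring `-v $(pwd):/data/`.
    """
    host_dir = Path(workdir).resolve()  # Docker needs an absolute host path
    return [
        "docker", "run",
        "-v", f"{host_dir}:/data/",
        IMAGE,
        "-f", fasta_name, "-o", out_name, "-k", organism,
    ]


def run_in_docker(workdir, fasta_name, out_name, organism):
    """Run the container; raises CalledProcessError on failure."""
    cmd = build_docker_command(workdir, fasta_name, out_name, organism)
    subprocess.run(cmd, check=True)
```

For the example above, run_in_docker(".", "G5ED35.fasta", "G5ED35.out", "euk") issues the same command as the shell line.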

$ cat G5ED35.out

sp|G5ED35|TTR52_CAEEL   DeepSig Signal peptide  1   20  0.98    .   .   evidence=ECO:0000256
sp|G5ED35|TTR52_CAEEL   DeepSig Chain   21  135 .   .   .   evidence=ECO:0000256

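The nine columns follow the standard GFF3 layout:

Column 1: the sequence identifier, as given in the input FASTA file
Column 2: the source of the annotation (DeepSig)
Column 3: the feature type (Signal peptide or Chain in the example above)
Column 4: the start position of the feature
Column 5: the end position of the feature
Column 6: the feature score ("." where no score is reported)
Column 7: the strand ("." in the example above)
Column 8: the phase ("." in the example above)
Column 9: the attributes, here an Evidence and Conclusion Ontology code (evidence=ECO:0000256)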
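For downstream processing, the prediction file can be read with a few lines of Python. This is a minimal sketch, assuming the output is tab-separated with nine fields per line, as GFF3 prescribes; the Feature type and the parse_deepsig_gff3 function are hypothetical names, and the example rows are the two lines shown above.

```python
import io
from typing import NamedTuple, Optional


class Feature(NamedTuple):
    seqid: str
    source: str
    feature: str
    start: int
    end: int
    score: Optional[float]  # None when the score column is "."
    attributes: str


def parse_deepsig_gff3(handle):
    """Yield one Feature per non-empty, non-comment line of a GFF3 file."""
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9:
            raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
        seqid, source, feature, start, end, score, _strand, _phase, attrs = fields
        yield Feature(seqid, source, feature, int(start), int(end),
                      None if score == "." else float(score), attrs)


# The two rows from the example output, with tabs assumed as separators.
example = (
    "sp|G5ED35|TTR52_CAEEL\tDeepSig\tSignal peptide\t1\t20\t0.98\t.\t.\tevidence=ECO:0000256\n"
    "sp|G5ED35|TTR52_CAEEL\tDeepSig\tChain\t21\t135\t.\t.\t.\tevidence=ECO:0000256\n"
)
features = list(parse_deepsig_gff3(io.StringIO(example)))
```

To read a real file, pass an open file object instead, e.g. with open("G5ED35.out") as handle: features = list(parse_deepsig_gff3(handle)).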

Please report bugs to: castrense.savojardo2@unibo.it