The SecretSanta package provides an R interface for the integrative prediction of extracellular proteins that are secreted via classical pathways.
Secretome prediction often involves multiple steps. Typically, it starts with the prediction of short signal peptides at the N-terminal end of a protein. Next, it is crucial to ensure the absence of motifs and domains that prevent the protein from being secreted despite the presence of the signal peptide. These sequences include transmembrane domains, short ER lumen retention signals, and mitochondria/plastid targeting signals.
Several command-line tools and web interfaces exist to predict individual motifs and domains (SignalP, TargetP, TMHMM, WoLF PSORT, TOPCONS); however, an interface that combines their outputs in a single flexible workflow is lacking.
The SecretSanta package attempts to bridge this gap. It provides wrapper and parser functions around existing command line tools for prediction of signal peptides and protein subcellular localisation. The functions are designed to work together by producing standardized output. This allows the user to pipe results between individual predictors easily to create flexible custom pipelines and also to compare predictions between similar methods.
To speed up processing of large input FASTA files, the initial steps of the pipeline are automatically run in parallel when the number of input sequences exceeds a certain limit.
Taken together, SecretSanta provides a platform to build automated multi-step secretome prediction pipelines that can be applied to large protein sets to facilitate comparison of secretomes across multiple species or under various conditions.
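The multi-step logic outlined above (keep proteins with a signal peptide, then discard those carrying a transmembrane domain, an ER retention signal or a mitochondria/plastid targeting signal) amounts to a set subtraction. The shell sketch below illustrates only that logic. It assumes hypothetical plain-text files with one protein ID per line; the function name `filter_secretome` and all file names are made up for this example, and the package itself performs these steps through its R functions.

```shell
#!/bin/sh
# Sketch: keep candidate IDs that do not appear in any exclusion list.
# Usage (hypothetical file names):
#   filter_secretome signal_peptide_ids.txt tm_ids.txt er_retention_ids.txt organelle_ids.txt
filter_secretome() {
    candidates=$1   # IDs of proteins with a predicted signal peptide
    shift           # remaining arguments: lists of IDs to exclude
    # Exclusion files are read first, the candidate file last.
    awk -v last="$candidates" '
        FILENAME != last { if (NF) excluded[$1] = 1; next }
        NF && !($1 in excluded)
    ' "$@" "$candidates"
}
```

The function prints the surviving IDs in their original order, so the result can be piped into further command-line filters.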
Below is a summary of the main functionality:
- manage_paths(): run tests with the external dependencies to ensure correct installation;
- signalp(): predict signal peptides with SignalP 2.0, SignalP 3.0 or SignalP 4.1;
- tmhmm(): predict transmembrane domains with TMHMM 2.0;
- topcons(): parse predictions of transmembrane domains performed by TOPCONS2;
- targetp(): predict subcellular localisation with TargetP 1.1;
- wolfpsort(): predict subcellular localisation with WoLF PSORT;
- check_khdel(): check C-terminal ER-retention signals;
- m_slicer(): generate proteins with alternative translation start sites;
- ask_uniprot(): fetch known subcellular location data from UniProtKB based on UniProt IDs.

Please see the pre-built vignette for detailed documentation and use-case scenarios.
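To give a feel for what a C-terminal ER-retention check involves, the sketch below lists FASTA records whose sequence ends in KDEL or HDEL. It is an illustration only and is not how check_khdel() is implemented; the function name `khdel_ids` and the exact motif pattern are assumptions made for this example.

```shell
#!/bin/sh
# Sketch: print the IDs of FASTA records whose sequence ends in KDEL or HDEL.
# Usage (hypothetical file name): khdel_ids proteins.fasta
khdel_ids() {
    awk '
        # Print the current record ID if its sequence carries the motif.
        function report() {
            if (id != "" && seq ~ /[KH]DEL$/) print id
        }
        # Header line: finish the previous record and start a new one.
        /^>/ { report(); id = substr($1, 2); seq = ""; next }
        # Sequence line: drop whitespace and stop symbols, then append.
             { gsub(/[ \t\r*]/, ""); seq = seq toupper($0) }
        END  { report() }
    ' "$@"
}
```

Sequences that wrap across several lines are concatenated before the motif is tested, so line breaks inside a record do not hide a terminal match.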
If you find SecretSanta useful for your work, please cite the following paper:
Anna Gogleva, Hajk-Georg Drost, Sebastian Schornack. SecretSanta: flexible pipelines for functional secretome prediction. Bioinformatics (2018). https://doi.org/10.1093/bioinformatics/bty088
SecretSanta relies on a set of existing command-line tools to predict secreted proteins. Please install and configure them according to the instructions listed below. Due to limitations imposed by the external dependencies, some SecretSanta wrapper functions do not work on Windows or Mac, but all are fully functional on Linux. Please note that the signalp() wrapper can work with legacy versions of SignalP (2.0 and 3.0) as well as the most recent version (4.1). If your application does not require multiple SignalP versions, the respective version-specific installation instructions can be skipped.
Download the external dependencies:
Place all the tarballs in a dedicated directory and run the installation script inside it.
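signalp-2.0
tar -zxvf signalp-2.0.Linux.tar.Z
cd signalp-2.0
See signalp-2.0.readme for configuration details. Rename the script so that several SignalP versions can be installed side by side:
mv signalp signalp2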
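signalp-3.0
tar -zxvf signalp-3.0.Linux.tar.Z
cd signalp-3.0
See signalp-3.0.readme for configuration details. Rename the script so that several SignalP versions can be installed side by side:
mv signalp signalp3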
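signalp-4.1 - the most recent version
tar -zxvf signalp-4.1.Linux.tar.Z
cd signalp-4.1
See signalp-4.1.readme for configuration details. Rename the script so that several SignalP versions can be installed side by side:
mv signalp signalp4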
targetp-1.1
tar -zxvf targetp-1.1b.Linux.tar.Z
cd targetp-1.1
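WoLF PSORT
git clone https://github.com/fmaguire/WoLFPSort.git
cd WoLFPSort
Copy the binaries from the platform-specific directory ./bin/binByPlatform/binary-? to ./bin/. See the INSTALL file for details. Rename the wrapper script:
mv runWolfPsortSummary wolfpsort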
tmhmm-2.0
tar -zxvf tmhmm-2.0c.Linux.tar.gz
cd tmhmm-2.0c
Adjust the bin/tmhmm and bin/tmhmmformat.pl scripts (typically the Perl path in their first line) as described in the README file.
The best option is to make all the external dependencies accessible from any location. This requires modifying the $PATH environment variable.
To make the change permanent, edit .profile:
# Open .profile:
gedit ~/.profile
Add a line with all the path exports. In this example, all the dependencies are installed in the /home/my_tools directory:
export PATH="/home/my_tools/signalp-4.1:\
/home/my_tools/signalp-2.0:\
/home/my_tools/signalp-3.0:\
/home/my_tools/targetp-1.1:\
/home/my_tools/tmhmm-2.0c/bin:\
/home/my_tools/WoLFPSort/bin:\
$PATH"
Reload .profile:
. ~/.profile
Reboot to make the changes visible to R. If you are using csh or tcsh, edit .login instead of .profile and use the setenv command instead of export.
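Before moving on to R, where manage_paths() runs the package's own tests, it can help to confirm from the shell that every tool resolves on $PATH. The sketch below is one way to do that. The helper name `check_tools` is made up for this example; the executable names signalp2, signalp3, signalp4 and wolfpsort follow the renaming steps above, while targetp and tmhmm are assumed to be the script names shipped in their archives.

```shell
#!/bin/sh
# Sketch: report which executables can be found on $PATH.
# Returns the number of missing tools (0 means everything was found).
check_tools() {
    missing=0
    for tool in "$@"; do
        if location=$(command -v "$tool" 2>/dev/null); then
            printf 'found    %s -> %s\n' "$tool" "$location"
        else
            printf 'MISSING  %s\n' "$tool"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}

# Tool names are assumptions based on the installation steps above.
check_tools signalp2 signalp3 signalp4 targetp tmhmm wolfpsort \
    || echo "Some tools were not found on PATH."
```

If a tool is reported as missing, check the corresponding entry in the PATH export and reload .profile before trying again.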
To install the SecretSanta package:
library("devtools")
install_github("gogleva/SecretSanta")
library("SecretSanta")
Details about individual functions, pipeline assembly and use-case scenarios are documented in the vignette. For short-form documentation, use:
?SecretSanta
Please raise an issue (preferred option) or email gogleva.a.a@gmail.com to report bugs or unexpected behaviour.