gogleva / SecretSanta

flexible pipelines for scalable secretome prediction, R package

Project Status: Active – The project has reached a stable, usable state and is being actively developed.

Please note: SecretSanta is currently undergoing updates. In the meantime, please use it with R 3.4 and the previous releases of Bioconductor (3.5-3.6).

1. Background

The SecretSanta package provides an R interface for the integrative prediction of extracellular proteins that are secreted via classical pathways.

Secretome prediction often involves multiple steps. Typically, it starts with the prediction of short signal peptides at the N-terminal end of a protein. Next, it is crucial to ensure the absence of motifs and domains that prevent the protein from being secreted despite the presence of the signal peptide. These sequences include transmembrane domains, short ER lumen retention signals, and mitochondria/plastid targeting signals.
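As an illustration of one of these filtering steps, the sketch below scans a FASTA file for proteins whose C-terminus carries a KDEL or HDEL motif, the classical ER lumen retention signals. It is a stand-alone approximation written with awk, not the code SecretSanta runs; the function name find_khdel is hypothetical.

```shell
# find_khdel: print the IDs of sequences that end in KDEL or HDEL.
# $1 = path to a protein FASTA file (sequences may span several lines).
find_khdel() {
    awk '
        # Print the current record ID if its sequence ends with the motif.
        function report() {
            if (id != "" && seq ~ /[KH]DEL$/) print id
        }
        # Header line: finish the previous record and start a new one.
        /^>/ { report(); id = substr($1, 2); seq = ""; next }
        # Sequence line: keep letters only and accumulate in upper case.
        {
            line = $0
            gsub(/[^A-Za-z]/, "", line)
            seq = seq toupper(line)
        }
        END { report() }
    ' "$1"
}

# Example call (hypothetical file name):
# find_khdel proteins.fasta
```

Proteins reported by such a check would be candidates for removal from a secretome, since a retention signal can keep them in the ER even when a signal peptide is present.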

Several command line tools and web interfaces exist to predict individual motifs and domains (SignalP, TargetP, TMHMM, WoLF PSORT, TOPCONS); however, an interface that combines their outputs in a single flexible workflow is lacking.

The SecretSanta package attempts to bridge this gap. It provides wrapper and parser functions around existing command line tools for prediction of signal peptides and protein subcellular localisation. The functions are designed to work together by producing standardized output. This allows the user to pipe results between individual predictors easily to create flexible custom pipelines and also to compare predictions between similar methods.

To speed up processing of large input FASTA files, the initial steps of the pipeline are automatically run in parallel when the number of input sequences exceeds a certain limit.
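SecretSanta handles this internally, so no manual action is needed. Purely to illustrate the general idea of counting sequences and splitting a large input into chunks that could be processed independently, here is a minimal shell sketch. The function names count_seqs and split_fasta, the chunk size and the file names are all hypothetical and are not part of the package.

```shell
# count_seqs: number of records in a FASTA file (one ">" header per record).
count_seqs() {
    grep -c '^>' "$1"
}

# split_fasta: write consecutive chunks of at most N sequences each.
# $1 = input FASTA, $2 = sequences per chunk, $3 = output prefix.
# Chunks are named <prefix>_001.fasta, <prefix>_002.fasta, ...
split_fasta() {
    awk -v n="$2" -v prefix="$3" '
        /^>/ {
            # Start a new chunk file every n headers.
            if (count % n == 0) {
                if (out != "") close(out)
                out = sprintf("%s_%03d.fasta", prefix, int(count / n) + 1)
            }
            count++
        }
        # Copy every line that belongs to a record into the current chunk.
        out != "" { print > out }
    ' "$1"
}

# Example calls (hypothetical file name and chunk size):
# count_seqs proteins.fasta
# split_fasta proteins.fasta 500 chunk
```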

Taken together, SecretSanta provides a platform for building automated multi-step secretome prediction pipelines that can be applied to large protein sets to facilitate comparison of secretomes across multiple species or under various conditions.

Below is a summary of main functionality:

Please see the pre-built vignette for detailed documentation and use-case scenarios.

Citation:

If you find SecretSanta useful for your work, please cite the following paper:

Anna Gogleva, Hajk-Georg Drost, Sebastian Schornack. SecretSanta: flexible pipelines for functional secretome prediction. Bioinformatics (2018). https://doi.org/10.1093/bioinformatics/bty088

2. External dependencies

SecretSanta relies on a set of existing command line tools to predict secreted proteins. Please install and configure them according to the listed instructions. Due to limitations imposed by the external dependencies, some of the SecretSanta wrapper functions won't work on Windows or Mac, but they are fully functional on Linux. Please note that the signalp() wrapper can work with legacy versions of SignalP (2.0 and 3.0) as well as the most recent version (4.1). If your application does not require multiple SignalP versions, the respective version-specific installation instructions can be skipped.

2.1 Automatic installation of external dependencies

Download the external dependencies:

Place all the tarballs in a dedicated directory and run the installation script inside it.

2.2 Manual installation of external dependencies

Tools for prediction of signal peptides and cleavage sites:
Tools for prediction of transmembrane domains:

Organise access to the external dependencies

The best option is to make all the external dependencies accessible from any location. This requires modifying the $PATH environment variable.

To make the change permanent, edit .profile:

# Open .profile:
gedit ~/.profile

Add a line with all the path exports. In this example, all the dependencies are installed in the my_tools directory:

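# The opening quote must follow "=" on the same line;
# a trailing backslash inside the quotes continues the value on the next line.
export PATH="/home/my_tools/signalp-4.1:\
/home/my_tools/signalp-2.0:\
/home/my_tools/signalp-3.0:\
/home/my_tools/targetp-1.1:\
/home/my_tools/tmhmm-2.0c/bin:\
/home/my_tools/WoLFPSort/bin:\
$PATH"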

Reload .profile:

. ~/.profile

Reboot to make the changes visible to R. If you are using csh or tcsh, edit .login instead of .profile and use the setenv command instead of export.
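Before starting R, it can be useful to confirm that the tools are actually reachable from the shell. The following is a minimal sketch based on the POSIX command -v builtin. The helper name check_tools is hypothetical, and the executable names in the example call are assumptions about a typical installation, so replace them with the names your installation provides.

```shell
# check_tools: report which of the given executables are found on PATH.
# Prints one line per tool and returns non-zero if any tool is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf 'found:   %s\n' "$tool"
        else
            printf 'missing: %s\n' "$tool"
            missing=1
        fi
    done
    return "$missing"
}

# Executable names below are assumptions; adjust them to your setup.
check_tools signalp targetp tmhmm runWolfPsortSummary \
    || echo "Some tools are not on PATH yet; check the export line in ~/.profile"
```

Any tool reported as missing points to a wrong or absent entry in the export line above.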

3. Installation

To install the SecretSanta package:

library("devtools")
install_github("gogleva/SecretSanta")
library("SecretSanta")

Details about individual functions, pipeline assemblies and use-case scenarios are documented in the vignette. For short-form documentation, please use:

?SecretSanta

Reporting bugs

Please raise an issue (preferred option) or email gogleva.a.a@gmail.com to report bugs or unexpected behaviour.