The Cluster Data Generation Project

This project provides a framework to reanalyse public proteomics data from repositories including, among others, PRIDE and PeptideAtlas. Among other tools, the framework provides methods for predicting the search parameters of public submissions, reanalysis pipelines for peptide/protein identification, de novo search, and quality assessment of the final results.

Contact Us:

You can contact us through GitHub issues: https://github.com/PRIDE-Cluster/cluster-data-generation/issues or by email: Yasset Perez-Riverol

Contributors: Marc Vaudel, Kenneth Verheggen

Build the Project

In order to build the project the developer should first clone the project and the corresponding submodules:

git clone --recursive https://github.com/PRIDE-Cluster/cluster-data-generation

Once the project has been downloaded, the developer should cd into the project folder and execute:

 $ mvn clean
 $ mvn install

After the build, all the tools and their corresponding scripts are stored in the resources folder.

Database Handling

A set of tools has been developed to enable the user to perform the following tasks:

Download a protein database from an external repository (e.g. UniProt Proteomes): FastaDownloadTool
Process a FASTA file, including the following tasks: FastaProcessingTool
- Append a database to the original database (e.g. a contaminants database)
- Add decoys to the resulting database
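
The two processing tasks above can be illustrated with a minimal Python sketch. This is not the code of FastaProcessingTool: the function names, the reversed-sequence decoy strategy, and the `DECOY_` header prefix are all assumptions chosen for illustration, and the actual tool may generate decoys differently.

```python
from typing import Iterable, Iterator, List, Tuple

FastaRecord = Tuple[str, str]  # (header without '>', sequence)


def parse_fasta(text: str) -> Iterator[FastaRecord]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def append_database(target: Iterable[FastaRecord],
                    extra: Iterable[FastaRecord]) -> List[FastaRecord]:
    """Append an extra database (e.g. contaminants) to the target database."""
    return list(target) + list(extra)


def add_decoys(records: Iterable[FastaRecord],
               prefix: str = "DECOY_") -> List[FastaRecord]:
    """Return the records followed by one reversed-sequence decoy per record.

    Reversal and the header prefix are assumptions made for this sketch.
    """
    records = list(records)
    decoys = [(prefix + header, seq[::-1]) for header, seq in records]
    return records + decoys


def format_fasta(records: Iterable[FastaRecord], width: int = 60) -> str:
    """Serialise records to FASTA text, wrapping sequences at `width`."""
    lines: List[str] = []
    for header, seq in records:
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical in-memory inputs; real databases would be read from files.
    target = parse_fasta(">sp|P1|EXAMPLE\nMKTLLV\n")
    contaminants = parse_fasta(">CONT_1\nACDE\n")
    print(format_fasta(add_decoys(append_database(target, contaminants))))
```

Appending the contaminants before generating decoys means the contaminant entries also receive decoy counterparts; whether that is desirable depends on the downstream search setup.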

Parameters Predictors

Resources folder

The resources folder contains all the tools, scripts and Python tools needed to work with the data. Most of the scripts are designed to run as LSF jobs.
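
As a rough illustration of what running a tool as an LSF job involves, the sketch below assembles a `bsub` submission command in Python. It is not one of the scripts shipped in the resources folder: the function name, the queue name, the jar name and the log path are hypothetical, and only the standard `bsub` options `-J` (job name), `-q` (queue), `-M` (memory limit) and `-o` (output file) are used.

```python
import shlex
from typing import List, Sequence


def build_bsub_command(job_name: str,
                       command: Sequence[str],
                       queue: str,
                       memory_limit: int,
                       log_file: str) -> List[str]:
    """Build the argument list for submitting `command` through bsub.

    The unit of `memory_limit` depends on how the LSF cluster is configured,
    so check the local documentation before choosing a value.
    """
    return [
        "bsub",
        "-J", job_name,           # job name shown by bjobs
        "-q", queue,              # target queue
        "-M", str(memory_limit),  # per-job memory limit
        "-o", log_file,           # file that receives the job output
        *command,                 # the command to run on the cluster
    ]


if __name__ == "__main__":
    # Hypothetical values; adapt them to the local cluster and tool.
    cmd = build_bsub_command(
        job_name="reanalysis-example",
        command=["java", "-jar", "tool.jar"],
        queue="research",
        memory_limit=8000,
        log_file="reanalysis-example.log",
    )
    # Print the command instead of submitting it; on a machine with LSF
    # installed it could be passed to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids quoting problems when file names contain spaces.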

Troubleshooting

If you have problems pulling the latest version, you can force your local copy to match the remote (this discards local changes to tracked files) by running:

  git reset --hard origin/master

and

 git pull
