EuBIC / EuBIC2020


Simulating a quantified phosphoproteome for software benchmarking and algorithm development #8

Open issue, opened by vtsiamis88 5 years ago


Abstract

Signal transduction relies on a tightly time-controlled combination of phosphorylation and dephosphorylation events that are difficult to capture and integrate. Characterizing these events at large scale with bottom-up mass spectrometry requires phosphopeptide enrichment prior to analysis. It also raises specific analytical challenges: an increased search space, the need for confident modification site localization, and the extrapolation of proteoform quantitative behavior from a single peptide. Most of these studies achieve low peptide coverage per protein and therefore require statistically sound methods to estimate quantitative changes at the proteoform level. This means translating quantitative changes of (phosphorylated) peptides into changes of both the protein and its phosphorylated isoforms, and calculating their relative stoichiometry when modified and unmodified versions of the same peptide are available. To our knowledge, there are no suitable data sets that simulate phospho-regulation at the proteoform level, which prevents benchmarking of available computational methods against a known ground truth. In this project, we will build an artificial quantitative phosphoproteomics data set that simulates the influence of digestion, sample enrichment, spectral quality, wrong identifications and localizations, and technical and biological variance. This data set can then be used to benchmark data analysis algorithms for phosphoproteomics (and other PTMomics).
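The core difficulty described above is separating protein-level changes from changes in phosphorylation stoichiometry when only one peptide reports on a site. The minimal Python sketch below illustrates it under loudly stated, hypothetical assumptions. Modified and unmodified peptide forms are assumed to ionize equally, which real data violate and which is one reason a simulated ground truth is useful. Technical variance is modeled as multiplicative log-normal noise. The names `simulate_site` and `estimate_site` are illustrative only and are not part of any existing tool.

```python
import math
import random


def simulate_site(protein_abundance, occupancy, noise_sd=0.0, rng=None):
    """Simulate intensities of the modified and unmodified forms of one peptide.

    Hypothetical model: both forms ionize equally well, and technical
    variance is multiplicative log-normal noise with SD `noise_sd` (log scale).
    """
    rng = rng or random.Random(0)

    def noise():
        # No noise requested: return a neutral multiplicative factor.
        return math.exp(rng.gauss(0.0, noise_sd)) if noise_sd > 0 else 1.0

    modified = protein_abundance * occupancy * noise()
    unmodified = protein_abundance * (1.0 - occupancy) * noise()
    return modified, unmodified


def estimate_site(modified, unmodified):
    """Split the peptide signal back into protein abundance and site occupancy.

    Only valid under the equal-ionization assumption used in simulate_site.
    """
    total = modified + unmodified
    occupancy = modified / total if total > 0 else float("nan")
    return total, occupancy


# Ground truth: the protein doubles and occupancy rises from 0.2 to 0.6.
ctrl = simulate_site(protein_abundance=1000.0, occupancy=0.2)
treat = simulate_site(protein_abundance=2000.0, occupancy=0.6)

# The naive phosphopeptide fold change mixes protein and occupancy changes.
phospho_fc = treat[0] / ctrl[0]
(ab_c, occ_c), (ab_t, occ_t) = estimate_site(*ctrl), estimate_site(*treat)
print(f"phosphopeptide FC: {phospho_fc:.1f}")
print(f"protein FC: {ab_t / ab_c:.1f}, occupancy: {occ_c:.2f} -> {occ_t:.2f}")
```

Without noise, the phosphopeptide alone shows a 6-fold increase. Combining it with the unmodified counterpart separates this into a 2-fold protein change and a 3-fold occupancy change. Setting `noise_sd > 0` shows how quickly such estimates degrade, which is the kind of behavior a benchmark data set would need to expose.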

Work plan

Main tasks

These tasks will be discussed on the first day, before implementation starts. Depending on the skills and interests of the participants, we may define working groups to address them on the following days.

Preliminary time plan

Tuesday afternoon
Presentation of the problem: short presentation of the project.
Implementation scheme: create a modular mock-up of the processes that will be used to generate the simulated data.

Wednesday
Implementation of the different modules: depending on the number of participants, we will form subgroups that implement modules simulating:

Thursday

Expected results

At the end of the developer’s meeting, we expect to have a tool that generates a simulated PSM table with quantitative MS data, containing modified and unmodified peptides from artificially regulated phosphoproteins. Depending on the number of participants and our progress, we may also have a basic web interface that lets users set simple parameters such as which protease(s) to use, digestion efficiency, …
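A tool of this kind, with a user-selected protease and digestion efficiency feeding a PSM table, could be prototyped roughly as follows. This is a hypothetical Python sketch, not the project's implementation. The cleavage rules are simplified regular expressions: trypsin cuts after K/R except before P, and Lys-C cuts after every K. Digestion efficiency is modeled as an independent per-site cleavage probability, so missed cleavages arise naturally. Intensities receive log-normal noise, and a fixed fraction of rows is flagged as wrong identifications. The function names `digest` and `simulate_psms` and all column names are assumptions.

```python
import random
import re

# Hypothetical, simplified protease rules: zero-width regexes marking cut positions.
PROTEASES = {
    "trypsin": r"(?<=[KR])(?!P)",  # after K or R, unless followed by P
    "lys-c": r"(?<=K)",            # after every K
}


def digest(sequence, protease="trypsin", efficiency=1.0, rng=None):
    """Cut `sequence` at each cleavage site with probability `efficiency`."""
    rng = rng or random.Random(0)
    sites = [m.start() for m in re.finditer(PROTEASES[protease], sequence)
             if 0 < m.start() < len(sequence)]  # ignore the sequence ends
    cuts = [s for s in sites if rng.random() < efficiency]  # missed cleavages
    bounds = [0] + cuts + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]


def simulate_psms(proteins, protease="trypsin", efficiency=0.9,
                  wrong_id_rate=0.01, min_len=6, seed=0):
    """Return PSM-like rows from {accession: (sequence, abundance)}.

    All column names here are hypothetical placeholders.
    """
    rng = random.Random(seed)
    rows = []
    for accession, (sequence, abundance) in proteins.items():
        for peptide in digest(sequence, protease, efficiency, rng):
            if len(peptide) < min_len:
                continue  # too short to be identified in practice
            rows.append({
                "protein": accession,
                "peptide": peptide,
                # S/T/Y residues make the peptide a phosphorylation candidate.
                "phospho_candidate": any(aa in "STY" for aa in peptide),
                # Technical variance as multiplicative log-normal noise.
                "intensity": abundance * rng.lognormvariate(0.0, 0.3),
                # Flag a fraction of rows as simulated false identifications.
                "wrong_id": rng.random() < wrong_id_rate,
            })
    return rows


proteins = {"P_DEMO1": ("MAKPLRGSTKAARSPEQLLK", 1e6)}  # made-up protein
for row in simulate_psms(proteins, efficiency=1.0, min_len=3):
    print(row["peptide"], row["phospho_candidate"], round(row["intensity"]))
```

Keeping `digest` separate from the table generation mirrors the modular design planned for the meeting. Each step, such as enrichment, mislocalization, or biological variance, could be added as its own function acting on the rows.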

Follow up

After the developer’s meeting, we expect to use the simulated data in ongoing and future projects, and we hope that bioinformaticians working with PTMomics data will also use them for benchmarking.

Technical details

Contact information

Marie Locard-Paulet Novo Nordisk Foundation Center for Protein Research Blegdamsvej 3 2200 København N / Denmark marie.locard-paulet@cpr.ku.dk

Veit Schwämmle Protein Research Group Department for Biochemistry and Molecular Biology University of Southern Denmark Campusvej 55 5230 Odense M / Denmark veits@bmb.sdu.dk

Vasileios Tsiamis Protein Research Group Department for Biochemistry and Molecular Biology University of Southern Denmark Campusvej 55 5230 Odense M / Denmark vasileios@bmb.sdu.dk