EuBIC / EuBIC2020


Formation of spectral libraries by representative spectra #9

Open percolator opened 4 years ago

percolator commented 4 years ago

Abstract

Methods to represent multiple spectra

Spectral library searching offers a sensitive yet fast method to match spectra from mass spectrometry-based proteomics experiments. The technique was first introduced for searching spectra from data-dependent acquisition (DDA) but has proven essential for the analysis of data-independent acquisition (DIA) spectra. As input, the technique requires spectral libraries, which can be assembled from previously acquired DDA MS2 spectra. One critical step of this assembly process is the integration of the potentially large number of spectra that stem from an individual peptide species into a single representative spectrum. Here, we will implement and benchmark several such strategies to form representative spectra for use in spectral libraries.

Work plan

Different strategies have been suggested for forming representative spectra. Frank et al. (JPR 2008) list five strategies, where one selects the representative spectrum to be:

  1. The “best spectrum”: the spectrum that maximizes a certain score, e.g., percent of explained intensity or percent of explained b/y ions.
  2. The “consensus spectrum”: a virtual spectrum constructed by averaging all spectra in the cluster (Tabb et al. JASMS 2005).
  3. The “most similar spectrum”: the spectrum that has the highest average similarity to the other cluster members (Tabb et al. Anal Chem 2003).
  4. The “de novo spectrum”: the spectrum that has the highest score when submitted to de novo sequencing.
  5. The random spectrum: a spectrum chosen from the cluster at random.
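As a starting point for discussion, strategies 2, 3 and 5 can be sketched in a few lines of plain Python. This is only a minimal sketch under stated assumptions, not the method of the cited papers: a spectrum is assumed to be a list of (m/z, intensity) pairs, peaks are grouped into fixed-width m/z bins (the 1.0 bin width is an arbitrary choice), and cosine similarity on the binned intensities stands in for whatever similarity measure a team prefers. All function names are hypothetical.

```python
import math
import random
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins (assumed representation)."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz / bin_width)] += intensity
    return dict(binned)


def cosine(a, b):
    """Cosine similarity between two binned spectra (dicts of bin -> intensity)."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def consensus_spectrum(cluster, bin_width=1.0):
    """Strategy 2: average the binned intensities over all cluster members.

    Returns a sorted list of (bin-center m/z, mean intensity) pairs.
    """
    totals = defaultdict(float)
    for peaks in cluster:
        for bin_index, intensity in bin_spectrum(peaks, bin_width).items():
            totals[bin_index] += intensity
    n = len(cluster)
    return sorted(((b + 0.5) * bin_width, total / n) for b, total in totals.items())


def most_similar_spectrum(cluster, bin_width=1.0):
    """Strategy 3: index of the member with the highest average similarity to the others."""
    binned = [bin_spectrum(peaks, bin_width) for peaks in cluster]

    def average_similarity(i):
        others = [cosine(binned[i], binned[j]) for j in range(len(binned)) if j != i]
        return sum(others) / len(others) if others else 0.0

    return max(range(len(cluster)), key=average_similarity)


def random_spectrum(cluster, seed=None):
    """Strategy 5: index of a member chosen uniformly at random."""
    return random.Random(seed).randrange(len(cluster))


# Toy cluster: three made-up spectra for the same hypothetical peptide.
cluster = [
    [(100.2, 10.0), (200.1, 20.0)],
    [(100.3, 12.0), (200.4, 18.0)],
    [(100.1, 10.0), (300.0, 10.0)],
]
consensus = consensus_spectrum(cluster)
representative_index = most_similar_spectrum(cluster)
random_index = random_spectrum(cluster, seed=0)
```

Strategies 1 and 4 are left out because they need a peptide annotation or a de novo sequencing engine, respectively. The functions return indices rather than spectra so that a benchmark can map the choice back to the original scan.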

In this workshop, we will first establish datasets and code to benchmark different methods for forming representative spectra. We will then implement several of the methods listed above, as well as improvements on them, benchmark the methods, and examine their properties. Ideally, we will form separate teams, each implementing a different method.

Technical details

We will mainly use Python 3.7.

Contact information

Lukas Käll, KTH - Royal Institute of Technology, Stockholm, Sweden, lukas.kall@scilifelab.se

ypriverol commented 4 years ago

I'm in!!!

percolator commented 4 years ago

A repository for the hackathon is available at https://github.com/statisticalbiotechnology/specpride