Open timosachsenberg opened 1 year ago
Map content in HPLC/MS
In the following, we use the term LC-MS map for a collection of 3D points, each described by m/z, RT (retention time), and intensity. The points can be either raw, unprocessed data points or processed features or consensus features (a consensus feature is a set of features corresponding to the same compound). For example, the following illustration shows symbolically the reduction from raw data points to features in a map.
The quantitative information in an LC-MS map can be used in numerous applications. These range from additive measurements in analytical chemistry, through the analysis of time series in expression experiments, to applications in clinical diagnostics, in which we want to find statistically significant markers for detecting certain disease states. All these applications have in common that we need to relate the same peptides in different measurements to each other. This is usually done under the assumption that the measured m/z and RT of a peptide stay roughly constant.
As with any laboratory experiment, this only holds true to a certain extent. In particular, the RT often shows large shifts and possibly distortions when different runs are compared, and the m/z dimension may also show (relatively smaller) distortions. This makes the assignment of corresponding peptides difficult, since the relative shift of two maps to each other is not known in advance. It is nevertheless crucial to correct for these warps. Otherwise, it is hard or even impossible to find, for a peptide in the first map, the corresponding partner in the second map.
Goal of map alignment
The goal of map alignment, given k maps M_1, …, M_k, is to compute k transformations T_1, …, T_k such that the transformed coordinates of corresponding features are as close as possible.
Many applications are only possible if we achieve this goal, since only then do we know which features belong together. For example, when running replicate measurements of a sample (to assess the variance in quantitation), we need to group corresponding features together.
Often an affine alignment is sufficient. However, nonlinear distortions are possible. In that case, one can compute a more accurate local alignment using LOESS regression. LOESS regression (often also called LOWESS) is a locally weighted polynomial regression based on a pre-defined window size. Only points within this window contribute to the local regression.
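To make the difference between an affine and a local alignment concrete, here is a minimal sketch in Python using only NumPy. It assumes that pairs of corresponding retention times (one from the map to be aligned, one from a reference map) are already known. The function names, the tricube weighting, and the choice of a local linear (degree 1) fit are illustrative assumptions, not the method of any particular tool.

```python
import numpy as np

def fit_affine(rt_src, rt_ref):
    """Least-squares fit of rt_ref ~ slope * rt_src + intercept."""
    slope, intercept = np.polyfit(rt_src, rt_ref, deg=1)
    return slope, intercept

def loess_predict(rt_src, rt_ref, rt_new, window=0.2):
    """Locally weighted linear regression (a simple LOESS variant).

    window is the fraction of anchor points used for each local fit.
    """
    x = np.asarray(rt_src, dtype=float)
    y = np.asarray(rt_ref, dtype=float)
    x_new = np.atleast_1d(np.asarray(rt_new, dtype=float))
    k = min(len(x), max(3, int(np.ceil(window * len(x)))))
    out = np.empty(len(x_new))
    for i, x0 in enumerate(x_new):
        dist = np.abs(x - x0)
        idx = np.argsort(dist)[:k]           # the k nearest anchor points
        h = 1.001 * dist[idx].max() or 1.0   # window half-width (never zero)
        w = (1.0 - (dist[idx] / h) ** 3) ** 3  # tricube weights
        # np.polyfit weights the residuals, so pass sqrt(w) to weight squares by w
        coef = np.polyfit(x[idx], y[idx], deg=1, w=np.sqrt(w))
        out[i] = np.polyval(coef, x0)
    return out

def rmse(a, b):
    """Root-mean-square difference between two arrays."""
    return float(np.sqrt(np.mean((np.asarray(a) - np.asarray(b)) ** 2)))
```

On synthetic data with a smooth nonlinear RT distortion, the local fit should track the warp more closely than the single affine transformation, at the cost of having to choose a window size.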
Stable isotope labeling of amino acids in cell culture (SILAC)
Here we describe the commonly used method of stable isotope labeling of amino acids in cell culture (SILAC), based on its application for monitoring changes in protein phosphorylation (as used in Marc Mumby and Deirdre Brekken, "Phosphoproteomics: new insights into cellular signaling", Genome Biology 2005, 6:230). We will thereby go through the steps in the Figure below. Subfigure (a) shows a schematic outline of the labeling step. Two cultures of the same cells are grown either in normal medium containing only light versions of the essential amino acids (including 12C6-arginine; blue plate) or in medium containing arginine labeled at all six carbons with the heavy isotope (13C6-arginine; red plate). The cells in the labeled medium are then treated, that is, stimulated for activation and therefore phosphorylation. Cells are then dried and harvested, and identical amounts of protein lysate are mixed and pooled into one sample. This sample is subjected to HPLC-MS/MS analysis after tryptic digestion and optional enrichment. The effects of the labeling on the resulting spectra can be seen in subfigures (b) and (c). In (c) we can see the relative intensities of the isotope envelopes of two differently labeled (but otherwise identical) peptides. The left MS spectrum shows an unphosphorylated peptide (determined by MS2 database search) that was found at the same concentration in both conditions. Its right counterpart, however, shows a relative increase in abundance between control and treatment and, unsurprisingly, this peptide was identified as phosphorylated.
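As a small worked example of what the labeling does to the spectrum, the sketch below computes the expected m/z spacing between the light and heavy form of a peptide and a simple heavy-to-light intensity ratio. It assumes one labeled arginine per peptide with six 13C atoms, and uses the mass difference between 13C and 12C of about 1.0033548 Da. The function names and the ratio-of-summed-intensities approach are illustrative assumptions.

```python
# Mass difference between one 13C and one 12C atom in Da.
DELTA_13C = 1.0033548

def silac_mz_shift(charge, labeled_carbons=6, labeled_residues=1):
    """Expected m/z distance between the light and heavy isotope envelopes."""
    return labeled_carbons * labeled_residues * DELTA_13C / charge

def heavy_to_light_ratio(light_intensities, heavy_intensities):
    """Ratio of summed isotope-envelope intensities (heavy over light)."""
    return sum(heavy_intensities) / sum(light_intensities)

if __name__ == "__main__":
    for z in (1, 2, 3):
        print(f"charge {z}+: envelopes are {silac_mz_shift(z):.4f} m/z apart")
```

A ratio close to 1 corresponds to the unchanged peptide in the left spectrum of subfigure (c), while a ratio clearly above 1 corresponds to a peptide that is more abundant in the treated (heavy) culture.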
Label-free quantification
Label-free quantification is a method that aims to determine the relative amount of proteins in two or more biological samples. It may be based on precursor signal intensity or on spectral counting. The first method is useful when applied to high-precision mass spectra. In contrast, spectral counting simply counts the number of spectra identified for a given peptide in different biological samples and then integrates the results over all measured peptides of the protein(s) being quantified. The computational framework includes detecting peptides, matching corresponding peptides across multiple maps, and selecting discriminatory peptides.
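Spectral counting as described above can be sketched in a few lines. The example assumes that each identified spectrum is given as a (sample, peptide) pair and that a peptide-to-protein mapping is available. Both input structures and the function name are hypothetical, and "integrating" is taken here to mean a plain sum over the peptides of a protein.

```python
from collections import Counter, defaultdict

def spectral_counts(identified_spectra, peptide_to_protein):
    """Count identified spectra per protein and sample.

    identified_spectra: iterable of (sample, peptide) pairs, one per spectrum.
    peptide_to_protein: dict mapping a peptide sequence to its protein.
    """
    counts = defaultdict(Counter)
    for sample, peptide in identified_spectra:
        protein = peptide_to_protein.get(peptide)
        if protein is not None:  # skip peptides without a known protein
            counts[protein][sample] += 1
    return {protein: dict(per_sample) for protein, per_sample in counts.items()}
```

Comparing the per-sample counts of one protein then gives a rough estimate of its relative amount, without using the signal intensities at all.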
Analysis strategy
Feature finding
Let us focus on quantification through the ion current in MS spectra. In this case, MS intensity follows the chromatographic concentration. Some properties of an MS map are given below:
Up to millions of points per spectrum
Tens of thousands of spectra per LC run
Huge 2D datasets of up to hundreds of GB per sample
Raw data: unmodified detector signal
Centroided data: peaks called on the MS level
We then apply feature finding to reduce the data complexity while keeping the features (peaks).
Isotope patterns
The monoisotopic peak is the mass peak corresponding to the monoisotopic mass of an analyte. It plays a central role in many mass spectrometry processing tasks.
For most elements, several naturally occurring isotopes exist, so we usually don’t observe isotopically pure molecules. Instead, we observe each possible version with a certain probability determined by the relative isotopic abundances. Molecular species that differ only in the number of neutrons are called isotopologues. Note that this implies that different isotopologues have different masses. For example, for a single carbon atom there are two variants: it is either a carbon-12 or a carbon-13 isotope. Because the relative abundance of carbon-12 is 98.93% and that of carbon-13 is 1.07%, we observe carbon-12 with p=0.9893 and carbon-13 with p=0.0107. C has two isotopologues. Measuring a single carbon in a mass spectrometer therefore gives rise to two peaks. One is the monoisotopic peak (carbon-12) and one is the carbon-13 peak (see below).
For C_n there are n positions at which a carbon-12 can be replaced by a carbon-13, resulting in 2^n configurations. In the mass spectrum, we observe n + 1 peaks corresponding to the n + 1 isotopologues (with 0 to n carbon-13 atoms).
Example: Isotope pattern of C_1000. Note the bell shape (the isotopic envelope) of the pattern.
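The bell shape can be reproduced with a short calculation. If each of the n carbon atoms is independently a carbon-13 with probability 0.0107, the number of carbon-13 atoms follows a binomial distribution, and the probability of each count gives the relative height of the corresponding peak. The sketch below is a minimal illustration under that independence assumption. The function name is hypothetical, and the probabilities are computed in log space to avoid overflow for large n.

```python
from math import exp, lgamma, log

P_C13 = 0.0107  # relative abundance of carbon-13

def carbon_isotope_pattern(n, p=P_C13):
    """Probability of observing k carbon-13 atoms in C_n, for k = 0..n."""
    pattern = []
    for k in range(n + 1):
        log_binom = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
        log_prob = log_binom + k * log(p) + (n - k) * log(1.0 - p)
        pattern.append(exp(log_prob))
    return pattern

if __name__ == "__main__":
    pattern = carbon_isotope_pattern(1000)
    top = max(range(len(pattern)), key=pattern.__getitem__)
    print(f"{len(pattern)} isotopologues, most abundant has {top} carbon-13 atoms")
```

For a single carbon, the function returns the two probabilities 0.9893 and 0.0107 given above. For large n, the most abundant peak is no longer the monoisotopic one.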
Biomolecules contain more than one element. Peptides, for example, contain C, H, N, O, P and S, giving rise to more complex isotope patterns. Depending on, among other things, the instrument resolution, a modern mass spectrometer can resolve mass peaks stemming from different isotopes of different elements. One says that it can resolve the isotopic fine structure of an analyte. This is often required to distinguish small molecules but is less important for the analysis of peptides.
Example: The molecule CO has 6 different configurations: two (carbon-12 and carbon-13) times three (oxygen-16, oxygen-17 and oxygen-18). In theory, 6 different peaks of the isotopic fine structure can be observed. To do so, our mass spectrometer must resolve mass-to-charge peaks of configurations with the same number of protons and neutrons. For example, consider the two configurations 13C16O and 12C17O. Both have a total of 14 protons and 15 neutrons, but their masses are slightly different: 28.9983 u and 28.9991 u.
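The six CO configurations and their masses can be enumerated directly. The sketch below uses standard isotope masses (in u) and natural abundances for carbon and oxygen, rounded to seven and five decimals respectively. The function names are illustrative, and the resolving power is estimated simply as mean mass divided by mass difference, which is only a rough stand-in for an FWHM-based criterion.

```python
from itertools import product

# isotope: (mass in u, natural abundance)
CARBON = {"12C": (12.0, 0.9893), "13C": (13.0033548, 0.0107)}
OXYGEN = {
    "16O": (15.9949146, 0.99757),
    "17O": (16.9991317, 0.00038),
    "18O": (17.9991596, 0.00205),
}

def co_isotopologues():
    """Return (name, mass, probability) for all C/O isotope combinations."""
    rows = []
    for (c, (m_c, p_c)), (o, (m_o, p_o)) in product(CARBON.items(), OXYGEN.items()):
        rows.append((c + o, m_c + m_o, p_c * p_o))
    return sorted(rows, key=lambda row: row[1])

def required_resolving_power(mass_a, mass_b):
    """Rough m / delta-m needed to separate two masses."""
    return 0.5 * (mass_a + mass_b) / abs(mass_a - mass_b)

if __name__ == "__main__":
    for name, mass, prob in co_isotopologues():
        print(f"{name}: {mass:.4f} u, probability {prob:.6f}")
```

Applied to 13C16O and 12C17O, the mass difference is below 1 mDa, which shows why resolving fine structure needs a far higher resolving power than separating peaks one neutron apart.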
Example 2 (see Figure below): The large spectrum at the bottom shows the isotopic fine structure of the isotopic peak of insulin that lies roughly 5 Da above the monoisotopic peak (*, upper left). A mass resolving power of m/FWHM > 2,300,000 is required to resolve the two closely spaced species containing five 13C atoms versus three 13C atoms and one 18O atom, which differ by only 2.5 mDa.
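The 2.5 mDa difference can be checked by hand: replacing two 13C atoms with one 18O atom changes the mass by the difference between two 13C/12C shifts and one 18O/16O shift. The sketch below does this arithmetic with standard isotope masses. The peak position of about 5808 Da used for the resolving-power estimate is an assumption made for illustration (it is not given in the text), as are the names.

```python
# Mass shifts in Da from standard isotope masses.
DELTA_13C = 13.0033548 - 12.0        # one 12C replaced by 13C
DELTA_18O = 17.9991596 - 15.9949146  # one 16O replaced by 18O

def fine_structure_gap():
    """Mass difference between five 13C and three 13C plus one 18O, in Da."""
    return 5 * DELTA_13C - (3 * DELTA_13C + DELTA_18O)

def resolving_power(peak_mass, gap):
    """Rough m / delta-m needed to separate two species that are gap apart."""
    return peak_mass / gap

if __name__ == "__main__":
    gap = fine_structure_gap()
    assumed_peak_mass = 5808.0  # assumed position of the insulin peak in Da
    print(f"gap: {gap * 1000:.2f} mDa")
    print(f"resolving power: {resolving_power(assumed_peak_mass, gap):,.0f}")
```

Under these assumptions the estimate lands slightly above the 2,300,000 quoted in the example.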
Averagine
Since the isotope pattern changes with the composition of the peptide, it is not known in advance which pattern should be fitted. If we want to determine the monoisotopic mass from the average mass, it is better to use the observed distribution of amino acids than to assume that all amino acids occur with the same probability. If we assume an average composition of an amino acid, we can estimate the elemental composition of the peptide. Such an average amino acid, also called ‘averagine’, can be derived statistically from protein databases (Senko et al., 1995): C4.94H7.76N1.36O1.48S0.04, with an average mass of 111.1254 Da.
[1] Senko, Michael W., Steven C. Beu, and Fred W. McLafferty. "Determination of monoisotopic masses and ion populations for large biomolecules from resolved isotopic distributions." Journal of the American Society for Mass Spectrometry 6.4 (1995): 229-233.
Based on averagine compositions, one can compute the isotope patterns for any given mass. To obtain a model molecular formula, the number of averagine units in a molecule is determined from the average molecular mass, and this number is then multiplied by the number of atoms of each element in an averagine residue. Because calculation of the theoretical isotopic profile requires integral numbers of atoms, the values obtained for C, N, O, and S are rounded to the nearest integer, and the final average molecular mass is corrected by adjusting the number of Hs. Rounding errors induced by the addition or subtraction of half a C, N, O, or S and numerous Hs do not shift the isotopic distribution by a significant amount.
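The procedure described above can be sketched as follows. The code assumes the more precise averagine coefficients commonly quoted from Senko et al. (C4.9384 H7.7583 N1.3577 O1.4773 S0.0417, of which the formula in the text is a rounded version) and standard average atomic masses. The function names are hypothetical, and this is a minimal illustration rather than the authors' implementation.

```python
# Assumed averagine coefficients (atoms per averagine unit).
AVERAGINE = {"C": 4.9384, "H": 7.7583, "N": 1.3577, "O": 1.4773, "S": 0.0417}
AVERAGINE_MASS = 111.1254  # average mass of one averagine unit in Da
AVG_ATOMIC_MASS = {"C": 12.0107, "H": 1.00794, "N": 14.0067, "O": 15.9994, "S": 32.065}

def averagine_units(average_mass):
    """Number of averagine units in a molecule of the given average mass."""
    return average_mass / AVERAGINE_MASS

def formula_mass(formula):
    """Average mass of an elemental composition given as {element: count}."""
    return sum(count * AVG_ATOMIC_MASS[el] for el, count in formula.items())

def model_formula(average_mass):
    """Integer model formula: round C, N, O, S, then adjust H to match the mass."""
    units = averagine_units(average_mass)
    formula = {el: round(units * AVERAGINE[el]) for el in ("C", "N", "O", "S")}
    remaining = average_mass - formula_mass(formula)
    formula["H"] = max(0, round(remaining / AVG_ATOMIC_MASS["H"]))
    return formula

if __name__ == "__main__":
    target = 20000.0
    print(f"{averagine_units(target):.2f} averagine units")
    print(model_formula(target), f"{formula_mass(model_formula(target)):.2f} Da")
```

Because hydrogen is the lightest element, adjusting its count lets the model formula match the target average mass to within about half a dalton.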
For example, the 20-kDa model compound would be composed of 179.98 averagine units and should therefore contain 7.5 sulfur atoms. The abundances for the isotopic peaks obtained when the number of sulfurs is rounded down to 7 (while adding 16 hydrogens) differ by less than 1% relative to the isotopic abundances