Closed jorainer closed 6 years ago
@SiggiSmara's centWave parameters were:
peakwidth = c(1.5, 8)
ppm = 50
snthresh = 5
mzdiff = +0.001
prefilter = c(4, 300)
noise = 100
To specifically define the ppm
and peakwidth
settings we will evaluate the data of known compounds in the QC samples.
Some things to think about in terms of the centWave parameters, all are based on exploration of the data and impressions (no statistics behind) but also looking at known compounds:
Update on the mzdiff
parameter. This is used in the peak post-processing, i.e. after all peaks have been identified in a sample, the function iterates through all peaks and for peaks defined to be overlapping in rt and mz dimension (depending on this parameter) keeps only the one with the highest signal.
Hmm.. ok and what is the definition of overlap in rt then? Does that mean that it is better to allow mz overlap in terms of being more inclusive?
honestly - no idea. I need to dig into the undocumented C-code for that...
holy guacamolee, that deep? github has no emoticon for the appropriate emotion expression .. maybe :astonished: or :scream_cat: or :see_no_evil: :hear_no_evil: :speak_no_evil:
Now, step by step:
ppm
parameter. For each known compound in all QC samples:
diff(range(mzs)) * 1e6 / mz
(mzs
the m/z values from the MS data slice containing the peak, mz
being the theoretical m/z of the adduct. This is done for each file.The table below lists all considered compounds and the mean and 75% quantile ppm (columns "mean_ppm"
and "75_quant_ppm"
.
name | mz | rt_min | rt_max | mean_ppm | 75_quant_ppm | |
---|---|---|---|---|---|---|
kc_1 | Glycine | 76.04 | 167 | 173 | 32.22 | 46.35 |
kc_3 | Alanine | 90.05 | 161 | 167 | 39.22 | 61.73 |
kc_7 | Dimethylglycine | 104.1 | 161 | 167 | 63.36 | 93.35 |
kc_10 | Serine | 106 | 178 | 183 | 23.68 | 33.5 |
kc_13 | Proline | 116.1 | 155 | 167 | 61.65 | 80.33 |
kc_18 | Betaine | 118.1 | 155 | 162 | 17.87 | 19.8 |
kc_19 | Valine | 118.1 | 155 | 163 | 60.82 | 76.54 |
kc_21 | Threonine | 120.1 | 168 | 175 | 71.95 | 102.1 |
kc_22 | Phenylethylamine | 122.1 | 31 | 35 | 18.58 | 23.88 |
kc_23 | Niacinamide | 123.1 | 40 | 53 | 81.12 | 97.99 |
kc_24 | Taurine | 126 | 166 | 171 | 68.17 | 87.3 |
kc_26 | Pipecolic acid | 130.1 | 155 | 160 | 58.67 | 77.11 |
kc_28 | Hydroxyproline | 132.1 | 169 | 175 | 91.93 | 112 |
kc_30 | Creatine | 132.1 | 156 | 165 | 63.78 | 83.64 |
kc_31 | Leucine | 132.1 | 142 | 153 | 30.41 | 40.26 |
kc_32 | Isoleucine | 132.1 | 142 | 153 | 30.41 | 40.26 |
kc_34 | Asparagine | 133.1 | 181 | 186 | 33.26 | 48.71 |
kc_35 | Ornithine | 133.1 | 194 | 202 | 49.82 | 62.77 |
kc_36 | L-Aspartic Acid | 134 | 184 | 192 | 47.46 | 68.79 |
kc_43 | Glutamine | 147.1 | 176 | 182 | 33.1 | 49.39 |
kc_44 | Lysine | 147.1 | 191 | 198 | 20.8 | 28.36 |
kc_46 | L-Glutamic Acid | 148.1 | 170 | 176 | 44.22 | 60.58 |
kc_52 | Histidine | 156.1 | 191 | 198 | 34.33 | 46.44 |
kc_54 | Succinylacetone | 159.1 | 26 | 29 | 35.74 | 52.18 |
kc_56 | alpha-Aminoadipic acid | 162.1 | 165 | 169 | 27.09 | 37.18 |
kc_61 | Phenylalanine | 166.1 | 153 | 157 | 5.73 | 6.745 |
kc_67 | 1-Methylhistidine | 170.1 | 185 | 197 | 49.96 | 65.7 |
kc_68 | 3-Methylhistidine | 170.1 | 185 | 197 | 49.96 | 65.7 |
kc_71 | N-Acetylornithine | 175.1 | 148 | 154 | 62.72 | 94.88 |
kc_72 | Arginine | 175.1 | 189 | 197 | 71.84 | 92.97 |
kc_75 | Citrulline | 176.1 | 180 | 185 | 48.55 | 64.9 |
kc_79 | Glucosamine | 180.1 | 163 | 170 | 62.12 | 80.82 |
kc_81 | Fructose | 203.1 | 158 | 168 | 16.44 | 21.09 |
kc_82 | Mannose | 203.1 | 158 | 168 | 16.44 | 21.09 |
kc_85 | Tyrosine | 182.1 | 159 | 166 | 50.34 | 65.07 |
kc_86 | 1-Methyluric acid | 183.1 | 145 | 149 | 57.58 | 91.89 |
kc_88 | galactitol | 183.1 | 159 | 165 | 84.79 | 109.5 |
kc_90 | Phosphorylcholine | 184.1 | 190 | 200 | 67.14 | 80.92 |
kc_95 | Caffeine | 195.1 | 30 | 34 | 55.79 | 71.69 |
kc_100 | ADMA | 203.2 | 178 | 186 | 56.69 | 73.23 |
kc_103 | Tryptophan | 205.1 | 145 | 154 | 47.84 | 65 |
kc_106 | Phosphocreatine | 212 | 183 | 189 | 31.01 | 44.49 |
kc_122 | Sphingosine | 300.3 | 28 | 31 | 7.841 | 10.66 |
kc_123 | Acetylneuraminic Acid | 332.1 | 165 | 170 | 63.45 | 78.58 |
kc_125 | Sucrose | 365.1 | 183 | 193 | 90.98 | 109 |
The ppm
values are surprisingly high. The mean of them is 63. I will thus use a ppm = 70
in the initial analysis and evaluate then the identified peaks for the known compounds. In addition I will compare it with a lower ppm
setting.
Ok now I am confused. Are you taking all m/z values in the window of extraction? and are you using the theoretical m/z value as a divisor?
Maybe I am misunderstanding what you are doing, but in my mind you would need to do some sort of determination on a run by run basis what m/z range to use if you want to generate statistics on the m/z values found in your search range. Otherwise you might be including all sorts of noise in the data.
What I'm doing:
1) in each file, each compound: get all m/z values within the rt-window and theoretical m/z +/- 0.01.
2) the ppm is then the diff(range(mz)) * 1e6 / mz_comp
, mz_comp
being the theoretical m/z of the compound. This can (because of the +/- 0.01) include some noisy signal - but I hope that this is removed in the peak refinement step.
so yes, I'm using the theoretical m/z as divisor - hope that's OK as I always get confused with this ppm.
I just have a reall problem with 70ppm being the spread of the m/z values in a peak given that what I have seen so far is closer to 10ppm or lower.
Maybe a picture can help us.
In order for you not to inflate the calculated ppm spread for each run you would have to limit the range calculation to the box that is drawn by the blue and red small dotted lines.
Any value outside of that box is only going to increase the ppm error artificially, in particular since your m/z window is rather big. And since the rt window you use is necessarily larger than any individual peak in any individual run you will add quite a lot of noise in your data.
Actually I have some R script I could run on the VAMS data to see if what I'm writing here is right... let me see if I can hack some results...
Sorry again, deleted my previous post. Found an error in my script that makes all of the results I had seen invalid. I'm done for the day, let's talk tomorrow.
Be aware that the ppm
does not define the final m/z width of the peak. This is just for the initial identification of the ROIs. Peaks are then identified in those using centWave and these are further refined. I'll dig here into the (again undocumented C-code) to understand what exactly ppm
does.
I'll then compare the results from different ppm
settings - but that will require some time, since I have to fix something in xcms
first...
That's how the ppm
parameter is used:
So, a ROI is extended if there is an m/z value that is closer to the ROIs mean m/z than defined by the ppm
parameter. I'll update the documentation in xcms
to make that clear.
Yes that is true,but the correct way of thinking about this parameter in my opinion is that it should somehow reflect the reality of the data, i.e. be based on the real ROI data. As a mass spec guy to see that the ppm width of the known compounds is 70 ppm is startling. Also since the reality is probably closer to 10-20ppm. I already didn't like the fact that I used 50 ppm in my settings, if we now go to 70 ppm that is an even bigger window and the possibility of artificial overlap increases.
I'm generating some data using the known compounds that I hope is reflecting this in a better way. So far I'm happy with cafeine (my test compound). It gives me a mean ppm width (based on standard deviation) of close to 15.
You have to consider that the rt region I manually define needs to accommodate the full chromatographic peak across all samples - and most are not well aligned and I'm thus accepting a lot of noise in there (going manually through chromatograms of 100 compounds on 200 samples to define a more realistic rt range is simply not something I would like to do...). In the end I have the hope that the refinement step reduces the m/z width - something I'm going to check for different settings of ppm
anyways.
In the end there we'll have to agree on a tradeoff anyway - using a too low ppm
will lead the ROIs to lack some data points and the chromatographic detection will have in such cases trouble identifying a well-shaped peak - in worst case splitting the peak in half or even worse just reporting a part of it.
Update: now I'm reporting the maximal difference between m/z values from consecutive scans, which is a better proxy for the ROI definition:
name | mz | rt_min | rt_max | mean_ppm | 75_quant_ppm | |
---|---|---|---|---|---|---|
kc_1 | Glycine | 76.04 | 167 | 173 | 41.78 | 62.71 |
kc_3 | Alanine | 90.05 | 161 | 167 | 46.37 | 76.68 |
kc_7 | Dimethylglycine | 104.1 | 161 | 167 | 63.37 | 84.81 |
kc_10 | Serine | 106 | 178 | 183 | 22.36 | 30.39 |
kc_13 | Proline | 116.1 | 155 | 167 | 53.81 | 70.53 |
kc_18 | Betaine | 118.1 | 155 | 162 | 13.16 | 8.813 |
kc_19 | Valine | 118.1 | 155 | 163 | 39.69 | 58.54 |
kc_21 | Threonine | 120.1 | 168 | 175 | 55.39 | 82.75 |
kc_22 | Phenylethylamine | 122.1 | 31 | 35 | 11.54 | 10.21 |
kc_23 | Niacinamide | 123.1 | 40 | 53 | 70.01 | 96.11 |
kc_24 | Taurine | 126 | 166 | 171 | 69.54 | 103.7 |
kc_26 | Pipecolic acid | 130.1 | 155 | 160 | 37.88 | 50.46 |
kc_28 | Hydroxyproline | 132.1 | 169 | 175 | 80.88 | 115.4 |
kc_30 | Creatine | 132.1 | 156 | 165 | 56.09 | 69.46 |
kc_31 | Leucine | 132.1 | 142 | 153 | 19.73 | 37.66 |
kc_32 | Isoleucine | 132.1 | 142 | 153 | 19.73 | 37.66 |
kc_34 | Asparagine | 133.1 | 181 | 186 | 19.88 | 31.72 |
kc_35 | Ornithine | 133.1 | 194 | 202 | 38.47 | 59.49 |
kc_36 | L-Aspartic Acid | 134 | 184 | 192 | 28.5 | 50.49 |
kc_43 | Glutamine | 147.1 | 176 | 182 | 21.16 | 30.84 |
kc_44 | Lysine | 147.1 | 191 | 198 | 13.32 | 18.74 |
kc_46 | L-Glutamic Acid | 148.1 | 170 | 176 | 24.42 | 28.73 |
kc_52 | Histidine | 156.1 | 191 | 198 | 21.79 | 34.13 |
kc_54 | Succinylacetone | 159.1 | 26 | 29 | 17.82 | 25.99 |
kc_56 | alpha-Aminoadipic acid | 162.1 | 165 | 169 | 17.04 | 20.65 |
kc_61 | Phenylalanine | 166.1 | 153 | 157 | 2.352 | 3.007 |
kc_67 | 1-Methylhistidine | 170.1 | 185 | 197 | 32.7 | 46.48 |
kc_68 | 3-Methylhistidine | 170.1 | 185 | 197 | 32.7 | 46.48 |
kc_71 | N-Acetylornithine | 175.1 | 148 | 154 | 41.16 | 64.1 |
kc_72 | Arginine | 175.1 | 189 | 197 | 42.6 | 70.89 |
kc_75 | Citrulline | 176.1 | 180 | 185 | 26.72 | 36.22 |
kc_79 | Glucosamine | 180.1 | 163 | 170 | 32.91 | 45.05 |
kc_81 | Fructose | 203.1 | 158 | 168 | 4.472 | 4.833 |
kc_82 | Mannose | 203.1 | 158 | 168 | 4.472 | 4.833 |
kc_85 | Tyrosine | 182.1 | 159 | 166 | 29.29 | 45.09 |
kc_86 | 1-Methyluric acid | 183.1 | 145 | 149 | 22.51 | 47.2 |
kc_88 | galactitol | 183.1 | 159 | 165 | 51.68 | 77.89 |
kc_90 | Phosphorylcholine | 184.1 | 190 | 200 | 36.88 | 45.72 |
kc_95 | Caffeine | 195.1 | 30 | 34 | 18.3 | 20.21 |
kc_100 | ADMA | 203.2 | 178 | 186 | 27.07 | 36.13 |
kc_103 | Tryptophan | 205.1 | 145 | 154 | 20.99 | 25.21 |
kc_106 | Phosphocreatine | 212 | 183 | 189 | 9.342 | 12.14 |
kc_122 | Sphingosine | 300.3 | 28 | 31 | 2.503 | 3.395 |
kc_123 | Acetylneuraminic Acid | 332.1 | 165 | 170 | 19.95 | 23.31 |
kc_125 | Sucrose | 365.1 | 183 | 193 | 25.14 | 30.9 |
The average of the 75% quantile is now 44 - so a ppm = 50
was actually not bad :)
That is definitely better! I understand your approach, I'm just not entirely convinced on having a global retention time window in this type of data generation :smile: It will give you results relatively simply but the quality of those results is a bit questionable in my mind. Hence I'm hacking a way to do peak detection on the chromatograms, let's see if I am successful in that.
Now, the results from the peak detection with ppm = 70
are not too bad:
name | rt_width | mz_width | mz_width_ppm | mz_ppm | multi_pk_count | all | venous | RBC | capillary | plasma | |
---|---|---|---|---|---|---|---|---|---|---|---|
kc_1 | Glycine | 3.753 | 0.0002509 | 3.299 | 9.94 | 0 | 100 | 100 | 100 | 100 | 100 |
kc_3 | Alanine | 3.652 | 0.0002983 | 3.312 | 5.399 | 0 | 100 | 100 | 79.55 | 97.73 | 100 |
kc_7 | Dimethylglycine | 2.869 | 0.001147 | 11.02 | 2.362 | 0 | 95.83 | 93.18 | 18.18 | 20.45 | 56.82 |
kc_10 | Serine | 3.256 | 0.0005667 | 5.343 | 1.997 | 0 | 100 | 100 | 100 | 100 | 100 |
kc_13 | Proline | 4.1 | 0.0003432 | 2.957 | 2.042 | 98 | 100 | 100 | 97.73 | 100 | 100 |
kc_18 | Betaine | 4.366 | 0.0003454 | 2.925 | 1.55 | 52 | 100 | 100 | 61.36 | 38.64 | 100 |
kc_19 | Valine | 4.434 | 0.0003547 | 3.004 | 1.546 | 54 | 100 | 100 | 68.18 | 38.64 | 100 |
kc_21 | Threonine | 3.87 | 0.0002339 | 1.948 | 1.397 | 0 | 100 | 100 | 100 | 100 | 100 |
kc_22 | Phenylethylamine | 4.22 | 0.0006302 | 5.161 | 1.772 | 0 | 33.33 | 15.91 | 22.73 | 29.55 | 22.73 |
kc_23 | Niacinamide | 7.145 | 0.001107 | 8.998 | 4.297 | 22 | 91.67 | 97.73 | 97.73 | 81.82 | 0 |
kc_24 | Taurine | 3.782 | 0.0004699 | 3.729 | 2.66 | 3 | 100 | 100 | 100 | 100 | 100 |
kc_26 | Pipecolic acid | 3.648 | 0.001155 | 8.881 | 1.881 | 3 | 87.5 | 34.09 | 47.73 | 45.45 | 34.09 |
kc_28 | Hydroxyproline | 3.472 | 0.001956 | 14.81 | 10.92 | 38 | 100 | 97.73 | 95.45 | 100 | 100 |
kc_30 | Creatine | 4.193 | 0.0005438 | 4.118 | 4.144 | 6 | 100 | 86.36 | 29.55 | 40.91 | 25 |
kc_31 | Leucine | 4.218 | 0.0003974 | 3.008 | 1.632 | 156 | 100 | 100 | 100 | 100 | 100 |
kc_32 | Isoleucine | 4.218 | 0.0003974 | 3.008 | 1.632 | 156 | 100 | 100 | 100 | 100 | 100 |
kc_34 | Asparagine | 3.49 | 0.0003877 | 2.914 | 1.403 | 0 | 100 | 100 | 100 | 100 | 100 |
kc_35 | Ornithine | 3.91 | 0.0002308 | 1.734 | 1.494 | 0 | 100 | 100 | 100 | 100 | 100 |
kc_36 | L-Aspartic Acid | 3.892 | 0.0005754 | 4.293 | 1.686 | 9 | 100 | 100 | 81.82 | 9.091 | 100 |
kc_43 | Glutamine | 4.034 | 0.0002848 | 1.937 | 1.687 | 0 | 100 | 100 | 100 | 100 | 100 |
kc_44 | Lysine | 4.233 | 0.0002997 | 2.037 | 1.221 | 0 | 100 | 100 | 100 | 100 | 100 |
kc_46 | L-Glutamic Acid | 4.102 | 0.0003387 | 2.288 | 1.326 | 1 | 100 | 100 | 100 | 100 | 100 |
kc_52 | Histidine | 4.376 | 0.0002603 | 1.668 | 1.278 | 0 | 100 | 100 | 100 | 100 | 100 |
kc_54 | Succinylacetone | 1.978 | 0.003825 | 24.05 | 3.268 | 0 | 58.33 | 27.27 | 43.18 | 40.91 | 34.09 |
kc_56 | alpha-Aminoadipic acid | 3.332 | 0.0008565 | 5.284 | 1.814 | 0 | 100 | 100 | 93.18 | 100 | 100 |
kc_61 | Phenylalanine | 4.397 | 0.0005309 | 3.197 | 13.78 | 0 | 100 | 100 | 43.18 | 68.18 | 95.45 |
kc_67 | 1-Methylhistidine | 4.026 | 0.0006932 | 4.076 | 1.435 | 159 | 100 | 100 | 100 | 100 | 100 |
kc_68 | 3-Methylhistidine | 4.026 | 0.0006932 | 4.076 | 1.435 | 159 | 100 | 100 | 100 | 100 | 100 |
kc_71 | N-Acetylornithine | 4.268 | 0.0006216 | 3.55 | 16.53 | 2 | 100 | 100 | 59.09 | 93.18 | 93.18 |
kc_72 | Arginine | 4.008 | 0.0003901 | 2.228 | 1.136 | 0 | 100 | 100 | 100 | 100 | 100 |
kc_75 | Citrulline | 3.651 | 0.0004082 | 2.318 | 1.278 | 0 | 100 | 100 | 100 | 100 | 100 |
kc_79 | Glucosamine | 3.799 | 0.0009793 | 5.438 | 1.526 | 15 | 91.67 | 56.82 | 27.27 | 54.55 | 84.09 |
kc_81 | Fructose | 4.26 | 0.0006844 | 3.371 | 2.482 | 181 | 100 | 100 | 100 | 100 | 100 |
kc_82 | Mannose | 4.26 | 0.0006844 | 3.371 | 2.482 | 181 | 100 | 100 | 100 | 100 | 100 |
kc_85 | Tyrosine | 3.718 | 0.001131 | 6.213 | 4.702 | 1 | 100 | 100 | 38.64 | 54.55 | 100 |
kc_86 | 1-Methyluric acid | 2.601 | 0.000955 | 5.217 | 52.36 | 0 | 87.5 | 84.09 | 79.55 | 84.09 | 79.55 |
kc_88 | galactitol | 3.4 | 0.00198 | 10.82 | 15.73 | 0 | 100 | 65.91 | 47.73 | 59.09 | 15.91 |
kc_90 | Phosphorylcholine | 5.55 | 0.001941 | 10.54 | 2.101 | 1 | 100 | 100 | 97.73 | 100 | 6.818 |
kc_95 | Caffeine | 3.068 | 0.0006805 | 3.488 | 1.042 | 0 | 100 | 100 | 100 | 100 | 100 |
kc_100 | ADMA | 4.692 | 0.0005263 | 2.591 | 1.232 | 0 | 100 | 100 | 100 | 100 | 100 |
kc_103 | Tryptophan | 3.997 | 0.0006472 | 3.156 | 1.302 | 90 | 100 | 100 | 79.55 | 100 | 100 |
kc_106 | Phosphocreatine | 3.767 | 0.00126 | 5.944 | 1.716 | 0 | 100 | 90.91 | 95.45 | 100 | 4.545 |
kc_122 | Sphingosine | 3.279 | 0.000666 | 2.218 | 1.962 | 0 | 100 | 97.73 | 100 | 100 | 100 |
kc_123 | Acetylneuraminic Acid | 3.537 | 0.00144 | 4.335 | 3.347 | 0 | 100 | 100 | 63.64 | 97.73 | 100 |
kc_125 | Sucrose | 4.288 | 0.002523 | 6.91 | 2.576 | 86 | 91.67 | 77.27 | 95.45 | 100 | 25 |
Summarizing, not bad. I'll repeat with a ppm = 50
and eventually also ppm = 30
.
I think was to be expected (the peak refinement) otherwise xcms would not be doing its job. This I've already seen.
It will be interesting to see if the overlaps (multiple peaks) decrease with smaller ppm.
Yes agreed, I'm having some success with the chromatography peak detection. Caffeine (well behaved peak with mostly a single chrom peak) is reliably detected with reasonable start/stop regions as well. I'm now testing how it goes with all of them.
With the same analysis as you did with hard-limit retention times and using an old version of the known compounds I was able to reliably detecting the peak where it should be. Plus I got slightly better results from the same check you did (something like 12 ppm rather than the 18 ppm), but it definitely is dependent on the m/z window one is using. With 50 ppm window I get 12 ppm, with 100 ppm window I get closer to 20 ppm and using 25 ppm window I get 10 ppm.
I'm looking more closely at what this guy did with the centwave algorithm: https://github.com/nathaniel-mahieu/centWaveP I think this is something that will definitely help you when you start thinking about the chromatogram peak detection, might also benefit us in the broad / difficult chrom peaks in the untargeted.
And yes we are having some really good results out of this data set regarding known compounds. Of course I give all credit to the fantastic spectra averaging :stuck_out_tongue_winking_eye:
btw.. I'm getting a lot of these, but they are not repeatable (running the same script again complains in different places):
Error in x$.self$finalize() : attempt to apply non-function
Re Error: I know - me too. there's something freaky going on with the Rcpp/cpp/pwiz bindings. No idea how we could get rid of that (without major performance decrease).
It's more annoying than anything else at this point since there is nothing to do except ignore it, are we the only ones seeing this? For a new user this would be a bit concerning.
I'm now running the analysis with ppm = 50
and perform the comparisons to other ppm
settings once I defined the workflow. With ppm = 50
I get (considering all identified peaks):
all | capillary | plasma | RBC | venous | |
---|---|---|---|---|---|
peak_count.0% | 6244 | 5840 | 4470 | 2906 | 6786 |
peak_count.25% | 6969 | 7368 | 5630 | 6820 | 7850 |
peak_count.50% | 8060 | 8005 | 6564 | 7279 | 8752 |
peak_count.75% | 8241 | 8476 | 7161 | 8114 | 9628 |
peak_count.100% | 8690 | 9871 | 7681 | 9440 | 10409 |
median_mz_width.0% | 4.801 | 4.576 | 3.978 | 3.461 | 5.107 |
median_mz_width.25% | 5.176 | 5.333 | 4.317 | 5.31 | 5.513 |
median_mz_width.50% | 5.525 | 5.637 | 4.729 | 5.63 | 5.76 |
median_mz_width.75% | 5.95 | 6.098 | 5.209 | 6.248 | 6.64 |
median_mz_width.100% | 6.278 | 7.394 | 6.215 | 8.987 | 7.401 |
median_rt_width.0% | 3.627 | 3.627 | 3.627 | 3.627 | 3.627 |
median_rt_width.25% | 3.627 | 3.627 | 3.627 | 3.627 | 3.627 |
median_rt_width.50% | 3.627 | 3.627 | 3.627 | 3.627 | 3.627 |
median_rt_width.75% | 3.627 | 3.627 | 3.627 | 3.627 | 3.627 |
median_rt_width.100% | 3.627 | 3.906 | 3.627 | 3.627 | 3.627 |
This summarizes the number of identified peaks per source type and the distribution of per-sample median m/z and rt widths.
The per-sample median m/z width of the identified peaks is on average quite low and consistent across sample types, same as the retention time width. The only apparent differences are for the numbers of detected peaks. Here plasma samples show the lowest and venous samples the highest numbers.
Re-opening issue, because we should/could improve here. I do for example not understand how centWave defines this peak:
red indicates the peak identified on the full data, blue is the peak identified with the same settings, but running the detection on the chromatographic data only (i.e. using findChromPeaks,Chromatogram,CentWaveParam
method). And green is using integrate = 2
and peakwidth = c(2, 14)
. The 2
in the peakwidth
parameter is actually important, because any value below 2 will cause the descend peak function that defines the boundaries to not allow any outliers. UPDATE actually, the last comment was wrong. Even with peak sizes smaller 2 outliers are allowed.
peakwidth = c(1.5, 14)
and integrate = 2
tends to create too broad chrom peaks. The plots below shows the same compound with integrate = 1
and integrate = 2
:
With integrate = 1
(uses the mexican hat filtered data) peaks are mostly centered around the apex, while integrate = 2
(that walks down the actual signal) includes in my opinion too much data.
From the total numbers of peaks and correspondence results there is almost no difference between integrate = 1
and integrate = 2
.
Nevertheless, the integrate = 2
seems to be an interesting alternative, since it captures better the real peak shape.
Increasing the lower peakwidth to 2 (peakwidth = c(2, 14)
) reduces the number of features with multiple peaks. Chromatographic peaks for all known compounds look still OK.
Update: we will use peakwidth = c(2, 14)
because of the fewer multi-peak features and a slightly lower maximal difference (median across all features) between feature's abundances in QC samples. integrate = 2
could even reduce the number of multi-peak features.
Perform the chromatographic peak detection on the positive polarity data. This should also include some validation and quality assessment of the identified peaks.