ElucidataInc / ElMaven

LC-MS data processing tool for large-scale metabolomics experiments.
https://resources.elucidata.io/elmaven/
GNU General Public License v2.0

Calculated RT of a peak -- spline max vs raw max #464

Open shubhra-agrawal opened 6 years ago

shubhra-agrawal commented 6 years ago

El-MAVEN 2.1

During peak finding, we find the local maxima of the EIC spline and mark the highest raw intensity around each maximum as the peak. This avoids confusing users, since a spline maximum does not always correspond to the highest intensity in that region.
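To make the two candidate positions concrete, here is a minimal sketch, not El-MAVEN's actual code. It assumes a plain moving average as a stand-in for the EIC spline, and the helper names (`moving_average`, `local_maxima`, `raw_apex_near`), the window width, and the intensities are all made up for illustration.

```python
def moving_average(values, half_width=2):
    """Box smoother; a stand-in for the EIC spline, not El-MAVEN's smoother."""
    n = len(values)
    out = []
    for i in range(n):
        lo = max(0, i - half_width)
        hi = min(n, i + half_width + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def local_maxima(values):
    """Interior indices higher than the left neighbor and not lower than the right."""
    return [i for i in range(1, len(values) - 1)
            if values[i - 1] < values[i] >= values[i + 1]]


def raw_apex_near(raw, center, half_width=2):
    """Index of the highest raw intensity within +/- half_width scans of center."""
    lo = max(0, center - half_width)
    hi = min(len(raw), center + half_width + 1)
    return max(range(lo, hi), key=lambda i: raw[i])


# Made-up intensities: a broad peak centered near scan 6 with a spike at scan 4.
raw = [0, 2, 10, 30, 95, 40, 70, 60, 65, 30, 10, 2, 0]

smoothed = moving_average(raw)
smoothed_apexes = local_maxima(smoothed)
raw_apexes = [raw_apex_near(raw, i) for i in smoothed_apexes]

print(smoothed_apexes)  # scan index of each smoothed maximum
print(raw_apexes)       # scan index of the highest raw intensity near each one
```

With these numbers the smoothed maximum lands on scan 6, while the highest raw intensity nearby is the spike at scan 4, so the two definitions of the peak position differ by two scans.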

Copy-pasting @chubukov's comment from the #334 discussion:

"However, the peak position is also used for finding related peaks -- see e.g. maxIsotopeScanDiff parameter, and the entire alignment algorithm. For those purposes, it makes sense to have the peak position represent "where you expect related peaks to center" -- and that's probably the position of the spline max. This could be a separate discussion and maybe it makes sense to keep both metrics. There are cases when I'd rather use the position of the highest intensity scan -- for instance if I want to get the m/z spectrum at peak apex, I don't really want to accidentally take the trough of a spiky peak."

shefalilathwal commented 6 years ago

@chubukov @shubhra-agrawal We have a dataset with a large number of spiky peaks, and assigning the RT to the raw max affects isotope detection. When both the parent peak and the isotope peaks are spiky, as they are for many metabolites in our data, the detected maxima of the parent and the isotope end up many scans apart (we are running into #567 because of it). In this case it would make sense to define the maximum as where 'peaks are expected to center', i.e., the spline max.
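A rough sketch of why this matters for a scan-difference cutoff. This is not El-MAVEN's code: it assumes the cutoff is a maximum absolute difference in scan index, uses a moving average in place of the spline, and the function names, the threshold value of 2, and the intensities are invented for illustration.

```python
def smooth(values, half_width=2):
    """Box smoother used as a stand-in for the EIC spline."""
    n = len(values)
    out = []
    for i in range(n):
        lo = max(0, i - half_width)
        hi = min(n, i + half_width + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def argmax(values):
    """Index of the first largest value."""
    return max(range(len(values)), key=lambda i: values[i])


def within_scan_diff(parent_scan, isotope_scan, max_scan_diff):
    """Assumed rule: apexes may be at most max_scan_diff scans apart."""
    return abs(parent_scan - isotope_scan) <= max_scan_diff


# Made-up traces: same broad shape, but the spike sits on opposite sides.
parent = [0, 2, 10, 30, 95, 40, 70, 60, 65, 30, 10, 2, 0]
isotope = [0, 2, 10, 30, 65, 60, 70, 40, 95, 30, 10, 2, 0]

max_scan_diff = 2  # hypothetical threshold, not a recommended setting

raw_diff = abs(argmax(parent) - argmax(isotope))
smoothed_diff = abs(argmax(smooth(parent)) - argmax(smooth(isotope)))

raw_ok = within_scan_diff(argmax(parent), argmax(isotope), max_scan_diff)
smoothed_ok = within_scan_diff(
    argmax(smooth(parent)), argmax(smooth(isotope)), max_scan_diff
)

print(raw_diff, raw_ok)
print(smoothed_diff, smoothed_ok)
```

Here the raw apexes are 4 scans apart and fail the assumed cutoff, while the smoothed apexes coincide and pass. Raising the hypothetical threshold to 4 would also let the raw apexes through.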

@chubukov How do you curate data with spiky peaks? If you see a spiky peak (but clearly a peak to the eye) would you pick it and try to find its isotopes or would you discard that peak?

@sunil20dhakad FYI

chubukov commented 6 years ago

@shefalilathwal Do you mean spiky like the ones @sunil20dhakad just posted, where there's clearly just an input filtering problem? Or spiky in the sense that they're just noisy? If the latter, at that point it really depends on what question you want to answer and what confidence you want to answer it with... sorry for the non-answer :). In some cases you can at least put some upper and lower bounds on the labeling, which may be good enough.

sunil20dhakad commented 6 years ago

Here is one example of what @shefalilathwal is describing: aKG with all its labels.

[Images: six screenshots of aKG and its labels]

@chubukov FYI

sunil20dhakad commented 6 years ago

When the peaks are spiky, the result also depends on which peaks are picked. For example, the same group (Ac-Carnitine) gives different results when picked at different positions.

[Images: two screenshots of the Ac-Carnitine group picked at different positions]

chubukov commented 6 years ago

@sunil20dhakad I think if you're getting two peaks in the lower example it means the smoothing is not working or is set too fine.

In the upper example, this is just what non-great LC-MS data looks like. It's going to be hard to quantify most of those peaks very accurately no matter what you do, though you could probably at least get a reasonable guess and bound for all of them. I'm a little surprised that the correlation cutoff is what you ran into -- just by eye, I would have expected most of them (except I guess the 13C-1) to pass.

Edit: sorry, I realized that it's the maxIsotopeScanDiff that you're running into. That's less surprising, and yes, using splineMax might help in this case. Or you could just set it more liberally.