Looking for a documentation of areaTop value

ElucidataInc / ElMaven

LC-MS data processing tool for large-scale metabolomics experiments.

https://resources.elucidata.io/elmaven/

GNU General Public License v2.0


Looking for a documentation of areaTop value #1417

Closed sorenwacker closed 2 months ago

sorenwacker commented 1 year ago

Hi,

I would like to know how El-Maven defines the AreaTop value. I looked in the documentation but could not find that information.

shubhra-agrawal commented 1 year ago

AreaTop is the smoothed maximum of a peak: the mean of the actual peak maximum and one neighbouring point on each side of it.
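As an illustration of that definition, here is a minimal Python sketch. It is not El-MAVEN's actual code: the function name `area_top` is hypothetical, and averaging over fewer points when the apex sits at the edge of the window is an assumption.

```python
def area_top(intensities):
    """Mean of the apex intensity and one neighbouring point on each side.

    Hypothetical helper for illustration only, not taken from El-MAVEN.
    Assumption: if the apex is at the edge of the window, only the
    points that actually exist are averaged.
    """
    if not intensities:
        raise ValueError("need at least one intensity value")
    # Index of the highest point in the extraction window (first one on ties).
    apex = max(range(len(intensities)), key=intensities.__getitem__)
    lo = max(apex - 1, 0)
    hi = min(apex + 2, len(intensities))
    window = intensities[lo:hi]
    return sum(window) / len(window)


# Apex 10.0 with neighbours 4.0 and 7.0: (4 + 10 + 7) / 3
print(area_top([1.0, 4.0, 10.0, 7.0, 2.0]))
```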

sorenwacker commented 1 year ago

Thank you!

sorenwacker commented 1 year ago

When I try to reproduce these values, I find this phenomenon: it looks like El-Maven applies baseline correction as well. Is that the reason for this behaviour? What you see here is my reimplementation of your areaTop score, but in some cases a constant value is removed for a set of files, resulting in smaller values than when blindly applying the algorithm. Often the value is very similar for groups of files. Sometimes there are multiple steps, even for different groups of files.

sorenwacker commented 1 year ago

Actually, it only looks like a linear offset on a log scale. On a linear scale it looks like this:

sorenwacker commented 1 year ago

Do you have any idea where this discrepancy might come from? Is it a baseline correction? But why then this linear dependency on the areaTop value? It looks like El-Maven always divides by 3, even if there are only 1 or 2 data points in the extraction window.
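To make the divide-by-3 hypothesis concrete, here is a small Python sketch comparing the two ways of averaging. Both function names are hypothetical and neither is taken from El-MAVEN's source. It only shows why a fixed divisor would produce a constant offset on a log scale and a proportional difference on a linear scale.

```python
import math


def area_top_fixed_divisor(window, divisor=3):
    # Hypothesised behaviour: always divide by 3, however many points exist.
    return sum(window) / divisor


def area_top_mean(window):
    # Plain mean over the points that are actually present.
    return sum(window) / len(window)


window = [9.0, 6.0]  # only two data points in the extraction window
fixed = area_top_fixed_divisor(window)
mean = area_top_mean(window)

# The ratio depends only on the number of points (here 2/3), so the
# difference is a constant on a log scale and scales with the value
# on a linear scale.
print(fixed, mean, fixed / mean)
print(math.log10(fixed) - math.log10(mean))
```

With three points in the window the two functions agree, so under this hypothesis the discrepancy would appear only for peaks with fewer than three data points.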

shubhra-agrawal commented 1 year ago

Baseline correction happens before any peak detection in El-MAVEN, to correct for technical noise across samples. We use a simple quantile method for baseline correction: we smooth the peak and set the baseline based on the user-set threshold. PeakAreaTop shows the corrected intensity by default. There is another option, AreaTopNotCorrected, that can be used to get the true PeakAreaTop value without the baseline correction.
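The following Python sketch shows one plausible reading of a quantile-based baseline. It is not El-MAVEN's implementation: the moving-average smoother, the way the quantile is picked, subtracting the baseline from the intensity, clipping at zero, and all function and parameter names are assumptions made for illustration.

```python
def moving_average(values, window=3):
    # Simple centred moving average; the window shrinks at the edges.
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(i - half, 0)
        hi = min(i + half + 1, len(values))
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def quantile_baseline(values, quantile=0.5, window=3):
    # Assumed approach: smooth the trace, then take the intensity at the
    # requested quantile of the smoothed values as a flat baseline.
    ordered = sorted(moving_average(values, window))
    index = int(quantile * (len(ordered) - 1))
    return ordered[index]


def baseline_corrected(intensity, baseline):
    # Assumed correction: subtract the baseline and clip at zero.
    return max(intensity - baseline, 0.0)


trace = [2.0, 2.0, 2.0, 10.0, 2.0, 2.0, 2.0]
baseline = quantile_baseline(trace, quantile=0.5)
print(baseline, baseline_corrected(10.0, baseline))
```

Under these assumptions, the difference between a corrected and an uncorrected value is the baseline itself, which would be similar for files with similar background signal.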

"It looks like El-Maven always divides by 3, even if there are only 2 or 1 datapoints in the extraction window."

That might be a bug; I will have to check.