ElucidataInc / ElMaven

LC-MS data processing tool for large-scale metabolomics experiments.
https://resources.elucidata.io/elmaven/
GNU General Public License v2.0

Smoothing algorithms #348

Open shubhra-agrawal opened 7 years ago

shubhra-agrawal commented 7 years ago

El Maven gives the user three choices of filter algorithms for smoothing the data: Savitzky-Golay, Gaussian and Moving Average filter.

A quick search suggests that the moving average is one of the least recommended filters for signal preservation; Gaussian is better, since it can preserve sharp edges while filtering random noise; and Savitzky-Golay preserves signal shape the best while filtering out random noise.

Are there any cases where either the Gaussian or the Moving Average filter performs better than Savitzky-Golay?
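To make the comparison concrete, here is a minimal sketch (not El Maven's code) that applies 5-point versions of the three filters to a synthetic narrow peak and compares how much apex height each retains. The peak shape, the window size, the Gaussian sigma and all function names are assumptions chosen for illustration; the Savitzky-Golay coefficients are the standard 5-point quadratic smoothing weights.

```python
import math


def convolve_same(signal, kernel):
    """Apply an odd-length kernel; edges are handled by clamping indices."""
    half = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = min(max(i + k - half, 0), n - 1)
            acc += weight * signal[j]
        out.append(acc)
    return out


def gaussian_kernel(size, sigma):
    """Normalized Gaussian weights centred on the middle of the window."""
    half = size // 2
    weights = [math.exp(-0.5 * ((k - half) / sigma) ** 2) for k in range(size)]
    total = sum(weights)
    return [w / total for w in weights]


MOVING_AVERAGE_5 = [1 / 5] * 5
# Standard 5-point quadratic Savitzky-Golay smoothing coefficients.
SAVITZKY_GOLAY_5 = [-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35]

# Hypothetical narrow peak: height 100, sigma 1.5 scans, apex at scan 20.
peak = [100.0 * math.exp(-0.5 * ((i - 20) / 1.5) ** 2) for i in range(41)]

heights = {
    "moving_average": max(convolve_same(peak, MOVING_AVERAGE_5)),
    "gaussian": max(convolve_same(peak, gaussian_kernel(5, 1.0))),
    "savitzky_golay": max(convolve_same(peak, SAVITZKY_GOLAY_5)),
}

for name, height in heights.items():
    print(f"{name}: apex height after smoothing = {height:.1f}")
```

With these particular settings Savitzky-Golay keeps the most apex height and the moving average the least. The position of the Gaussian filter in that ordering depends on the sigma chosen, so this is one data point rather than a general ranking.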

chubukov commented 7 years ago

Not sure, and I haven't played with this very much. One thing to keep in mind is that the signal preservation aspect may not be critical, since the integration itself is done on the raw data anyway. AFAIK the smoothing is only used for peak finding.
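A rough sketch of that split, assuming a simple workflow in which the smoothed trace is used only to locate the peak boundaries and the area is then summed from the raw intensities. This is an illustration with hypothetical function names and made-up data, not a description of how El Maven implements either step.

```python
def moving_average(signal, window=3):
    """Smooth with a centred moving average, clamping indices at the edges."""
    half = window // 2
    n = len(signal)
    return [
        sum(signal[min(max(j, 0), n - 1)] for j in range(i - half, i + half + 1))
        / window
        for i in range(n)
    ]


def find_peak_bounds(smoothed):
    """Return (left, apex, right) indices of the tallest peak.

    The bounds are found by walking outward from the apex for as long as
    the smoothed intensity keeps strictly decreasing.
    """
    apex = max(range(len(smoothed)), key=lambda i: smoothed[i])
    left = apex
    while left > 0 and smoothed[left - 1] < smoothed[left]:
        left -= 1
    right = apex
    while right < len(smoothed) - 1 and smoothed[right + 1] < smoothed[right]:
        right += 1
    return left, apex, right


# Made-up raw intensities for one extracted ion chromatogram.
raw = [2, 1, 3, 2, 10, 40, 90, 45, 12, 3, 2, 1, 2]

smoothed = moving_average(raw, window=3)
left, apex, right = find_peak_bounds(smoothed)  # peak finding: smoothed data

raw_area = sum(raw[left:right + 1])  # integration: raw data
print(f"bounds=({left}, {right}), apex={apex}")
print(f"raw apex height={raw[apex]}, smoothed apex height={smoothed[apex]:.1f}")
print(f"area from raw data={raw_area}")
```

In this sketch the smoothed apex is noticeably lower than the raw one, but that distortion never reaches the reported area, because the smoothed trace only decides where the peak starts and ends.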

chubukov commented 7 years ago

One thing that has come up is the option to filter peaks by e.g. width, to eliminate spurious one-scan peaks. An alternative is to do this via the smoothing algorithm - i.e. pick an algorithm that will not recognize one scan as a peak. At first thought I'm not sure I like this idea very much -- I think I would prefer a more clearly defined filter. But it's worth considering.
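As an illustration of the "clearly defined filter" option, the sketch below detects candidate peaks as runs of consecutive scans above a threshold and then discards any run narrower than a minimum number of scans. The threshold, the minimum width, the function names and the trace are all hypothetical; El Maven's actual peak detection is not shown here.

```python
def find_runs_above(intensities, threshold):
    """Return (start, end) index pairs for consecutive runs above threshold."""
    runs = []
    start = None
    for i, value in enumerate(intensities):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(intensities) - 1))
    return runs


def filter_by_width(runs, min_scans):
    """Keep only runs that span at least min_scans scans."""
    return [(start, end) for start, end in runs if end - start + 1 >= min_scans]


# Made-up trace: a one-scan spike at index 2 and a real peak at indices 6-10.
trace = [1, 2, 80, 1, 2, 5, 30, 70, 95, 60, 25, 4, 1]

candidates = find_runs_above(trace, threshold=10)
kept = filter_by_width(candidates, min_scans=3)

print("candidates:", candidates)
print("kept:", kept)
```

The explicit filter makes the rejection rule visible and tunable (here, "at least 3 scans"), whereas relying on the smoothing algorithm ties the same decision to the window size and filter type.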

shubhra-agrawal commented 7 years ago

@chubukov Even if signal preservation is not crucial, providing three algorithms for smoothing seems like overkill considering the default algo (Savitzky-Golay) is pretty much the most highly recommended for this purpose. I wanted to know if there are any corner cases that are better handled by the other two algorithms. If not, I say we remove the Smoothing algorithm option from the UI and keep using Savitzky-Golay as default. This would help us declutter the UI a bit.

As for the peak width filter, my understanding is that the Minimum Peak Width option in the Peak Detection dialog works fine. Please correct me if it is buggy or if I misunderstood what you were saying.

chubukov commented 7 years ago

Not a huge objection to getting rid of the other smoothing options, but maybe we can just move them to some "advanced options" in case they become useful?

The filter in the peak detection dialog is on groups, not peaks.

shubhra-agrawal commented 7 years ago

@chubukov "Advanced Options" sounds like a great idea. We can look into identifying other such parameters as well so that a new user is not overwhelmed with too many options at once.

I get your point about the peak width filter now. I will open another issue for that.