aphalo / ggpmisc

R package ggpmisc is an extension to ggplot2 and the Grammar of Graphics
https://docs.r4photobiology.info/ggpmisc

How to output the values of peaks (by stat_peaks) into a new file? #35

Closed caixu0518 closed 9 months ago

caixu0518 commented 1 year ago

I have been using your nice package lately, and it looks brilliant. I have a minor question.

I want to save the values of the peaks into a new file, but at the moment I can only plot the values in the figure. Is there any way for me to do this?

Best, Xu

caixu0518 commented 1 year ago

I resolved it by using ggplot_build(). Thank you for your nice package.
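A minimal sketch of the ggplot_build() approach mentioned above, for readers with the same question. The data, the layer order and the file name "peaks.csv" are assumptions made for illustration; they are not taken from the original poster's code.

```r
library(ggplot2)
library(ggpmisc)

# Illustrative data: a sine wave with maxima at x = pi/2 and x = 5*pi/2
df <- data.frame(x = seq(0, 3 * pi, length.out = 301))
df$y <- sin(df$x)

p <- ggplot(df, aes(x, y)) +
  geom_line() +
  stat_peaks(colour = "red")

# ggplot_build() runs the statistics. Its $data element holds one data frame
# per layer, in the order the layers were added, so the values computed by
# stat_peaks() are in the second element for this particular plot.
peaks_df <- ggplot_build(p)$data[[2]]

# Save only the peak coordinates ("peaks.csv" is a hypothetical file name)
write.csv(peaks_df[, c("x", "y")], "peaks.csv", row.names = FALSE)
```

The index passed to `[[ ]]` has to match the position of the stat_peaks() layer, so it changes if layers are added before it.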

aphalo commented 1 year ago

@caixu0518 Thanks for reporting the solution you found. In any case, I will check whether the internal function used within package 'ggpmisc' is robust enough to be exported in a future version of 'ggpmisc'.

caixu0518 commented 1 year ago

Many thanks for your reply. Thanks to your wonderful package, I can now extract all the peaks. However, I found too many false peaks in the results from 'ggpmisc'. I hope you can give me some advice on how to filter out these false peaks. In the example below, your package reports four peaks (at 0.0130513, 0.1637344, 0.2895015 and 0.3867929), but to my eye there should be only one peak.

[Attached image: Bin 1 dis]

aphalo commented 1 year ago

@caixu0518 If two consecutive x values have identical y values, they are both returned. Could this be what you are seeing?

caixu0518 commented 1 year ago

Not exactly. I found that the output of 'ggpmisc' includes many small peaks. As in the example above, some extra small peaks appear because the curve is not smooth enough. Is there any way for me to smooth the curve and extract only the main peak? For example, by fitting a Gaussian curve?

Best, Xu

caixu0518 commented 1 year ago

Can we first correct the curve and then compute the peaks using the 'ggpmisc' package?

aphalo commented 1 year ago

@caixu0518 Hi. Thanks! Some questions and some things to consider.

  1. Why do you want to extract the peaks from the plot? Do you need them both in the plot and outside it? Or are you only using stat_peaks() to compute the peaks, with no use for the plot itself?
  2. The peaks are found as the maximum in a moving window. The argument passed to parameter span controls the width of the window. With a wide window you get fewer peaks; with span = NULL you get a single peak. The odd integer passed to span sets the width of the window as a number of observations. Have you tried this?
  3. If you pass strict = TRUE to stat_peaks() and two or more observations share the same maximum value within the window, none of them will be considered peaks.
  4. You can pass a value between 0 and 1 to ignore_threshold to ignore local peaks below a given height.
  5. What type of data are you plotting? Are they spectra or something else? If you are working with spectral data expressed versus wavelength in nanometres, my packages 'photobiology' and 'ggspectra' have functions for finding peaks as well as spikes. Simple peak fitting is also implemented. With them, smoothing and despiking are also available and easy to apply.
  6. With 'ggpmisc' you will need to smooth the data before plotting it, I think.
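As an illustration of point 6, here is a rough sketch of smoothing before looking for peaks, using base R only. The simulated data, the choice of loess() as smoother and the helper local_max() are all assumptions made for this example; other smoothers may suit real data better.

```r
set.seed(1)  # reproducible illustrative noise

# Hypothetical data: one broad Gaussian bump centred at x = 0.3 plus noise
x <- seq(0, 1, length.out = 200)
y <- exp(-(x - 0.3)^2 / (2 * 0.1^2)) + rnorm(length(x), sd = 0.05)

# Smooth with loess(). Its `span` is the fraction of points used in each
# local fit and is unrelated to the `span` parameter of stat_peaks().
y_smooth <- predict(loess(y ~ x, span = 0.3))

# Hypothetical helper: flag strict local maxima, i.e. points where the
# sign of the slope changes from positive to negative
local_max <- function(v) c(FALSE, diff(sign(diff(v))) == -2, FALSE)

n_raw <- sum(local_max(y))             # local maxima in the noisy data
n_smooth <- sum(local_max(y_smooth))   # local maxima after smoothing
main_peak_x <- x[which.max(y_smooth)]  # location of the highest smoothed value
```

The smoothed values could then be plotted in place of the raw ones, so that stat_peaks() only sees the smooth curve.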

Except for the support of span = NULL and ignore_threshold, all the work of finding the peaks in 'ggpmisc' is done by function peaks() from R package 'splus2R'. The only thing you need to be careful about is that this function takes a vector (of y values) as its argument, and these y values must be ordered by their corresponding x values, in other words, in the same order as plotted along x.
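The moving-window idea and the need to order by x can be sketched in base R as follows. The function window_peaks() is a hypothetical illustration, not the code used by 'ggpmisc' or 'splus2R'; it assumes non-strict matching (ties are all flagged) and never flags observations closer to either end than half a window.

```r
# Hypothetical helper: TRUE where y[i] equals the maximum of a window of
# `span` observations centred on i (span must be an odd integer >= 3)
window_peaks <- function(y, span = 5) {
  stopifnot(span %% 2 == 1, span >= 3)
  half <- (span - 1) / 2
  n <- length(y)
  is_peak <- logical(n)
  if (n < span) return(is_peak)
  for (i in (half + 1):(n - half)) {
    is_peak[i] <- y[i] == max(y[(i - half):(i + half)])
  }
  is_peak
}

# Illustrative data frame whose rows are NOT sorted by x
df <- data.frame(x = c(4, 1, 3, 2, 5, 7, 6),
                 y = c(5, 1, 2, 3, 2, 1, 4))

# Sort by x first, so that y is in the order in which it is plotted
df <- df[order(df$x), ]

peaks_span3 <- df$x[window_peaks(df$y, span = 3)]  # narrow window
peaks_span5 <- df$x[window_peaks(df$y, span = 5)]  # wider window, fewer peaks
```

With these data the narrow window flags x = 2, 4 and 6, while the wider window keeps only x = 4.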

aphalo commented 1 year ago

To do: Export a function to extract peaks from a data frame consistently with stat_peaks() and stat_valleys().

aphalo commented 1 year ago

Possibly also implement some kind of peak fitting and an n.min parameter.

aphalo commented 9 months ago

@caixu0518 Function find_peaks() is now exported in the code at GitHub. It will be on CRAN rather soon as version 0.5.5. Peak fitting is not yet implemented. New parameters n.min and n.max could be used to dynamically adjust the span or the ignore_threshold. This has been moved to new issue #48 for future implementation.
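A brief sketch of how the exported function might be used to save peaks without building a plot. It assumes that find_peaks() takes a vector of y values plus a span argument and returns a logical vector, as its documentation describes; the data and the file name "peaks.csv" are illustrative only.

```r
library(ggpmisc)

# Illustrative data: a sine wave with maxima at x = pi/2 and x = 5*pi/2
df <- data.frame(x = seq(0, 3 * pi, length.out = 301))
df$y <- sin(df$x)

# The y values must be ordered by x before searching for peaks
df <- df[order(df$x), ]

# Keep only the rows flagged as peaks (assumes a logical vector is returned)
peaks_df <- df[find_peaks(df$y, span = 5), ]

# "peaks.csv" is a hypothetical file name
write.csv(peaks_df, "peaks.csv", row.names = FALSE)
```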