OpenDendro / dplR

This is the dev site for the dplR package in R
37 stars 14 forks source link

some more thoughts for the future (dplR v2.x) #22

Open sklesse opened 3 months ago

sklesse commented 3 months ago

Hi Andy & everyone contributing and maintaining dplR /dplPy.

I repeatedly am confronted with the unnecessary behavior of rwi.stats.running to not correctly calculate rbar when you detrend with difference=T.

That the detrending via division throws you an error when the estimated line goes below zero and reverts to detrending via the mean is great. But when people say difference=T they probably have already converted their values (log- or power-transformed) or they work with stable isotopes. In the case of log-transformed RW or d13C we automatically work with negative values, and it would be great if there was an option in detrend() to say: "I know what I am doing, fit "spline" curve anyways, even when fit goes negative". I am aware that adding a simple line in your code transposing the data by +100 is a very effective workaround, but sometimes I am asking myself if there couldn't be a fix for that. :-) The workaround is probably easier than changing detrend.series() with the multiple detrending options.

Anyways, I digress, the "dirty dog" part is actually not the reason why I opened this issue. Why is the rbar and hence eps and snr calculation affected by close-to-zero mean values? That does not make sense.

tmp <- normalize1(rwi, n, prewhiten) if(!all(tmp$idx.good)) { warning("after prewhitening, 'rwi' contains column(s) without at least four observations", call.=FALSE) cat(gettext("note that there is no error checking on column lengths if filtering is not performed\n", domain="R-dplR")) }

After some step-by-step testing I found normalize1() to be the "culprit". This function needs to be changed. At least, I do not understand why it does what it does. Why do anything, if I do not want to prewhiten the data? Why divide the data by their mean at all? Even for the pre-whitening this steps seems unnecessary. You can do the ar-modeling at any mean. And for the rbar calculation the mean doesn't matter. Am I missing something here, or why was this implemented? I would suggest deleting the lines that say "divide by time series mean" in normalize1(). Seems like an easy fix. What do you think?

BTW: This "set mean to 1" functionality in the detrend.series "Ar" method created also the funny constant offset of Anderegg's 2015 Figure S10, where their legacy baseline is weirdly at ~-0.01 and not 0. because each normally detrended time series actually has a usually a mean of something around 0.99-something (there is variation around the means, but by and large they are slightly <1), but the mean of the prewhitened time series is always exactly 1.

In the detrend.series() function the following lines

 if (difference) {
      resids$Ar <- Ar - mean(Ar, na.rm = TRUE)
    }
    else {
      resids$Ar <- Ar/mean(Ar, na.rm = TRUE)
    }

could be changed to

 if (difference) {
      resids$Ar <- Ar - mean(y2, na.rm = TRUE) 
    }
    else {
      resids$Ar <- Ar/mean(y2, na.rm = TRUE)
    }

Just an idea.

AndyBunn commented 3 months ago

Hi Stefan, thanks for your thoughts. I’ve been travelling but will have sometime next week to look at this.

From: sklesse @.> Date: Monday, July 1, 2024 at 6:31 AM To: OpenDendro/dplR @.> Cc: Subscribed @.***> Subject: [OpenDendro/dplR] some more thoughts for the future (dplR v2.x) (Issue #22)

Hi Andy & everyone contributing and maintaining dplR /dplPy.

I repeatedly am confronted with the unnecessary behavior of rwi.stats.running to not correctly calculate rbar when you detrend with difference=T.

That the detrending via division throws you an error when the estimated line goes below zero and reverts to detrending via the mean is great. But when people say difference=T they probably have already converted their values (log- or power-transformed) or they work with stable isotopes. In the case of log-transformed RW or d13C we automatically work with negative values, and it would be great if there was an option in detrend() to say: "I know what I am doing, fit "spline" curve anyways, even when fit goes negative". I am aware that adding a simple line in your code transposing the data by +100 is a very effective workaround, but sometimes I am asking myself if there couldn't be a fix for that. :-) The workaround is probably easier than changing detrend.series() with the multiple detrending options.

Anyways, I digress, the "dirty dog" part is actually not the reason why I opened this issue. Why is the rbar and hence eps and snr calculation affected by close-to-zero mean values? That does not make sense.

tmp <- normalize1(rwi, n, prewhiten) if(!all(tmp$idx.good)) { warning("after prewhitening, 'rwi' contains column(s) without at least four observations", call.=FALSE) cat(gettext("note that there is no error checking on column lengths if filtering is not performed\n", domain="R-dplR")) }

After some step-by-step testing I found normalize1() to be the "culprit". This function needs to be changed. At least, I do not understand why it does what it does. Why do anything, if I do not want to prewhiten the data? Why divide the data by their mean at all? Even for the pre-whitening this steps seems unnecessary. You can do the ar-modeling at any mean. And for the rbar calculation the mean doesn't matter. Am I missing something here, or why was this implemented? I would suggest deleting the lines that say "divide by time series mean" in normalize1(). Seems like an easy fix. What do you think?

BTW: This "set mean to 1" functionality in the detrend.series "Ar" method created also the funny constant offset of Anderegg's 2015 Figure S10, where their legacy baseline is weirdly at ~-0.01 and not 0. because each normally detrended time series actually has a usually a mean of something around 0.99-something (there is variation around the means, but by and large they are slightly <1), but the mean of the prewhitened time series is always exactly 1.

In the detrend.series() function the following lines

if (difference) {

 resids$Ar <- Ar - mean(Ar, na.rm = TRUE)

}

else {

 resids$Ar <- Ar/mean(Ar, na.rm = TRUE)

}

could be changed to

if (difference) {

 resids$Ar <- Ar - mean(y2, na.rm = TRUE)

}

else {

 resids$Ar <- Ar/mean(y2, na.rm = TRUE)

}

Just an idea.

— Reply to this email directly, view it on GitHubhttps://github.com/OpenDendro/dplR/issues/22, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AC7UCXPYA2R3LFT5QNUAE3TZKFK4JAVCNFSM6AAAAABKFTBIVKVHI2DSMVQWIX3LMV43ASLTON2WKOZSGM4DGOBXGEYDIOI. You are receiving this because you are subscribed to this thread.Message ID: @.***>