OpenDendro / dplR

This is the dev site for the dplR package in R
37 stars 14 forks source link

"universal" or "signal-free" power transformation #8

Open sklesse opened 3 years ago

sklesse commented 3 years ago

Dear Andy and Christian (and everyone else who is reading here),

I was recently contemplating about power transformation and how the current way if doing it individually sample by sample is a somewhat strange or outdated concept.

In the current way of calculation each sample per plot has its own underlying spread-v-level (SvL) relationship. But that feels wrong and to me somehow defies ecological logic. Shouldn't there be an overarching relationship - at least within a site, or within a species? Isn't this heteroscedasticity simply an allometric phenomenon? Also, thinking of (generalized) linear models, one usually applies only one error distribution model to all samples - either Gaussian-normal with log-transformation of the predictand or using untransformed series with a Gamma error distribution. None of which are really perfect and come with their own disadvantages.

I started thinking about this problem, when I realized that the original (now fixed) function in R was not constrained to have a maximum slope b of one, where 1-b would equal 0, and warrant a true log-transformation. Simply taking the absolute when 1-b turns negative, as written by Ed in his 1997 paper, seems extremely arbitrary to me.

Also, I noticed that there is no constraint for 1-b to be greater than 1 (b being negative). This doesn't happen too often, but there are samples in the ITRDB, e.g in az090.rwl. This absolutely makes no sense, to further blow up variability at high growth rates, relative to variability at lower growth rates. I have seen exponents of ^1.28 or ^1.41.

As it is formulated now runn.S is simply a function of runn.M. But this omits that not every year has the same change in environmental conditions to the previous year. Assuming two climatically identical years (with an effect on growth of sd towards zero) appear in the juvenile (at high growth rate) or the adult stage (at a lower growth rate) of a tree has a totally different effect on the regression line. In the juvenile stage it would lead to a deflated regression line (low sd at high mean), where as in the adult stage it would influence the regression line (and hence b) to the opposite and increase it.

How can we account for this potentially distorting influence? Very simply. We have to look at all samples simultaneously in a linear model and account for the common year effect. I call it "signal-free" power transformation, because this exactly the same idea behind SF-detrending.

In the following I have created seven assumption-based scenarios: lm(run.S ~ run.M)#the simplest model, Spread simply depends on Level, environmental variation (=climate) does not play a role lm(run.S~run.M*ID) #no universal relationship, allowing SvL to vary between samples, no common year influence. This yields identical exponents to the way it is currently calculated.

lmer(run.S~run.M + (1|ID) #universal relationship, allowing for variation in the intercept between samples lmer(run.S~run.M + (run.M|ID) #universal relationship of interest, but captures variation in intercept and slope between samples

The "universal" estimate (the fixed effect slope) of these two models should be close to the median of all individual slopes in the current way of calculation.

lmer(run.S~run.M + (1|ID) + (1|year) #universal relationship, influenced by climate variation, variation in the intercept between samples lmer(run.S~run.M + (run.M|ID) + (1|year) #universal relationship, influenced by climate variation, measure random SvL (intercept + slope) between samples

lmer(run.S~run.M + (1|year) #universal relationship, influenced by climate variation only

These three models account for a common year-to-year influence on the estimation procedure and yield most of the times similar results to each other. From some early tests, the "universal" slope estimates are definitely different from the estimates without random year effect, I have seen differences >0.25 in the median power estimate. And the year effect captures a large fraction of the residual variance (up to 46% in some cases).

I think it's time for an ITRDB meta-analysis, diving into these distorted power transformation estimates, and provide a good case in a Dendrochronologia paper for a reform in how to better (no, best!) transform ring-width series. One big question I have is, whether 1-b converges to a very narrow band of possible values, the more series from a region/species are aggregated. Is there actually truly a "universal" scaling? IF yes, this would be definitely more than just a Dendrochronologia paper. These results would probably apply in the same way to BAI and as such should be received very well across all disciplines working with some sort of diameter increment

I am looking forward to hear your thoughts on any of the above-mentioned concerns and ideas.

Cheers, Stefan

kanchukaitis commented 2 months ago

Bumping this from a few years ago. Worth discussing the merits/drawbacks of a more individualistic approach proposed by @sklesse. If you'd like this to be a new feature, @sklesse, it would be great to receive a PR, since current energies are dedicated to other Issues.

A meta-analysis of the relevant characteristics as you propose could now be facilitated by: https://github.com/OpenDendro/itrdbMeasurementsClone

The sub-issue of the assumptions built into the original ARSTAN about the sign of the power transform (e.g. the code is hardwired to stop estimating the power if skew is negative, or as @sklesse writes 'Simply taking the absolute when 1-b turns negative ... seems extremely arbitrary to me') should indeed be addressed (if they haven't already), since wood anatomical data, for instance, can often have a negative skew. Will do this in dplPy but we should ensure we have the same functionality in both R and Python sides of openDendro.

sklesse commented 2 months ago

Hi Kevin & others reading,

yes, I can come up with something over the coming weeks/months. Thanks for the link to the latest ITRDB version - for QWA data I can work with stuff we produced here at WSL over the years.