OpenDendro / dplR

This is the dev site for the dplR package in R
37 stars 14 forks source link

rwi.stats problem with calculating EPS - shouldn't n be length(good.trees)? #11

Closed sklesse closed 2 years ago

sklesse commented 2 years ago

Hi Andy,

I am currently working with a pretty short dataset (nrow=23, ncol=268) from a provenance trial. Only 2 cores cover the entire 23 years. While exploring the data I stumbled over an issue in rwi.stats, in particular the n it uses to calculate EPS. So because 23 years is shorter than 30, min.corr.overlap is automatically set to 23.

The good.flag loop of the rwi.stats.running function (lines 171-188) returns exactly these two samples. good.trees <- which(good.flag)

length(good.trees)equals 2. so far so good.

However, in line 227, it says:

 if (period.common) {
      n <- nTrees
    }
    else {
      n <- mean(n.trees.by.year[year.idx])
    }

Shouldn't n be simply assigned n<-length(good.trees), no matter what? I have to admit I did not check the requirements for the argument period.common=T. Just look at the output below:

n.cores n.trees n n.tot n.wt n.bt rbar.tot rbar.wt rbar.bt c.eff rbar.eff eps snr 1 268 268 207.391 1 0 1 0.729 NA 0.729 1 0.729 0.998 558.594

There is only one pair of correlations calculated, which makes sense. so n must be limited to 2. You cannot use 207.391 in this case. The EPS is thus clearly wrong and too high and should return 0.843262 and not 0.998. The SNR is obviously equally affected.

I am aware that my dataset is a fringe case, but I can only assume that this calculation error occurs all the time, when series drop out or come in and don't cover the entire min.corr.overlap period. In fact, I just checked with a "normal" length dataset. It happens there, too, even if I use the normal min.cor.overlap of 30 years and look at the first 50 years. Because the rbar calculation is clearly defined by the argument that the common length of observation has to be min.corr.overlap. If it is shorter, no correlation coefficient is returned. The calculation of n.tot, n.wt and n.bt is clearly correct. but the n that is used for the calculation of eps is not! Also here, rwi.stats returns an n of 9.5 when in truth only 5 series meet the 30 year min.corr.overlapcriterion during the first 50 years of the dataset.

For 95% of all applications, this bug will not matter at all. But it's a bug nevertheless, and should probably be fixed. Unless, of course, there is a reason for the way n is calculated that I am totally unaware of. Cheers, Stefan

AndyBunn commented 2 years ago

Hey! I've pinged Mikko who (re)wrote that function to have him weigh in. But I think you are correct. More soon and thanks! -A

mvkorpel commented 2 years ago

Hi!

I think the following comment in the code is relevant to the issue:

            ## Number of trees averaged over the years in the window.
            ## We keep this number separate of the correlation
            ## estimates, i.e. the data from some tree / year may
            ## contribute to n without taking part in the correlation
            ## estimates.

I vaguely remember studying the definition of EPS from the Cook & Kairiukstis book when writing the code in question. I guess my coding decisions made sense to me at that time, but I may have been overthinking or underthinking about it.

I'm afraid this isn't too helpful.

mvkorpel commented 2 years ago

@AndyBunn Could you check out my proposed fix for this issue?

AndyBunn commented 2 years ago

Sorry. I’ll look tomorrow.

—- This device necessitates brevity

On Apr 30, 2022, at 1:08 PM, Mikko Korpela @.***> wrote:



@AndyBunnhttps://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FAndyBunn&data=05%7C01%7Cbunna%40wwu.edu%7C50fd6897b9c246254ff508da2ae5390f%7Cdc46140ce26f43efb0ae00f257f478ff%7C0%7C0%7C637869461253768138%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=1ChHLl%2Fjdfv%2B3SKrldXm9R6RfhEQ8udRIfEPR%2FFlWp0%3D&reserved=0 Could you check out my proposed fixhttps://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fmvkorpel%2FdplR%2Fcommit%2F34827c50885694f950873b73e76decf60beb0230&data=05%7C01%7Cbunna%40wwu.edu%7C50fd6897b9c246254ff508da2ae5390f%7Cdc46140ce26f43efb0ae00f257f478ff%7C0%7C0%7C637869461253768138%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=aEhxBY5TxGzHdQ39wLDXvsP5JUp8KG1atEuUtu653gU%3D&reserved=0 for this issue?

— Reply to this email directly, view it on GitHubhttps://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FAndyBunn%2FdplR%2Fissues%2F11%23issuecomment-1114046698&data=05%7C01%7Cbunna%40wwu.edu%7C50fd6897b9c246254ff508da2ae5390f%7Cdc46140ce26f43efb0ae00f257f478ff%7C0%7C0%7C637869461253768138%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=zQWGLAgIvRIokRYC3iNSw1GEKx61hp56gxAWym8cOR4%3D&reserved=0, or unsubscribehttps://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FAC7UCXOO6HVZARRJ6DPSQ3LVHWHMTANCNFSM5UPDG3RQ&data=05%7C01%7Cbunna%40wwu.edu%7C50fd6897b9c246254ff508da2ae5390f%7Cdc46140ce26f43efb0ae00f257f478ff%7C0%7C0%7C637869461253768138%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=TZUn%2F9Gq3UBfewdEi9aCrMH%2Fx1Ui6jjWdDzFGiHBinQ%3D&reserved=0. You are receiving this because you were mentioned.Message ID: @.***>

AndyBunn commented 2 years ago

Ok. Change made to dev version. Here. @sklesse, take a look and see what you think. Can edit as needed.

AndyBunn commented 2 years ago

Ok. I had closed this after a covid-induced typo! Changed n<-length(good.trees)