OpenDendro / dplR

This is the dev site for the dplR package in R
37 stars 14 forks source link

Variance stabilization #3

Closed sklesse closed 3 years ago

sklesse commented 4 years ago

Are there any intentions to include variance stabilization following Frank et al. (2007, GRL)? I only have a running version that relies on the ts() class, but I guess that could be easily rewritten to fit the "rwl" class objects.

Cheers, Stefan

AndyBunn commented 4 years ago

Stefan, I was meaning to tackle this but the timing was bad! Love to include this in dplR. And I've got some time to putter on the package now as I clean things up for doing substantial work. Do you want to add your code as a contributor here? Or send it to me and I'll wrangle it in and add you directly.

sklesse commented 4 years ago

Hi Andy, I think it would be great to add it to dplR. So far, it's written for ts() formatted objects, but I should be able to rearrange most of it to be compatible with dplR-native data.frames in the next days. I will upload the code here.

AndyBunn commented 4 years ago

Cool. Go for it. If it works on class ts then wrangling it into rwl should be trivial as rwl is just a data.frame with the years as the rownames. You can get the years out of a rwl object using time(rwl) as needed.

sklesse commented 4 years ago

Ok, here it is. I just had to rewrite some lines to get rid of the ts classes.

The function was originally written by David Frank. It follows Frank et al. (2006, TRACE proceedings) or Frank et al. (2007, GRL) and returns uncorrected chronologies equal to chron(x, biweight=F or T) and their simple rbar corrected (Osborn et al., 1997, Figure 2d) as well as running rbar adjusted counterparts.
I only modified a couple of lines for speed purposes.

The actual desired output would be column 6, the bi-weight running rbar adjusted chronology. One could delete the other lines referring to simple rbar adjusted chronologies, or keep them. Corresponding to the chron function one could also insert some flags in the function arguments, e.g. biweight=T/F, running.rbar=T/F and only return two of those chronologies (uncorrected and corrected) instead of all six variants.

require(zoo) #for rollapply
#x=detrended data.frame as returned from detrend()
#WL= window length for running rbar 
#WL default set to be 51 years, Frank et al. 2007 use 100 years 

stabilizevariance <- function(x,WL=51) {

  observationoverlap <- function(x) {
    presencematrix <- 0*(as.matrix(x))+1
    presencematrix[is.na(presencematrix)] <- 0
    overlapmatrix <- t(presencematrix)%*%presencematrix
    overlapmatrix
  }

  runningrforSTABILIZEVARIANCE <-function (data, win.length=WL) {

    cor.mat <- cor(data, use="pairwise.complete.obs")
    diag(cor.mat) <- NA
    obs.overlap.mat <- observationoverlap(data)
    cor.mat[obs.overlap.mat < (win.length/3)] <- NA  
    MEAN.R.trunc <- mean(cor.mat, na.rm=TRUE)
    return(MEAN.R.trunc)
  }

  WL<-WL
  mean.x <-mean(rowMeans(x,na.rm=TRUE,dims=1)) #pre-process the data to have a mean of zero
  x <- x-mean.x  
  samplesize <- rowSums(!is.na(x))
  chronology <- rowMeans(x,na.rm=T)
  biwgt.chronology <- apply(x,1,tbrm)

  correlmatrix <- cor(x,use="pairwise.complete.obs")
  diag(correlmatrix) <- NA
  MEAN.R <- mean(correlmatrix, na.rm =TRUE) #rbar

  rbar.running <- rollapply(x,WL,runningrforSTABILIZEVARIANCE,by=1,by.column=FALSE,fill = NA)
  rbar.running <- na.locf(rbar.running,na.rm=FALSE) #fill last WL/2 values
  rbar.running <- rev(na.locf(rev(rbar.running),na.rm=FALSE)) #fill first WL/2 values
  rbar.running[samplesize==0] <- NA

  effsamplesize <- samplesize/(1+(samplesize-1)*rbar.running)
  effsamplesize <- pmin(effsamplesize,samplesize,na.rm=TRUE) 
  # takes care of setting the effsamplesize to 1 when sampledepth=1
  # and also if rbar goes negative effsamplesize gets larger than samplesize, and this brings it back down.

  simpleeffsamplesize <- samplesize/(1+(samplesize-1)*MEAN.R)

  RUNNINGadjustedchronology <- chronology*(effsamplesize*MEAN.R)^.5
  RUNNINGadjusted.biwgt.chronology  <- biwgt.chronology*(effsamplesize*MEAN.R)^.5

  SIMPLEadjustedchronology <- chronology*(simpleeffsamplesize*MEAN.R)^.5
  SIMPLEadjusted.biwgt.chronology <- biwgt.chronology*(simpleeffsamplesize*MEAN.R)^.5

  everything <- cbind(chronology,biwgt.chronology,SIMPLEadjustedchronology,SIMPLEadjusted.biwgt.chronology,RUNNINGadjustedchronology,RUNNINGadjusted.biwgt.chronology,samplesize,effsamplesize,rbar.running)
  rownames(everything)<-rownames(x)
  everything <- scale(everything,center=c(rep(-mean.x,6),rep(0,3)),scale=FALSE)#add back the mean to the dataset

  return(everything)

}

ts.plot(stabilizevariance(x,WL=51)[,c(2,4,6)],col=1:3)
AndyBunn commented 4 years ago

Great. I'll port this over. Thanks.

AndyBunn commented 3 years ago

@sklesse -- While I didn't forget about this, I didn't get around it. Should have some time soon.

AndyBunn commented 3 years ago

Ok. Check out the chron.stabilize function in the repo. Just committed it. Pretty rough now. Let's discuss.

sklesse commented 3 years ago

Hi Andy, looks great and is very fast! :-)

One little "error" is in line 34 for(i in 1:(nrow(x0)-winLength+1)){

your initial code has for(i in 1:(nrow(x0)-winLength-1)){ This results in 2 pairs of rbar not being evaluated. In the default setting (winLength=51) there should be 25 NAs in movingRbarVec at the end. With your code, there are 27. changing -1 to +1 in the for-loop does the trick.

AndyBunn commented 3 years ago

Cool. Thanks for that @sklesse.

Questions.

I think, there should be some bounds or at least warnings on window length. E.g., what should be the min/max length? E.g., min of 10% max of 50% of chron length? And should the function allow even numbers to be passed in and just increment them +1? The reason for having the number be odd won't be obvious to the users.

Next, should the corMat[overlapMatrix < (WL/3)] <- NA on line 15 be hard coded as 1/3 or can that be an argument to the function?

And help file will need work. Can you send some verbiage?

I think that's it for now...

sklesse commented 3 years ago

Those are good questions.

I think a hard minimum of 31 years wouldn't harm - at least with a warning flag if one chooses a shorter window. Computing correlations over shorter periods is not very powerful and might even produce artifacts (this is just a gut feeling - I am not aware of a formal test on this matter). What are the most commonly used rbar/eps-windows in ARSTAN/dplR? I'd say between 30 and 50 years. So, the 51-year default is a good starting point. In David's GRL Paper they use a 100-window for a 1000-year long chronology. In the supplement, they show how the chronology would differ when using a 50- or-200 year window. The former (latter) keeps (loses) more low frequency in the early low replicated part.

So, yeah, maybe a 50% max-value warning as you suggested also wouldn't harm.

odd/even numbers: Why not? We would just have to add: ` movingRbarVec <- rep(NA,nrow(x0))

if(winLength%%2 != 1){
  for(i in 1:(nrow(x0)-winLength+1)){
    movingRbarVec[i+(winLength-1)/2] <- rbarWinLength(x0[i:(i+winLength-1),])
  } 
}else{
  for(i in 1:(nrow(x0)-winLength)){
    movingRbarVec[i+(winLength)/2] <- rbarWinLength(x0[i:(i+winLength),])
  } 
}
`

corMat: I would leave it hard-coded. There should be a certain overlap with the chosen window for the correlation pair to be considered.

I will have a look at the help file later today. Cheers, Stefan

AndyBunn commented 3 years ago

Ok. I added in warnings for short (<30) or long (>50%) winLengths and an error if winLength is > than nrow(x). I changed the odd/even winLength as above and simplified the padding of the NAs. Can you double check the indexing on the odd/even split. E.g.,

data(co021)
co021.rwi <- detrend(co021,method = "Spline")
# create a little subset for tests
foo <- co021.rwi[301:400,]
#drop empty cols
foo <- foo[,!is.na(colSums(foo))]
# check with a silly short winLength -- note warning
foo2 <- chron.stabilized(foo,winLength=10,running.rbar = TRUE)
head(foo2,10) # first four values get padded.
tail(foo2,10) # last five values get padded.
foo2 <- chron.stabilized(foo,winLength=11,running.rbar = TRUE)
head(foo2,10) # first five values get padded.
tail(foo2,10) # last six values get padded. Should be last five?
AndyBunn commented 3 years ago

I'm closing this as an issue -- it's in the dev version and almost tidied up.