BiologicalRecordsCentre / sparta

Species Presence/Absence R Trends Analyses
http://biologicalrecordscentre.github.io/sparta/index.html
MIT License
21 stars 24 forks source link

More options for occurrenceChange #65

Closed drnickisaac closed 6 years ago

drnickisaac commented 6 years ago

The current version of occurrenceChange takes two years and calculates the percentage difference in occupancy. In the years since this function was created, we've experimented with a bunch of different statistics, including absolute change and growth rates. I think these are simple transformations of one another, so it should be possible to bundle all these into a single function that gives users great flexibility.

I've asked @CharlieOuthwaite to add her own code for calculating the growth rates, but I'd like @GPowney to consider what other options should be included.

Also, we need to check (and document) the behaviour of this function when region codes are used.

GPowney commented 6 years ago

Here is the growth rate function. This can be used in place of the proportional change section of the occurrenceChange script.

Use this: _annual_growthrate <- function(first, last, nyr){ (((last/first)^(1/nyr))-1)*100 }

_annual_growthrate (first = results[,1], last = results[,2], nyr = length(predicted))

Instead of, or in addition to this: results$change = (results[,2] - results[,1]) / results[,1]

CharlieOuthwaite commented 6 years ago

Working on adding these options in now...

Currently estimates are based on predicted results from a glm, is this still what we want to do or do we just want to use the posterior of the psi.fs values from sims.list? See function in current form below.

occurrenceChange <- function(firstYear, lastYear, bayesOut){

if(!firstYear %in% bayesOut$min_year:bayesOut$max_year) stop('firstYear must be in the year range of the data') if(!lastYear %in% bayesOut$min_year:bayesOut$max_year) stop('lastYear must be in the year range of the data')

occ_it <- bayesOut$BUGSoutput$sims.list$psi.fs colnames(occ_it) <- bayesOut$min_year:bayesOut$max_year years <- firstYear:lastYear

prediction <- function(years, series){

# cut data
data_table <- data.frame(occ = series[as.character(years)], year = (years - min(years) + 1))

# run model
model <- glm(occ ~ year, data = data_table, family = 'binomial')

# create predicted values
predicted <- plogis(predict(model))
names(predicted) <- years

# build results
results <- data.frame(predicted[1], predicted[length(predicted)], row.names = NULL)
colnames(results) <- as.character(c(min(years), max(years)))
results$change = (results[,2] - results[,1]) / results[,1]

return(results)

}

drnickisaac commented 6 years ago

We should allow for multiple options, with our preferred option as the default. I think there are four options:

CharlieOuthwaite commented 6 years ago

Okay, I have made those four options available. I have also added a parameter to specify which region (if any) you want to use the psi.fs values for. If nothing is specified then the overall psi.fs values are used from the sims list. Function added below in case you wanted to check it! I will pull request to merge with the master.


occurrenceChange <- function(firstYear, lastYear, bayesOut, change = 'growthrate', region = NULL){

  # error checks for years
  if(!firstYear %in% bayesOut$min_year:bayesOut$max_year) stop('firstYear must be in the year range of the data')
  if(!lastYear %in% bayesOut$min_year:bayesOut$max_year) stop('lastYear must be in the year range of the data')

  # error checks for change
  if(!class(change) == 'character') stop('Change must be a character string identifying the change metric.  Either: difference, percentdif, growthrate or lineargrowth')
  if(!change %in% c('difference', 'percentdif', 'growthrate', 'lineargrowth')) stop('The change metric must be one of the following: difference, percentdif, growthrate or lineargrowth')

  # error check for region
  if(!class(region) == 'character') stop('region must be a character string identifying the regional estimates that change is to be calculated for.')

  # extract the sims list, if there is a region code, use the psi.fs for that region
  if(!is.null(region)){
  reg_code <- paste("psi.fs.r_", region, sep = "")

  occ_it <- bayesOut$BUGSoutput$sims.list
  occ_it <- occ_it[[grep(reg_code, names(occ_it))]]

  }else{
    occ_it <- bayesOut$BUGSoutput$sims.list$psi.fs

  }

  colnames(occ_it) <- bayesOut$min_year:bayesOut$max_year
  years <- firstYear:lastYear

  ### loops depend on which change metric has been specified

  if(change == 'lineargrowth'){
  prediction <- function(years, series){

    # cut data
    data_table <- data.frame(occ = series[as.character(years)], year = (years - min(years) + 1))

    # run model
    model <- glm(occ ~ year, data = data_table, family = 'binomial')

    # create predicted values
    predicted <- plogis(predict(model))
    names(predicted) <- years

    # build results
    results <- data.frame(predicted[1], predicted[length(predicted)], row.names = NULL)
    colnames(results) <- as.character(c(min(years), max(years)))
    results$change = (results[,2] - results[,1]) / results[,1]

    return(results)

  }

  res_tab <- do.call(rbind, apply(X = occ_it, MARGIN = 1, years = years, FUN = prediction))

  } # end of loop for linear growth rate

  if(change == 'difference'){

    res_tab <- data.frame(occ_it[, 1], occ_it[, ncol(occ_it)], row.names = NULL)
    colnames(res_tab) <- as.character(c(min(years), max(years)))
    res_tab$change = res_tab[,2] - res_tab[,1]

  } # end of loop for simple difference

  if(change == 'percentdif'){

    res_tab <- data.frame(occ_it[, 1], occ_it[, ncol(occ_it)], row.names = NULL)
    colnames(res_tab) <- as.character(c(min(years), max(years)))
    res_tab$change = ((res_tab[,2] - res_tab[,1])/res_tab[,1])*100

  } # end of loop for percentage difference

  if(change == 'growthrate'){

    nyr <- length(years)
    res_tab <- data.frame(occ_it[, 1], occ_it[, ncol(occ_it)], row.names = NULL)
    colnames(res_tab) <- as.character(c(min(years), max(years)))
    res_tab$change = (((res_tab[,2]/res_tab[,1])^(1/nyr))-1)*100

  }

  # return the mean, quantiles, and the data
  return(list(mean = mean(res_tab$change),
              CIs = quantile(res_tab$change, probs = c(0.025, 0.975)),
              data = res_tab))
}
CharlieOuthwaite commented 6 years ago

Changes made and now merged into the master.