Conte-Ecology / conteStreamTemperature

Package for cleaning and analyzing daily stream temperature
MIT License

Predictions with autoregressive across daymet record #16

Closed djhocking closed 9 years ago

djhocking commented 9 years ago

Right now, when there are data, the code makes predictions using the autoregressive function. For predictions across the entire daymet record, however, the code just predicts the trend as if there were no error (it predicts without the autoregressive term on the residuals because there are no residuals). For some dates and sites across that daymet record there are data, though, so we could use the more accurate autoregressive predictions.

To do this, however, we have to either adjust the firstObsRows and evalRows functions, adjust the data prep functions to combine the daymet data with the observed data, or use an ifelse() statement in the predictTemp function to do one thing if the date-site is in the observed record and something else if not.
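The ifelse() option could look roughly like the sketch below. It is not the package's predictTemp function: `predictWithAR1`, `trend`, `temp`, and `ar1` are hypothetical names, and the sketch assumes a single site with rows sorted by date, where the AR1 term multiplies the previous day's residual (observed minus trend).

```r
# Hypothetical sketch, not the package's predictTemp(); all names are illustrative.
predictWithAR1 <- function(trend, temp, ar1) {
  # Previous day's residual (observed - trend); NA where there was no observation
  prevResid <- c(NA, head(temp - trend, -1))
  # With a previous residual, add the AR1 correction; otherwise return the trend alone
  ifelse(is.na(prevResid), trend, trend + ar1 * prevResid)
}

trend <- c(10, 11, 12, 13)      # trend-only predictions
temp  <- c(10.5, NA, 12.4, NA)  # observed temperatures, NA where unobserved
pred  <- predictWithAR1(trend, temp, ar1 = 0.8)
```

Days that follow an observation get the AR1 adjustment, and all other days fall back to the trend.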

bletcher commented 9 years ago

The newest versions of the ...Rows functions should work for what you are describing. An earlier version had an error. Call if it would help to talk this over.

(Sent by email in reply to Daniel J. Hocking's comment of Thu, Oct 30, 2014 at 11:01 PM.)


djhocking commented 9 years ago

I have the most up-to-date version of the functions. The problem is that there is no temp column in the daymet data because there are no observations. So when I run

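    library(dplyr)

    createEvalRows <- function(data) {
      # Assumes data already has a rowNum column, e.g. data$rowNum <- 1:dim(data)[1]
      # Keep rows with an observed temp that are not the first date of a deployment
      evalRows <- data %>%
        group_by(deployID) %>%
        filter(date != min(date) & !is.na(temp)) %>%
        select(rowNum)

      # Return a vector of row numbers: this can be a list or 1 dataframe with
      # different columns, but can't be a df because the number of rows differs
      return( evalRows$rowNum )
    }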

dplyr can't look in the temp column because it doesn't exist. I could add a temp column and fill it with NA. That would make every row a firstObsRow. That would be okay if we wanted to predict the trend without accounting for the correlation in the residuals via the AR1 coefficient. This makes sense when there are no observations because then there are no residuals. I see two problems with this approach:

  1. In the JAGS model as it currently stands, there would likely be a problem if nEvalRows were 0.
  2. We actually want to have the best predictions when there are data, so we want to use the AR1.
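A toy illustration of the first problem, assuming a made-up `daymetOnly` data frame (not real package data) and the same filter as createEvalRows: with an all-NA temp column, no rows qualify as evalRows.

```r
library(dplyr)

# Toy stand-in for daymet-only data (hypothetical values): no observations at all
daymetOnly <- data.frame(
  deployID = c("a", "a", "a", "b", "b"),
  date = as.Date("2014-01-01") + c(0, 1, 2, 0, 1)
)
daymetOnly$temp <- NA_real_                     # placeholder temp column filled with NA
daymetOnly$rowNum <- seq_len(nrow(daymetOnly))

# Same filter as createEvalRows: drop first dates and rows without an observed temp
evalRows <- daymetOnly %>%
  group_by(deployID) %>%
  filter(date != min(date) & !is.na(temp)) %>%
  ungroup() %>%
  pull(rowNum)

nEvalRows <- length(evalRows)  # 0, so any model loop over 1:nEvalRows needs care
```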

I'm not sure yet what the best approach is. One option is to calculate both sets of predictions and then join them, keeping the observed-data predictions when available. The other is to merge the observed and daymet data before calculating the firstObsRows and evalRows and doing the predictions. I think the latter is probably best, but I'm not sure of the best way to do it. It will also likely have to be done in small chunks, because I ran out of memory and crashed my laptop when trying to do the daily predictions over the daymet range for just the observed sites in MA.
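One way to work in small chunks is to loop over groups of sites. The sketch below is only an illustration: `predictChunk` is a hypothetical placeholder for whatever does the real predictions, and `predictInChunks`, `chunkSize`, and the toy data are likewise assumptions.

```r
# Hypothetical placeholder for the real prediction step (not a real model)
predictChunk <- function(df) {
  df$pred <- df$airTemp * 0.8
  df
}

predictInChunks <- function(data, chunkSize = 2) {
  sites <- unique(data$site)
  # Split the site IDs into groups of at most chunkSize sites
  chunks <- split(sites, ceiling(seq_along(sites) / chunkSize))
  # Predict one group of sites at a time, then stack the results
  out <- lapply(chunks, function(s) predictChunk(data[data$site %in% s, ]))
  do.call(rbind, out)
}

toy <- data.frame(
  site = rep(c("s1", "s2", "s3"), each = 2),
  airTemp = c(1, 2, 3, 4, 5, 6)
)
res <- predictInChunks(toy, chunkSize = 2)
```

If the combined result is itself too large for memory, each chunk could be written to disk instead of being stacked with rbind.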

djhocking commented 9 years ago

I may have just found an easy solution using options in Kyle's readStreamTempData function:

    # Site covariate (landscape) data only
    covariateData <- readStreamTempData(timeSeries=FALSE, covariates=TRUE, dataSourceList=dataSource, fieldListTS=fields, fieldListCD='ALL', directory=dataInDir)

    # Observed temperature time series only
    observedData <- readStreamTempData(timeSeries=TRUE, covariates=FALSE, dataSourceList=dataSource, fieldListTS=fields, fieldListCD='ALL', directory=dataInDir)

    climateData$site <- as.character(climateData$site)
    tempData <- left_join(climateData, select(covariateData, -Latitude, -Longitude), by=c('site'))
    # date must be selected here because it is one of the join keys
    tempData <- left_join(tempData, select(observedData, agency, date, AgencyID, site, temp), by = c("site", "date"))
    tempDataBP <- left_join(tempData, springFallBPs, by=c('site', 'year'))

The idea is that I can get the site covariate (landscape) data separately from the observed temperature data, then join them independently to the climate data. Observed temp is NA for most of the records, which should work appropriately with the firstObsRows and evalRows functions.
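A toy version of the join, with made-up `climate` and `observed` data frames, shows how temp ends up NA wherever there is no observation. The sketch groups by site rather than deployID because the toy data have no deployments; that substitution is an assumption.

```r
library(dplyr)

# Hypothetical daily climate record for one site
climate <- data.frame(
  site = rep("s1", 5),
  date = as.Date("2014-01-01") + 0:4,
  airTemp = c(2, 3, 4, 5, 6),
  stringsAsFactors = FALSE
)
# Hypothetical observations covering only two of those days
observed <- data.frame(
  site = "s1",
  date = as.Date("2014-01-01") + 2:3,
  temp = c(3.5, 4.1),
  stringsAsFactors = FALSE
)

# Every climate row is kept; temp is NA where nothing was observed
joined <- left_join(climate, observed, by = c("site", "date"))
joined$rowNum <- seq_len(nrow(joined))

# Same filter as createEvalRows, grouped by site for this toy example
evalRows <- joined %>%
  group_by(site) %>%
  filter(date != min(date) & !is.na(temp)) %>%
  ungroup() %>%
  pull(rowNum)
```

Only the rows with an observed temp are picked up as evalRows, while the full climate record stays available for trend-only predictions.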