BiologicalRecordsCentre / BRCindicators

An R package for creating indicators from trends data
4 stars 11 forks source link

error in year 1 for bma indicator #73

Open 03rcooke opened 3 years ago

03rcooke commented 3 years ago

I don't fully understand the parameters of the bma indicator, but as far as I can tell all combinations of the parameters produce zero error in year 1 for this test data.

Which is strange as sometimes I seem to produce non-zero error in year 1 with actual data (when I filter the data to 1994-2020, but not when I use 1970??)

Reproducible example:

nsp = 50 # number of species
nyr = 30 # number of years
iter = 100 # number of iterations
# create an array
dat_lamb <- array(data = rnorm(n = nsp*nyr*iter,
                               mean = 0.5,
                               sd = 0.1),
                  dim = c(nsp, nyr, iter),
                  dimnames = list(paste0('SP',1:nsp),
                                  1:nyr,
                                  1:iter))
# ensure values are bounded by 0 and 1
dat_lamb[dat_lamb > 1] <- 1
dat_lamb[dat_lamb < 0] <- 0

# prepare data for bma indicator
dat_bma <- dat_lamb %>% 
  # collapse matrix
  reshape2::melt() %>% 
  dplyr::rename(species = Var1, year = Var2, iter = Var3, occ = value) %>% 
  dplyr::mutate(occ = log(occ + 0.0001))

# means
dat_bma_mean <- dat_bma %>% 
  dplyr::group_by(species, year) %>% 
  # calculate mean
  dplyr::summarise(index = mean(occ))

# sd
dat_bma_sd <- dat_bma %>% 
  dplyr::group_by(species, year) %>% 
  # calculate mean
  dplyr::summarise(se = sd(occ))

# combine means and sd
dat_bma_comb <- dplyr::left_join(dat_bma_mean, dat_bma_sd, by = c("species", "year"))

# run bma indicator
# this is the parameter specification that I've been using
# but try multiple options and see what it produces - I can't seem to control the output
bma_ind <- BRCindicators::bma(dat_bma_comb, seFromData = TRUE, plot = FALSE, model = "smooth", m.scale = "loge", errorY1 = TRUE, Y1perfect = FALSE, num.knots = 7, n.iter = 20000, parallel = TRUE) 

# zero error in year 1
head(bma_ind)

p <- ggplot(data = bma_ind, aes(x = Year, y = Index.Mprime)) +
  geom_hline(yintercept = 100, lty = 2, lwd = 1.5, colour = "grey") +
  geom_ribbon(aes(ymin = lowerCI.Mprime, ymax = upperCI.Mprime), fill = "grey80") +
  geom_line() +
  geom_point() +
  theme_linedraw()

p

@drnickisaac

drnickisaac commented 3 years ago

I've been looking into this and I think I can explain. The reason is that the whole model is based on estimating growth rates between adjacent years. This means that any uncertainty in the input data for year 1 gets propagated into the growth rate between years 1 and 2, not the index values. NB M.prime​ will have uncertainty in year 1 if there are species with missing data for that year, which explains why you get uncertainty for a baseline of 1970 but not 1994. Is this something that needs further documentation?