BiologicalRecordsCentre / BRCindicators

An R package for creating indicators from trends data
4 stars 11 forks source link

trend_assessment for bma method #72

Open 03rcooke opened 3 years ago

03rcooke commented 3 years ago

Is there an easy/obvious way to adapt the code to allow trend_assessmentto use the outputs from either the lambda method or the bma method?

This would be really useful for the plots created in wrappeR.

@AugustT

AugustT commented 3 years ago

The trend assessment function is doing comparison between year values for species and the indicator line.

The worker functions are here https://github.com/BiologicalRecordsCentre/BRCindicators/blob/58f9591f0d27405dae2a46c26d0980a495700cd1/R/misc_functions.r#L421 and here https://github.com/BiologicalRecordsCentre/BRCindicators/blob/58f9591f0d27405dae2a46c26d0980a495700cd1/R/misc_functions.r#L491

For the species it looks at the average change across years and and puts these into categories. For the indicator it simply looks to see if there is a significant change between the first and last year.

So these functions are really quite basic, and i suspect you could adapt them to also be able to take in the BMA data. bma does not return species level information, but what it does return looks like it could quite easily go into the indicator_assessment function with some minor tweaks.

drnickisaac commented 3 years ago

@03rcooke If you'd prepare a toy example (using one of the datasets within BRCindicators) I'd be happy to help.

03rcooke commented 3 years ago

Thanks for the information @AugustT

@drnickisaac Here's a toy example, I can't figure out how we can assess them without the species assessments provided by lambda_indicator:

# number of species
nsp = 50
# number of years
nyr = 30
# number of iterations
iter = 100
# create an array
dat_lamb <- array(data = rnorm(n = nsp*nyr*iter,
                               mean = 0.5,
                               sd = 0.1),
                  dim = c(nsp, nyr, iter),
                  dimnames = list(paste0('SP',1:nsp),
                                  1:nyr,
                                  1:iter))
# ensure values are bounded by 0 and 1
dat_lamb[dat_lamb > 1] <- 1
dat_lamb[dat_lamb < 0] <- 0

# run lambda indicator
lamb_ind <- BRCindicators::lambda_indicator(dat_lamb)

# run short-term trend assessment
st <- BRCindicators::trend_assessment(lamb_ind, start_year = (max(lamb_ind$summary$year) - 6), end_year = max(lamb_ind$summary$year)) 

# prepare data for bma indicator
dat_bma <- dat_lamb %>% 
  # collapse matrix
  reshape2::melt() %>% 
  dplyr::rename(species = Var1, year = Var2, iter = Var3, occ = value) %>% 
  dplyr::mutate(occ = log(occ + 0.00001))

# means
dat_bma_mean <- dat_bma %>% 
  dplyr::group_by(species, year) %>% 
  # calculate mean
  dplyr::summarise(index = mean(occ))

# sd
dat_bma_sd <- dat_bma %>% 
  dplyr::group_by(species, year) %>% 
  # calculate mean
  dplyr::summarise(se = sd(occ))

# combine means and sd
dat_bma_comb <- dplyr::left_join(dat_bma_mean, dat_bma_sd, by = c("species", "year"))

# run bma indicator
bma_ind <- BRCindicators::bma(dat_bma_comb) 

# HOW to do a trend assessment?
AugustT commented 3 years ago

@03rcooke

# HOW to do a trend assessment?
colnames(bma_ind) <- c('year', 'indicator', 'lower', 'upper', # these columns are used in the indicator function below
                      'Index.M', 'lowerCI.M', 'upperCI.M')

# this function normally runs inside the trend_assessment function
assessment <- BRCindicators:::indicator_assessment(summary_table = bma_ind, 
                                                   start_year = max(bma_ind$year), 
                                                   end_year = min(bma_ind$year))
assessment
drnickisaac commented 3 years ago

Looks good to me. Does this fix it @03rcooke ?

AugustT commented 3 years ago

I think @03rcooke also wants the species level assessment, but species level data is not returned by bma. @drnickisaac could bla return species level information?

03rcooke commented 3 years ago

Thanks @AugustT! Yep @drnickisaac, as Tom says this gives an assessment of the indicator, but I think Rob Boyd also wants the species assessments for the England indicator. This would be adapting BRCindicators:::species_assessment() I think, but bma() does not produce species-level information at the moment that would be needed to assess trends at the species-level.

drnickisaac commented 3 years ago

Extract mean growth rates on the log scale mean_spgrowth <- colMeans(attr(bma_ind, "model")$mean$spgrowth) Convert to % change 100*(exp(mean_spgrowth)-1) I think this is right, but please check. A good test would be to compare like-for-like growth rates of lambda and bma: I would expect the results to be very close to 1:1.

AugustT commented 3 years ago

Ah, it was in the model, nice.

Also note that in my example I used the M Prime stats. I'm not sure of the difference between M and M` so you might want to check that is the one to use.

drnickisaac commented 3 years ago

Yes, use Mprime. There's a paragraph in discussion of Freeman et al about pros and cons.

On Tue, 12 Jan 2021 at 16:06, Dr Tom August notifications@github.com wrote:

Ah, it was in the model, nice.

Also note that in my example I used the M Prime stats. I'm not sure of the difference between M and M` so you might want to check that is the one to use.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/BiologicalRecordsCentre/BRCindicators/issues/72#issuecomment-758759742, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA6PTKHGMRL6372DY3FPV5DSZRXQBANCNFSM4VVUQZEA .

-- http://drnickisaac.weebly.com/ http://www.ceh.ac.uk/staff/nick-isaac http://www.ceh.ac.uk/staff/nick-isaac @drnickisaac

http://www.britishecologicalsociety.org/getting_involved/special_interest_groups/Macroecology.php

03rcooke commented 3 years ago

I still can't quite understand how to do the species assessments for the bma method @drnickisaac. Here's an example, the values don't seem to be similar with the lambda method and the bma method:

# number of species
nsp = 50
# number of years
nyr = 30
# number of iterations
iter = 100
# create an array
dat_lamb <- array(data = rnorm(n = nsp*nyr*iter,
                               mean = 0.5,
                               sd = 0.1),
                  dim = c(nsp, nyr, iter),
                  dimnames = list(paste0('SP',1:nsp),
                                  1:nyr,
                                  1:iter))
# ensure values are bounded by 0 and 1
dat_lamb[dat_lamb > 1] <- 1
dat_lamb[dat_lamb < 0] <- 0

# run lambda indicator
lamb_ind <- BRCindicators::lambda_indicator(dat_lamb)

# run long-term trend assessment
 lt <- BRCindicators::trend_assessment(lamb_ind, start_year = 1,  end_year = max(lamb_ind$summary$year)) 

# prepare data for bma indicator
dat_bma <- dat_lamb %>% 
  # collapse matrix
  reshape2::melt() %>% 
  dplyr::rename(species = Var1, year = Var2, iter = Var3, occ = value) %>% 
  dplyr::mutate(occ = log(occ + 0.00001))

# means
dat_bma_mean <- dat_bma %>% 
  dplyr::group_by(species, year) %>% 
  # calculate mean
  dplyr::summarise(index = mean(occ))

# sd
dat_bma_sd <- dat_bma %>% 
  dplyr::group_by(species, year) %>% 
  # calculate mean
  dplyr::summarise(se = sd(occ))

# combine means and sd
dat_bma_comb <- dplyr::left_join(dat_bma_mean, dat_bma_sd, by = c("species", "year"))

# run bma indicator
bma_ind <- BRCindicators::bma(dat_bma_comb, plot = FALSE) 

# species trend assessment for bma - FROM HERE IS THE PROVISIONAL CODE
colnames(bma_ind) <- c("year", "indicator", "lower", "upper", "Index.M", "lowerCI.M", "upperCI.M")

# mean growth rate per year and species?
spgrowth <- attr(bma_ind, "model")$mean$spgrowth

# mean growth rate across years per species
mean_spgrowth <- rowMeans(spgrowth)

# back-transform
spg_pcpy <- 100*(exp(mean_spgrowth)-1)

# compare results
head(as.data.frame(spg_pcpy))
head(lt[[1]])
drnickisaac commented 3 years ago

Rob, can yoy point me towards a datalab project with a working example?

On Tue, 2 Mar 2021, 16:55 Dr Rob Cooke, notifications@github.com wrote:

I still can't quite understand how to do the species assessments for the bma method @drnickisaac https://github.com/drnickisaac. Here's an example, the values don't seem to be similar with the lambda method and the bma method:

number of species

nsp = 50

number of years

nyr = 30

number of iterations

iter = 100

create an array

dat_lamb <- array(data = rnorm(n = nspnyriter, mean = 0.5, sd = 0.1), dim = c(nsp, nyr, iter), dimnames = list(paste0('SP',1:nsp), 1:nyr, 1:iter))

ensure values are bounded by 0 and 1

dat_lamb[dat_lamb > 1] <- 1 dat_lamb[dat_lamb < 0] <- 0

run lambda indicator

lamb_ind <- BRCindicators::lambda_indicator(dat_lamb)

run long-term trend assessment

lt <- BRCindicators::trend_assessment(lamb_ind, start_year = 1, end_year = max(lamb_ind$summary$year))

prepare data for bma indicator

dat_bma <- dat_lamb %>%

collapse matrix

reshape2::melt() %>% dplyr::rename(species = Var1, year = Var2, iter = Var3, occ = value) %>% dplyr::mutate(occ = log(occ + 0.00001))

means

dat_bma_mean <- dat_bma %>% dplyr::group_by(species, year) %>%

calculate mean

dplyr::summarise(index = mean(occ))

sd

dat_bma_sd <- dat_bma %>% dplyr::group_by(species, year) %>%

calculate mean

dplyr::summarise(se = sd(occ))

combine means and sd

dat_bma_comb <- dplyr::left_join(dat_bma_mean, dat_bma_sd, by = c("species", "year"))

run bma indicator

bma_ind <- BRCindicators::bma(dat_bma_comb, plot = FALSE)

species trend assessment for bma - FROM HERE IS THE PROVISIONAL CODE

colnames(bma_ind) <- c("year", "indicator", "lower", "upper", "Index.M", "lowerCI.M", "upperCI.M")

mean growth rate per year and species?

spgrowth <- attr(bma_ind, "model")$mean$spgrowth

mean growth rate across years per species

mean_spgrowth <- rowMeans(spgrowth)

back-transform

spg_pcpy <- 100*(exp(mean_spgrowth)-1)

compare results

head(as.data.frame(spg_pcpy)) head(lt[[1]])

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/BiologicalRecordsCentre/BRCindicators/issues/72#issuecomment-789054605, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA6PTKFKU5CE43Q7CHC5I6TTBUKBRANCNFSM4VVUQZEA .

03rcooke commented 3 years ago

Recompile project - Recompile has the toy set up in the environment @drnickisaac

drnickisaac commented 3 years ago

Thanks @03rcooke. I have explored a little. First I double-checked the code. Everything you've done looks good. Then I plotted the numbers against one another: image This shows that bma estimates smaller values than lambda, but the correlation between them is reasonably tight. This is not entirely unexpected, because bma includes smoothing. So the second thing I did was plot the data going into bma against the modelled index coming out, converting back to the occupancy scale: image This shows that the blue fitted line (on which the spgrowth is calculated) is much smoother than the red observed data, which is what goes into the lambda method. When there are no missing data (as here), the species growth rates from the lambda method are simply the differences between first and last years: it's not hard to see that these are likely to be larger on average than the trend in the smoothed blue line. I suspect that part of the differences observed here are attributable to the way you've coded the simulation: these species have no net trend in occupancy but experience substantial variation between years. I would expect the two methods to be more similar with real data. It would be easy to check using the code at the bottom of "Untitled 1" script in the Recompile project.

drnickisaac commented 3 years ago

Two further important caveats: 1) the lambda method estimates growth rates on the logit scale, whereas bma uses log scale. This is likely to make very little difference in the simulated data above, where all species have occupancy close to 0.5. However, for real data many species have extremely low values of occupancy, so the growth rates on the logit scale tend to be stretched somewhat. I've taken advice from two professional statisticians on whether the log or logit occupancy is preferable, and both agreed that there is no a priori best answer because the indicator is a heuristic. But it could be an important source of difference between the methods (that we should document!). 2) My comparison assumes that species come out of bma in the same order as the names that went in. However, it's possible that they get reordered as c(SP1, SP10, SP11, ...., SP2, SP20, SP21 ...). I have not checked this, but the close alignment of red and blue lines in the figure above implies that my assumption is sound.

drnickisaac commented 3 years ago

If you wanted to check this in more detail, there's a couple of things you could do: 1) rerun bma() with logit(occupancy) (don't forget to set m.scale = "logit"). The match with lambda should be closer. 2) try a simulated dataset in which occupancy varies more among species but less between years. There is a 'simulate_indicator()` function in BRCindicators that will do this for you. Unhelpfully I didn't write a help file for it, but the code is quite self-explanatory.