PecanProject / pecan

The Predictive Ecosystem Analyzer (PEcAn) is an integrated ecological bioinformatics toolbox.
www.pecanproject.org
Other
202 stars 235 forks source link

Appropriate estimation of prior for pft subsets (e.g. species, cultivar) #279

Open dlebauer opened 9 years ago

dlebauer commented 9 years ago

After doing a meta-analysis for all plants in the genus salix, I want to use the posteriors to calculate a reasonable priors for particular species within the genus.

I have been (naiively) using the distributions in post.distns.Rdata, though it is now apparent that these are posterior estimates of the mean value of the parameter, excluding the variance terms (sd.y, sd.trt, sd.site), and are thus might be appropriate for priors on the same pft, but not appropriate for priors on a pft that represents a subset of species in the more general pft.

Now, the question is - how to compute priors for the lower (species) level pft.from the higher (genus) level pft.

Here is an example using an existing mcmc.list object "SLA" - see this gist for the settings file and PEcAn code used to generate it.

library(fitditrplus)
download.file(url = "https://dl.dropboxusercontent.com/u/18092793/SLA-test.Rdata",
                    dest = "test.Rdata")
load("test.Rdata")

Default option: priors based on beta.o posterior

fitdist(SLA[,'beta.o'], distr = "norm")
## N(15, 1.6)

I am afraid that this under-estimates the variance, as there is no apriori reason to expect a species-level mean to be within the bounds of the genus-level mean (beta.o)

option 1

One option would be to use, e.g., N(beta.o, sd.y + sd.site + sd.trt). This would be straightforward except that the normal distribution can have support < 0.

However, this approach seems to overestimate the variance

stats <- summary(trait.mcmc$SLA)$statistics
mu <- stats['beta.o','Mean']
sd <- sum(stats[grepl('sd', rownames(stats)),'Mean']) 
## N(15.9, 15.7)

option 2

Another option seems to be that we could combine the samples from different parameters in the meta-analysis model (e.g. prior <- beta.o + beta.ghs + beta.trt + beta.site)

fitdist(rowSums(SLA[,grepl("beta", colnames(SLA))]), dist = "norm")
## N(1.6, 23)

However, this approach does not appear to match the mean or the SD.

Thoughts?

mdietze commented 9 years ago

Option 1 is the closest to correct, however you don't want to include the treatment effect since you're not imposing a treatment on your new species. Technically you should also be drawing a random site effect from the site distribution and then adding that in with beta.o to estimate the mean, rather than summing the standard deviations. These two things (sd.trt and summing the sd) both contribute to making the variance higher than what it actually is.

That said, the model you REALLY should be fitting if you want to predict a new subspecies is one that has a subspecies random effect rather than a site random effect, since what you're wanting to do is to borrow strength across the species to make inference about one subspecies.

github-actions[bot] commented 4 years ago

This issue is stale because it has been open 365 days with no activity.