PecanProject / pecan

The Predictive Ecosystem Analyzer (PEcAn) is an integrated ecological bioinformatics toolbox.
www.pecanproject.org
Other
203 stars 235 forks source link

I want to use a parameter prior without performing a meta-analysis. #80

Closed braczka closed 10 years ago

braczka commented 10 years ago

Is there a way to include a parameter in the PEcAn sensitivity analysis, but to 'turn off' the meta-analysis. In other words is there a way to use the parameter prior from the database directly and not automatically query trait data if some exists.... other than removing the trait data itself? Can this be parameter specific (i.e. still perform meta-analysis for other parameters?)

dlebauer commented 10 years ago

Yes, it shouldn't be too hard to add this to the meta analysis module. What exactly is the rationale? It would be useful to know a few use cases before deciding if and how to add the feature. For example, could the same analysis be done with a cloned PFT and a strong prior? Have you looked to see what would need to be done? Any thoughts on how it should be implemented (where should the trait or traits be stated?) If it is a unique case, the code could just be added to the workflow script.

braczka commented 10 years ago

Yes, these are good questions. Sorry i didn't add more detail in the initial question. I have found that for my simulation (100 year run at willow creek with ED2) that the parameter values for quantum efficiency and dark respiration factor do not fall within the distribution provided by BETY (using meta-analysis but with random effects turned off). Therefore i want to adjust the posterior distribution that is fed into the sensitivity analysis that is consistent with parameter values that work for my run. There is a couple ways I have tried to do this:

1) Performing the meta-analysis with random effects turned ON, has the effect of widening the posterior distribution considerably and the parameter values that work for my simulation then fall within the posterior. The problem with this approach is that the posterior distribution is VERY broad, and also the number of trait data may not be sufficient to estimate the across site variance and treatment variance. 2) As you suggest in the comment, another approach would be to change the parameter prior to adjust the posterior. I have tried this but I have found two issues: 1) If I make a small change to the prior (shift the distribution to the left/right) then it has little effect on the final posterior (i.e. the trait data seems to dominate the final distribution) 2) if I make a larger change to the prior (shift the distribution to left/right) then the R code crashes and gives an error that says something like: "prior and data are not consistent" and then halts the meta-analysis.

In the end, it would be useful to me to be able to implement a new prior of my choosing, but not subject to the meta-analysis. Another quick hack would be to temporarily remove the trait data from the database so that no meta-analysis was performed at all, but that is time consuming. I am simply suggesting a flag that could turn the meta-analysis ON/OFF.

dlebauer commented 10 years ago

Regarding the feature request:

I do not think we would want to add a feature that would allow a user to ignore the fact that data indicate a model parameter is invalid. The standard workaround for this is to fix the parameter in the settings file and remove the prior. I can not think of a good reason to run a variance decomposition for a prior that ignores data from a subset of parameters.

I think allowing users to turn off the meta-analysis is sensible, because it would allow users to compare the analysis before and after data as I did in the Ecological Monographs paper. This can be done by cloning a pft and removing all associated species, and should not be too difficult to implement as a feature within the R code.

Regarding the specific issue:

The fact that you are getting nonsensical results from a robust PFT / site combination indicates that some science can be done. The first thing I would do is debug using some or all of the following steps.

  1. compare your results with the runs from the DOE cross-site synthesis paper.
  2. plot the data by site and treatment to see if there are any outliers;
  3. compare these data with data from other species / pfts.
  4. compare the data against the original references to make sure that they are valid. If they are not, you can flag them in BETYdb (or let me know which ones to flag) for qaqc. I know that there have been some issues with quantum efficiency, but these should be resolved in the most recent dumps of the database.
  5. If the data are valid, and the meta-analysis posterior is consistent with the data, then it may be that there is a mismatch between the model's definition of the parameter and the data, or in the assumptions encoded in the convert.samples.ED function (This is where variables from BETYdb are converted to model-specific scales (e.g. check how 'dark respiration factor' is calculated from leaf respiration rate). I would check these.

Have you done any of these yet? Please update the issue with what you found (including figures or links to figures that you create; you can post the code as a 'gist' at gist.github.com and link too it.)

braczka commented 10 years ago

@dlebauer, @mdietze, @serbinsh,

I have a good idea now of how I would go about 'turning off the meta-analysis'. I think your suggestion of cloning the pft and then turning off the species to prevent querying trait data. This is much preferred to eliminating trait data.

To your second point, I don't know if there is anything necessarily to debug from the trait data, although I haven't updated my database in a while, so if there was a lot of new trait data added recently I would not have that. Take quantum efficiency, for example. The median value from the meta-analysis was roughly 0.045. I found that I needed a value closer to 0.06 for my simulation. The trait data is the following:

        Site_ID:     557     557       557     621      621     622    624       624     557      557     557     621     621
Treatment_ID:      NA      NA       NA    1307    1307   1308   1459     1460   1075    1075      NA    1307   1307
   Species_ID:       32       32        32       32       32       32       32        32       32       48      546     546     546 
           Mean: 0.0630 0.0120 0.0230 0.0300 0.0240 0.1500 0.0380 0.0380 0.0660 0.0830 0.0426 0.0490 0.0410

There are some values that encompass the 0.06 value I am looking for, but only on the periphery.

dlebauer commented 10 years ago

@braczka if you "need" a value of 0.06, and don't want to consider why the model would require such a value to produce sensible output, then the parameter becomes a 'tuning parameter'. These can be specified in the constants tag of the settings file (see the wiki documentation for the PEcAn settings files.

Of course, it is possible that the issue could be resolved by enabling the meta-analysis to do a better job of accounting for the uneven representation of species within the PFT: in your example, most data come from the species with id = 32, but that does not necessarily mean that species 32 is the most abundant representative of the PFT in the ecosystem (in fact it is possible that there is no data from the most abundant species, and lots of data from species that don't actually occur at the site).

It is also notable that the meta-analysis computes global rather than site level parameterizations. This would be a useful feature to add (I can not find an existing issue for this although I thought it was on redmine).

note: to print a dataframe that is easy to read in markdown, try something like pander(mydataframe) (requires that you install the pander package)

mdietze commented 10 years ago

David, I don't think Brett want's to tune the parameter to a specific value, I think he want to just use the prior and ignore the data.

Brett, if you want to just run with priors for everything then you could just skip the meta-analysis step all together, though you'll want to run the get.trait.data since that also gets the priors (we should think about separating those two).

If you want to run with MA for some traits, but priors for others, then you could run get.trait.data, then load the data file and delete the data for specific traits, then resave it and run the MA. You could probably do that in just a few lines of code.

braczka commented 10 years ago

Yes, that's right, as Mike says I don't want to fix or tune the parameter, I still want a distribution to test for sensitivity, just to shift the posterior to a region that works for me. .

You both have given me ideas that can work to eliminate the meta-analysis: , either clone a PFT and remove species or edit the trait.data file to prevent meta-analysis for certain parameters. Thank you.

serbinsh commented 10 years ago

Brett,

Let me know how you get along with this analysis using the priors only. I am interested in this as a way to illustrate the change in model runs with and without data constraints.

braczka commented 10 years ago

Shawn,

I will keep you updated. If I can get reasonable results with the dark_respiration_fraction edit (issue #83), I will be using the full meta-analysis posterior. Thanks!

dlebauer commented 10 years ago

@braczka that is why I encouraged you to debug before trying to figure out how to get around using data. In most cases when a model parameterization produces nonsensical results, there is either an error in the data or in the code. In this case, you found the bug I suspected in the 5th step of my previous comment, so I am glad you found it and the problem can be resolved by fixing write.configs.ED (#83) rather than ignoring data because it doesn't produce realistic results.

braczka commented 10 years ago

@dlebauer Do you think that was a coincidence? ;-) Your comment motivated me to take another look at the conversion, so thank you.

On a side note, it wasn't as if I wanted to ignore data. My analysis becomes stronger when including extra trait data otherwise it is just based on priors. In the end I am faced with the reality of what the model simulation gives me and how I can make it provide reasonable results. There are a lot of working parts in my simulation that are likely different than what have been used before, including my meteorology and the time frame I am looking at (100 years). I was attempting to shift the parameter priors only as a 'last resort' to get something useable.