GLEON / LakeMetabolizer

Collection of Lake Metabolism Functions

option for constraining metab estimates positive #72

Closed jzwart closed 9 years ago

jzwart commented 9 years ago

For example, estimates of GPP can currently be negative, which cannot happen biologically. I suppose one could just ignore those results without an option to constrain them positive, but it might be nice to have everything in the right direction.

To do so, I think we would just need to exponentiate the gpp and r parameters.
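For illustration, a minimal sketch of what that could look like (this is not LakeMetabolizer's actual NLL; `do.obs`, `irr`, and `wtr` are hypothetical driver vectors):

```r
# Sketch: estimate parameters on the log scale so GPP is forced positive
# and R forced negative, simply by exponentiating inside the NLL.
nll_constrained <- function(pars, do.obs, irr, wtr) {
  gpp.coef <- exp(pars[1])    # exp() guarantees a positive GPP coefficient
  r.coef   <- -exp(pars[2])   # negated exp() guarantees a negative R coefficient
  sigma    <- exp(pars[3])    # observation error SD, also kept positive

  # toy process model: DO change driven by light-scaled GPP and temp-scaled R
  do.pred <- cumsum(c(do.obs[1],
                      gpp.coef * irr[-1] + r.coef * 1.073^(wtr[-1] - 20)))
  -sum(dnorm(do.obs, mean = do.pred, sd = sigma, log = TRUE))
}

# fit <- optim(log(c(0.1, 0.1, 0.1)), nll_constrained,
#              do.obs = do.obs, irr = irr, wtr = wtr)
# exp(fit$par[1])  # back-transformed GPP coefficient, always > 0
```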

lawinslow commented 9 years ago

I don't think we should do that. As @jread-usgs always says, garbage in, garbage out. If the model fit to the data says GPP is negative, then the model does not adequately capture the mechanisms at play and therefore is not a good estimator of metabolism. Artificially constraining it to be what we think is "valid" does a disservice. If the user, after the fact, wants to filter out negative GPP values and replace them with 0.0001 mg/L/d, then that is on them.

@rBatt @pchanson, thoughts?

rBatt commented 9 years ago

I think this is a bad idea.

Constraining parameters to be positive often just deceives the user. You feel good b/c GPP isn't negative, but you're tricking yourself, b/c instead it's just hugging 0. It wants to be negative b/c the data (or the model) suck. At least when you see -GPP you know something is wrong. When you see near-0, you wonder "is this because GPP was low, or b/c I forced the parameter positive?"

The aforementioned reasoning applies b/c negative GPP is statistically OK; this is distinct from cases where, say, a variance comes out negative. That can't work, so force it positive.

In some cases, you might want to force GPP positive if you think that the optimization algorithm is getting stuck in a negative local minimum. However, forcing positive doesn't even fix this problem. The solution to that problem is to re-run your optimization with a large array of combinations of starting values, or to use an optimization routine (such as simulated annealing or differential evolution) that isn't as susceptible to the problem of local minima.
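For instance, a rough multi-start sketch (assuming some negative log-likelihood `nll` that takes a 3-parameter vector; none of this is package code):

```r
# Re-run optim() from many random starting values and keep the best fit,
# rather than constraining parameters just to dodge bad local minima.
multi_start <- function(nll, n_starts = 50, lower = -5, upper = 5, ...) {
  starts <- matrix(runif(n_starts * 3, lower, upper), ncol = 3)  # 3 params assumed
  fits   <- lapply(seq_len(n_starts), function(i) optim(starts[i, ], nll, ...))
  fits[[which.min(sapply(fits, function(f) f$value))]]  # fit with the lowest NLL
}
```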

That being said, if someone wants to force things positive (maybe they want to do a study on how extremely misled you can be by taking seasonal averages of GPP from a fit with parameters forced positive, vs. throwing out all negative GPP from a non-forced fit), we should let them do so. We can either add ellipses in the function(...), or do a function(optAlg=list(OptAlgFuncName, list(otherArgs2OptAlg))), whereby we allow users to specify arguments to pass to optim(), or even allow users to supply their own optimization algorithm. That would be my suggestion; see the sketch below.
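A minimal sketch of the ellipsis version (the wrapper name and NLL are hypothetical, not the package's API):

```r
# Hypothetical wrapper: anything in ... is forwarded straight to optim(),
# so a user can pick the method, supply constraints, or set control
# options without us hard-coding any constraint logic in the NLL.
metab_mle_sketch <- function(nll, start, ...) {
  optim(par = start, fn = nll, ...)
}

# A user who wants a different optimizer then opts in explicitly, e.g.
# simulated annealing:
# metab_mle_sketch(nll, start = c(0.1, -0.1, 0.1),
#                  method = "SANN", control = list(maxit = 2e4))
```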

However, I do not think it is a good idea to make it overly easy to force parameters positive.


rBatt commented 9 years ago

Also, I should point out that by giving access to the arguments of optim(), you could specify method = "L-BFGS-B" (is that right?) and then implement the box constraints with the lower and upper arguments. That way we don't have to implement control flow in the NLL code, or have separate functions for the NLLs.
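Roughly like this (the `nll` and the bound values here are just placeholders):

```r
# optim()'s built-in box constraints: method "L-BFGS-B" accepts lower/upper
# vectors, one entry per parameter, so the NLL itself stays unchanged.
fit <- optim(par    = c(gpp = 0.5, r = -0.5, sigma = 0.1),
             fn     = nll,
             method = "L-BFGS-B",
             lower  = c(0,   -Inf, 1e-6),   # GPP >= 0, sigma > 0
             upper  = c(Inf,  0,   Inf))    # R <= 0
```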

Also, constraining parameters can't be done for BK or LM. It could be done with Bayes, but there are already 3 models. Implementing a blanket force-positive version would be easy, but it'd just turn those 3 models into 6 if we wanted to make it an "option".

jzwart commented 9 years ago

I like the access-to-optim() idea. I hadn't noticed the L-BFGS-B method before, but it seems like that would constrain positive, or whatever else the user wants. And good point: this would get around BK and LM (and possibly Bayes) being 'left out' of the constraining option, while the user still has the option of constraining if given access to optim().

nrlottig commented 9 years ago

I have been playing around with constraining metabolism parameters and am still not convinced that it is necessarily a bad idea, although I think some of the arguments made here are strong. Attached is a plot of metabolism estimates (y-axis) vs. 14C production (x-axis, converted to mg O2/L/d) in Trout Lake (2007-2010). Red points are the constrained MLE estimates @rBatt generated for a 14C vs. free-water O2 L&O Methods paper, blue dots are GPP from the LakeMetabolizer Bayes algorithm, and green dots are the constrained LakeMetabolizer Bayes algorithm. The dashed line is 1:1. First observation: free-water yields negative GPP even on days when 14C suggests relatively high production. Second: when GPP is positive, the unconstrained Bayes estimates are consistent with the 14C estimates. Third: @rBatt's constrained MLE model does a pretty good job of estimating GPP, assuming 14C is correct. Finally: the constrained Bayes model is better than the unconstrained version, but the constrained estimates appear to be underestimated at high production values, especially when compared to the MLE model outputs. I've modified the Bayes model to constrain the parameters by truncating the priors:

GPP ~ dnorm(mean, variance)T(0,)
R ~ dnorm(mean, variance)T(,0)

[Plot: 14C vs. all metabolism estimates, including negative values]

jzwart commented 9 years ago

Interesting. I'm pretty impressed at how well the free-water matches the 14C, especially given the small range of Trout's GPP (0.05-0.2). What does the driver data look like for the shitty days with negative GPP? I.e., do the DO cycles look like they are driven by physical or biological processes? On the one hand, it would be nice not to throw out a bunch of metabolism days by constraining the models; on the other hand, interpreting the results from constrained models may be a bit murky: all the results are wrong, but more 'right' than unconstrained, so do you take these as true estimates or what? Maybe it's best to incorporate estimates of uncertainty telling the user how much he or she can trust the output.


rBatt commented 9 years ago

The conversation is interesting, but I think we're getting sucked into a distraction here.


lawinslow commented 9 years ago

I'm going to table this for now. The option to constrain estimates is something we could build in, but it will require a champion to implement it across all the models (much like a non-linear productivity-irradiance relationship).