GLEON / LakeMetabolizer

Collection of Lake Metabolism Functions
17 stars 10 forks source link

Code for plotting modeled vs. observed O2 #126

Open jzwart opened 7 years ago

jzwart commented 7 years ago

Create code that will take fitted parameters from metabolism model and plot modeled vs. observed oxygen data for each metabolism day.

rBatt commented 7 years ago

Good idea. I have a couple comments that might help in the long-run, but I fear will sound like a pain now. Basically, two points:

  1. the concept of 'predicted values' isn't entirely consistent across our models
  2. getting the predicted values requires different approaches for different models

1. Concept of Predicted Values

It is important to distinguish between models that have oxygen concentration as the response variable, and those with the first difference in oxygen as the response. For example, metab.ols (and metab.bookkeep) has delta-do as the response variable, whereas metab.mle actually predicts [DO].

2. Getting Predicted Values

Kalman predictions

The Kalman smoother should be used to get predicted values from the Kalman filter model (this is already included; see "smoothDO" in "Value" in ?metab.kalman).

Bayesian predictions

For the Bayesian method, you'd want to adjust the R code that calls JAGS, telling it to track the "latent" state variable ('true' DO) at each time step as though it were a parameter.

Basically, the model distinguishes between "true" and "observed" DO values. The "true" values obviously are not observed, and are referred to as "latent" variables --- essentially parameters. The model already estimates these latent variables, but they are not returned to the user unless they are designated as "parameters to track" from within JAGS. (If X is the variable assigned to the true DO values, you just need to tell JAGS to track "X"; no need to specify `c('X[1]', 'X[2]', 'X[3]', ...).)

'Tracking' these parameters has no effect of parameter estimation --- the true DO values are already estimated, all you're doing is 'saving' the posterior samples for each DO value. That's right, true DO at each time step will have its own posterior distribution. Therefore, I recommend using a N x T matrix to store the true DO output (N = number of posterior iterations, T = number of time steps). If we combine output for many days (e.g., in metab()), it could be formatted as an array (the size of the 3rd dimension would be the number of days).

OLS predictions

This should be pretty easy, as an lm() object already has predicted values as a stored element (do we give users the lm() object in the output? I forget).

However, metab.ols predicts diff(DO), not DO. So we'd have to do cumsum(predicted(metab.ols(...))) + DO[1]. The DO[1] is oxygen at the first time step (which we might just say is equal to do.obs[1], the observed oxygen at t=1).

An alternative is, for the plot of observed DO, to use do.obs - do.obs[1], and then to not add DO[1] to the cumulative sum of metab.ols's predicted values. Another alternative is to use diff(do.obs) as the observed oxygen, and for the predicted just use predicted(metab.ols(...)).

MLE predictions

I think the predictions for metab.mle might be more straightforward than for other model types. I think this because 1) mle predicts concentration; 2) mle doesn't have the complications of Bayesian or Kalman approaches that result from simultaneously considering observation and process error.

I think all we have to do is copy the code from the inner loops of these models (PE and OE models), and just plug in the fitted parameter values where needed.

BK predictions

I don't think we can generate predictions for this model --- it's not a statistical model.

Note

As a general note: A model that makes good predictions of oxygen isn't necessarily making good estimates of metabolism (as extreme example, get predicted values using forecast::auto.arima() or stats::arima(); fits great, won't tell you much about metabolism), though I agree this is useful to look at. When it's time to write the help page for this code, it might be worth pointing out that accurately predicting O2 is different than accurately predicting any component of metabolism.