Open aappling-usgs opened 7 years ago
we could alternatively rewrite aggregateSolute to accept a model rather than predictions and to only include confidence intervals when it can do it well, by wrapping rloadest functionality or using approaches for interpolation/composite that embrace autocorrelation of errors better
Basic challenge: to produce a monthly or annual estimate, we need more information than just the predictions. But the current structure of the package separates instantaneous/unit predictions from aggregation in such a way that the information isn't available when we need it:
predictSolute
currently takes in a model and returns predictions at the temporal resolution of the discharge data (instantaneous/unit resolution).aggregateSolute
currently takes in a set of instantaneous predictions and returns monthly, annual, etc. predictions. But it doesn't take in a model object and therefore doesn't actually have enough information to do its job.This mismatch exists because I didn't understand the uncertainty propagation problem completely enough 2 years ago. @wdwatkins, you and I have explored aspects of this problem since then. It's a different problem for each model type; I've backlogged the GitHub issues for fixing this problem for composite and interpolation and lm() models, but it's immediately fixable for loadReg2 models, and this issue is about restructuring a bit so that our fix for loadReg2 also paves the way for eventual fixes for the other model types.
I've proposed two possible solutions above but am leaning toward the first, which is consistent with the title of this issue: let's modify predictSolute
to accept an argument that specifies the temporal resolution of interest and to return predictions at that resolution.
agg.by
in the current aggregateSolute
function.predictSolute
should translate the contents of the agg.by
argument into something that can be passed to rloadest::predLoad
to request load and uncertainty estimates at the desired temporal scale. rloadest::predLoad
does uncertainty propagation well for all scales, as long as you ask it specifically for the scale you want.predictSolute
should always return load or concentration estimates, but should only return numeric uncertainty estimates if the resolution is instantaneous (otherwise return NAs instead). Use the same warning language currently in place within aggregateSolute
to explain why we're returning NAs instead of uncertainties when they're requested for aggregate estimates from these models. I hope we can add uncertainty estimates to some of these models later, but those are hard challenges for another day. This structural change will take a little tiny bite out of those challenges, and that's all I want for now.aggregateSolute
function here. It's possible that we can keep some of the code in place and use aggregateSolute
as a helper to the revised predictSolute
function, e.g., for aggregating loads (but not uncertainties) for loadLm, loadInterp, and loadComp models. But I think we should be thinking of predictSolute
as the new go-to function for prediction at all temporal resolutions, such that aggregateSolute
should no longer be needed by external users.rloadest
for loadReg2 modelsSo it seems the mean water year
and mean calendar year
options will need to be eliminated, since those can't go into rloadest::predLoad
? Or can we still incorporate those two options afterwords?
Hmm, yep, those are harder. For loadflexBatch we restricted the data to complete years and then used predLoad('total')
- do you agree that approach is about as rigorous as we could hope for? The loadflexBatch code is at https://github.com/USGS-R/loadflexBatch/blob/master/batchHelperFunctions.R#L268. If it sounds like a good long-term solution to you, then the next question is how hard it would be to add that logic to predictSolute.loadReg2
- what do you think?
Mm yeah I forgot that accomplishes the same thing. That should be doable, we might be able to just pull that code into a loadflex function so it stays in one place.
see also #174, which we resolved for batch mode but would like to correct more systematically throughout the package.
predictions need to happen across the full time period[s] of interest because they need to accommodate correlation among errors in estimates due to parameter uncertainty.