expand predictSolute to predict by time period

aappling-usgs commented 7 years ago

see also #174, which we resolved for batch mode but would like to correct more systematically throughout the package.

predictions need to happen across the full time period[s] of interest because they need to accommodate correlation among errors in estimates due to parameter uncertainty.

aappling-usgs commented 7 years ago

we could alternatively rewrite aggregateSolute to accept a model rather than predictions and to only include confidence intervals when it can do it well, by wrapping rloadest functionality or using approaches for interpolation/composite that embrace autocorrelation of errors better

aappling-usgs commented 7 years ago

Basic challenge: to produce a monthly or annual estimate, we need more information than just the predictions. But the current structure of the package separates instantaneous/unit predictions from aggregation in such a way that the information isn't available when we need it:

predictSolute currently takes in a model and returns predictions at the temporal resolution of the discharge data (instantaneous/unit resolution).
aggregateSolute currently takes in a set of instantaneous predictions and returns monthly, annual, etc. predictions. But it doesn't take in a model object and therefore doesn't actually have enough information to do its job.

This mismatch exists because I didn't understand the uncertainty propagation problem completely enough 2 years ago. @wdwatkins, you and I have explored aspects of this problem since then. It's a different problem for each model type; I've backlogged the GitHub issues for fixing this problem for composite and interpolation and lm() models, but it's immediately fixable for loadReg2 models, and this issue is about restructuring a bit so that our fix for loadReg2 also paves the way for eventual fixes for the other model types.

I've proposed two possible solutions above but am leaning toward the first, which is consistent with the title of this issue: let's modify predictSolute to accept an argument that specifies the temporal resolution of interest and to return predictions at that resolution.

that new argument can take exactly the same form and name as agg.by in the current aggregateSolute function.
for loadReg2 models, predictSolute should translate the contents of the agg.by argument into something that can be passed to rloadest::predLoad to request load and uncertainty estimates at the desired temporal scale. rloadest::predLoad does uncertainty propagation well for all scales, as long as you ask it specifically for the scale you want.
for loadLm, loadInterp, and loadComp models, predictSolute should always return load or concentration estimates, but should only return numeric uncertainty estimates if the resolution is instantaneous (otherwise return NAs instead). Use the same warning language currently in place within aggregateSolute to explain why we're returning NAs instead of uncertainties when they're requested for aggregate estimates from these models. I hope we can add uncertainty estimates to some of these models later, but those are hard challenges for another day. This structural change will take a little tiny bite out of those challenges, and that's all I want for now.
we're essentially deprecating the aggregateSolute function here. It's possible that we can keep some of the code in place and use aggregateSolute as a helper to the revised predictSolute function, e.g., for aggregating loads (but not uncertainties) for loadLm, loadInterp, and loadComp models. But I think we should be thinking of predictSolute as the new go-to function for prediction at all temporal resolutions, such that aggregateSolute should no longer be needed by external users.

wdwatkins commented 7 years ago

[x] send to rloadest for loadReg2 models
[x] return NAs for other models
[ ] add some examples?

wdwatkins commented 7 years ago

So it seems the mean water year and mean calendar year options will need to be eliminated, since those can't go into rloadest::predLoad? Or can we still incorporate those two options afterwords?

aappling-usgs commented 7 years ago

Hmm, yep, those are harder. For loadflexBatch we restricted the data to complete years and then used predLoad('total') - do you agree that approach is about as rigorous as we could hope for? The loadflexBatch code is at https://github.com/USGS-R/loadflexBatch/blob/master/batchHelperFunctions.R#L268. If it sounds like a good long-term solution to you, then the next question is how hard it would be to add that logic to predictSolute.loadReg2 - what do you think?

wdwatkins commented 7 years ago

Mm yeah I forgot that accomplishes the same thing. That should be doable, we might be able to just pull that code into a loadflex function so it stays in one place.

DOI-USGS / loadflex

expand predictSolute to predict by time period #199