gadget-framework / gadget3

TMB-based gadget implementation
GNU General Public License v2.0

"one step ahead" predictions to assess the fit to data #63

Open lentinj opened 2 years ago

lentinj commented 2 years ago

Support the use of TMB::oneStepPredict() in gadget3-generated models. This will require the addition of an “indicator variable”, with the same dimensions as the observations array, used to turn individual observations on/off when calculating likelihood. Add examples to the demo-ling model to show how it could be used.
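
As a rough illustration of the indicator-variable idea, here is a minimal sketch in plain C++ (no TMB or Eigen dependency, so it can be compiled on its own). The function name `nll_sumofsquares` and the argument name `keep` are assumptions chosen for illustration and are not gadget3's generated code. In a real TMB template the indicator would normally be declared with `DATA_VECTOR_INDICATOR(keep, obs)` or `DATA_ARRAY_INDICATOR(keep, obs)` and named on the R side via the `data.term.indicator` argument of `TMB::oneStepPredict()`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sum-of-squares likelihood component in which every observation has a
// matching entry in `keep`: 1 includes the observation, 0 switches it off.
// `keep` is a hypothetical name here, following the usual TMB convention.
double nll_sumofsquares(const std::vector<double>& obs,
                        const std::vector<double>& pred,
                        const std::vector<double>& keep) {
    // The indicator must have exactly the same shape as the observations.
    assert(obs.size() == pred.size() && obs.size() == keep.size());
    double nll = 0.0;
    for (std::size_t i = 0; i < obs.size(); ++i) {
        const double diff = obs[i] - pred[i];
        nll += keep[i] * diff * diff;  // a switched-off term contributes 0
    }
    return nll;
}
```

With every entry of `keep` set to 1 this is the ordinary sum of squares, which is why always adding the indicator would leave the default behaviour unchanged.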

lentinj commented 2 years ago

It's not the same dimensions as the observations array, it's the same dimensions as nllstock, i.e. if nllstock is broken down by model timestep, there is an on/off switch at each timestep.

This means all previous concerns about on/offing nonsense dimensions go out the window. nllstock is at most a vector over time.

This is nonsense. Then it doesn't match the observations array, which is the whole point.

lentinj commented 2 years ago

The main decision left here is how it gets turned on and off. We could just add it everywhere, but doing so creates a useless array of 1s whenever it isn't needed.

However, adding explicit "indicator_var" booleans to g3l_distribution_sumofsquares and friends would be a bit annoying to (forget to) off/on when required.

lentinj commented 2 years ago

An attempt at doing this is in the commit above. Parking this for now until we've worked out how to turn it on/off for an entire timestep in one go.

bthe commented 4 days ago

Reviving this thread: this package, https://github.com/fishfollower/compResidual/, seems to give examples of how to deal with composition data, and as an added bonus https://github.com/vtrijoulet/OSA_multivariate_dists/ has a couple of likelihood distributions worth implementing.

lentinj commented 4 days ago

The most interesting part for us is here: https://github.com/fishfollower/compResidual/blob/d4c74845089074d8016454f235044c5d13ded3a5/compResidual/src/compResidual.cpp#L21-L33

obs & pred are the input matrices, flattened, and .segment() chops up obs, pred & keep into single timeseries vectors, each of which is thrown at dmultinom_osa().
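
The slicing pattern described above can be sketched roughly as follows, in plain C++ rather than Eigen so that it compiles on its own. Everything here is an assumption made for illustration: `segment()` only imitates Eigen's `.segment(start, n)`, `series_nll()` is a hypothetical stand-in (a keep-weighted multinomial log-kernel) and is not the real `dmultinom_osa()`, and the flattened matrices are assumed to be column-major with one column per series.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Copy n elements starting at `start`, similar in spirit to Eigen's
// .segment(start, n).
std::vector<double> segment(const std::vector<double>& v,
                            std::size_t start, std::size_t n) {
    assert(start + n <= v.size());
    const auto first = v.begin() + static_cast<std::ptrdiff_t>(start);
    return std::vector<double>(first, first + static_cast<std::ptrdiff_t>(n));
}

// Hypothetical stand-in for a per-series likelihood such as dmultinom_osa():
// the multinomial log-kernel with each term weighted by its keep entry.
double series_nll(const std::vector<double>& obs,
                  const std::vector<double>& pred,
                  const std::vector<double>& keep) {
    double nll = 0.0;
    for (std::size_t i = 0; i < obs.size(); ++i) {
        nll -= keep[i] * obs[i] * std::log(pred[i]);
    }
    return nll;
}

// obs, pred and keep are flattened column-major matrices with n_bins rows;
// each column is cut out in turn and passed to the per-series likelihood.
double total_nll(const std::vector<double>& obs,
                 const std::vector<double>& pred,
                 const std::vector<double>& keep,
                 std::size_t n_bins) {
    assert(n_bins > 0 && obs.size() % n_bins == 0);
    assert(obs.size() == pred.size() && obs.size() == keep.size());
    double nll = 0.0;
    for (std::size_t start = 0; start < obs.size(); start += n_bins) {
        nll += series_nll(segment(obs, start, n_bins),
                          segment(pred, start, n_bins),
                          segment(keep, start, n_bins));
    }
    return nll;
}
```

Because keep is cut up with the same offsets as obs and pred, each entry of the indicator stays lined up with the observation it switches.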

Why they're doing this rather than using arrays I'm not sure, but dmultinom_osa() is basically the puzzle piece I was missing earlier to make this work.

How we'd slice our obs, pred and keep is another thing to figure out. Here the data gets transposed on the way in; we'd have to do some Eigen shenanigans to extract a timeseries vector from our array.
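
One possible way to pull a vector out of an array without transposing it first is strided indexing. The sketch below is plain C++ and assumes a column-major layout (as used by R and TMB arrays). The helper name `extract_along` and its arguments are hypothetical, and which dimension of a gadget3 array would play the role of `along` is left open.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Extract the vector running along dimension `along` of a column-major
// array, holding every other index fixed at the value given in `fixed`
// (the entry of `fixed` at position `along` is ignored).
std::vector<double> extract_along(const std::vector<double>& data,
                                  const std::vector<std::size_t>& dims,
                                  std::size_t along,
                                  const std::vector<std::size_t>& fixed) {
    assert(along < dims.size() && fixed.size() == dims.size());
    std::size_t total = 1;
    for (std::size_t d : dims) total *= d;
    assert(data.size() == total);

    // Column-major: the stride of dimension k is the product of dims[0..k-1].
    std::size_t offset = 0, stride = 1, along_stride = 1;
    for (std::size_t k = 0; k < dims.size(); ++k) {
        if (k == along) {
            along_stride = stride;
        } else {
            assert(fixed[k] < dims[k]);
            offset += fixed[k] * stride;
        }
        stride *= dims[k];
    }

    std::vector<double> out(dims[along]);
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = data[offset + i * along_stride];
    }
    return out;
}
```

Applying the same helper with identical arguments to obs, pred and keep would keep the three slices aligned. In Eigen, a strided `Eigen::Map` may achieve something similar without copying, but that is untested here.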