gadget-framework / gadget3

TMB-based gadget implemtation
GNU General Public License v2.0
8 stars 6 forks source link

Predation likelhood - g3l_distribution_stockratio_dirichlet #148

Open lentinj opened 5 months ago

lentinj commented 5 months ago

https://gadget-framework.github.io/gadget2/userguide/chap-like.html#sec-stomach

Option to convert catch into ratios by prey stock, add sc_simple (specialisation of sumofsquares, depending how ratio is implemented?), Dirichlet distribution comparison functions

lentinj commented 4 months ago

Looking at stomachcontent.cc, gadget2 treats everything in each aggregation grouping as one mega-stomach, and then normalises this over all prey in that grouping.

In gadget3 I think the easiest way to implement this is to gather absolute values of biomass / indivduals per-stock, as we do for any other catch distribution, then have a g3l_distribution_stockratio_dirichlet() that first converts to a ratio of the sum of all stocks, and calculates nll. This isn't very different to g3l_distribution_sumofsquares(), just over prey instead of area.

For symmetries' sake, I think we should normalise the observation data as well. Then you can provide absolute values for stomach content numbers / weights, which I'm assuming will make life easier @vbartolino ?

As well as saving a setup step, this will mean that the reported observation / model data will have the same units, as the reported model stomach content won't have been normalised before reporting.

EDIT: In fact, I'm pretty sure that scsimple is equivalent to: g3l_distribution_sumofsquares(over = c('area', 'predator_age', 'predator_length',...other stuff we group over...)). Having to specify all groupings (vs just saying "stock") is a bit annoying though.

lentinj commented 4 months ago

For the Dirichlet distribution, applying it to modelled ratios with externally calculated $\alpha$ vector seems a pretty easy thing to do. But presumably this isn't what's required.

Do we want to first estimate $\alpha$ based on the stomach content observation data (if so how)? Or are we applying the distribution to the difference between observed & modelled ratios? Or something else I've not thought of, because I'm asking stupid questions?

vbartolino commented 4 months ago

For symmetries' sake, I think we should normalise the observation data as well. Then you can provide absolute values for stomach content numbers / weights, which I'm assuming will make life easier @vbartolino ?

Yes, it makes life easier. I had the impression that it's also what gadget2 was doing isn't it? Here example of an old g2 fit with stomach data. $ratio was the input data while my understand was that $observedis after they are internally normalised

> fit$stomachcontent %>% filter(component=="stom2.rin") %>% head(5) %>% data.frame()
  component    prey predator year step area ratio       number prey.lower
1 stom2.rin len10.5    len40 2008    2  all     3 0.0003081895       10.5
2 stom2.rin len11.5    len40 2007    4  all    19 0.0113236680       11.5
3 stom2.rin len11.5    len40 2017    4  all    10 0.0083094559       11.5
4 stom2.rin len11.5    len40 2013    4  all    17 0.0284688040       11.5
5 stom2.rin len11.5    len40 2020    2  all    30 0.0131694190       11.5
  prey.upper lower upper   observed    predicted prey.length pred.length
1       11.5    40   160 0.00143747 0.0003081895          11         100
2       12.5    40   160 0.08755760 0.0113236680          12         100
3       12.5    40   160 0.10309278 0.0083094559          12         100
4       12.5    40   160 0.05647841 0.0284688039          12         100
5       12.5    40   160 0.08130081 0.0131694190          12         100
vbartolino commented 4 months ago

For the Dirichlet distribution, applying it to modelled ratios with externally calculated α vector seems a pretty easy thing to do. But presumably this isn't what's required.

Do we want to first estimate α based on the stomach content observation data (if so how)? Or are we applying the distribution to the difference between observed & modelled ratios? Or something else I've not thought of, because I'm asking stupid questions?

probably I know too little to talk about this... I wonder if it would be possible to have both the options of trying to estimate $\alpha$ in the model (something probably pretty difficult; at the last North Sea keyrun the SMS model was easily hitting the boundaries) or provide it as input. If provided as input, I guess that a bootstrap or some resampling of the stomach data could allow a first estimation of the variance to serve as input.

vbartolino commented 4 months ago

not sure if useful, but this is a document about the use of the Dirichlet distribution to fit the stomach data in SMS dirchlet_alpa0.pdf

lentinj commented 4 months ago

Yes, [normalising observations] makes life easier. I had the impression that it's also what gadget2 was doing isn't it?

Correct, it does. I'd missed where that happens in the code. Sorry!

lentinj commented 4 months ago

not sure if useful, but this is a document about the use of the Dirichlet distribution to fit the stomach data in SMS

That's exactly the paper I needed, thanks!

The oversimplified summary is "Externally estimated alpha, using the FishStomachs package to bootstrap the stomach content data, fitted to a Dirichlet distribution using Compositional::diri.est, possibly with a SMS-estimated scaling factor".

From a gadget3 point of view, supporting the same process would be fairly straightforward. It'd be a bit weird because we use the observation data to specify likelihood aggregations and we don't have any here, but that's not the end of the world. We could use diri.est() within gadget3, but the suggestion seems to be that the alpha estimation is not something you'd expect to just work.