Open lentinj opened 5 months ago
Looking at stomachcontent.cc, gadget2 treats everything in each aggregation grouping as one mega-stomach, and then normalises this over all prey in that grouping.
In gadget3 I think the easiest way to implement this is to gather absolute values of biomass / indivduals per-stock, as we do for any other catch distribution, then have a g3l_distribution_stockratio_dirichlet
() that first converts to a ratio of the sum of all stocks, and calculates nll. This isn't very different to g3l_distribution_sumofsquares
(), just over prey instead of area.
For symmetries' sake, I think we should normalise the observation data as well. Then you can provide absolute values for stomach content numbers / weights, which I'm assuming will make life easier @vbartolino ?
As well as saving a setup step, this will mean that the reported observation / model data will have the same units, as the reported model stomach content won't have been normalised before reporting.
EDIT: In fact, I'm pretty sure that scsimple
is equivalent to: g3l_distribution_sumofsquares(over = c('area', 'predator_age', 'predator_length',...other stuff we group over...))
. Having to specify all groupings (vs just saying "stock") is a bit annoying though.
For the Dirichlet distribution, applying it to modelled ratios with externally calculated $\alpha$ vector seems a pretty easy thing to do. But presumably this isn't what's required.
Do we want to first estimate $\alpha$ based on the stomach content observation data (if so how)? Or are we applying the distribution to the difference between observed & modelled ratios? Or something else I've not thought of, because I'm asking stupid questions?
For symmetries' sake, I think we should normalise the observation data as well. Then you can provide absolute values for stomach content numbers / weights, which I'm assuming will make life easier @vbartolino ?
Yes, it makes life easier. I had the impression that it's also what gadget2 was doing isn't it?
Here example of an old g2 fit with stomach data. $ratio
was the input data while my understand was that $observed
is after they are internally normalised
> fit$stomachcontent %>% filter(component=="stom2.rin") %>% head(5) %>% data.frame()
component prey predator year step area ratio number prey.lower
1 stom2.rin len10.5 len40 2008 2 all 3 0.0003081895 10.5
2 stom2.rin len11.5 len40 2007 4 all 19 0.0113236680 11.5
3 stom2.rin len11.5 len40 2017 4 all 10 0.0083094559 11.5
4 stom2.rin len11.5 len40 2013 4 all 17 0.0284688040 11.5
5 stom2.rin len11.5 len40 2020 2 all 30 0.0131694190 11.5
prey.upper lower upper observed predicted prey.length pred.length
1 11.5 40 160 0.00143747 0.0003081895 11 100
2 12.5 40 160 0.08755760 0.0113236680 12 100
3 12.5 40 160 0.10309278 0.0083094559 12 100
4 12.5 40 160 0.05647841 0.0284688039 12 100
5 12.5 40 160 0.08130081 0.0131694190 12 100
For the Dirichlet distribution, applying it to modelled ratios with externally calculated α vector seems a pretty easy thing to do. But presumably this isn't what's required.
Do we want to first estimate α based on the stomach content observation data (if so how)? Or are we applying the distribution to the difference between observed & modelled ratios? Or something else I've not thought of, because I'm asking stupid questions?
probably I know too little to talk about this... I wonder if it would be possible to have both the options of trying to estimate $\alpha$ in the model (something probably pretty difficult; at the last North Sea keyrun the SMS model was easily hitting the boundaries) or provide it as input. If provided as input, I guess that a bootstrap or some resampling of the stomach data could allow a first estimation of the variance to serve as input.
not sure if useful, but this is a document about the use of the Dirichlet distribution to fit the stomach data in SMS dirchlet_alpa0.pdf
Yes, [normalising observations] makes life easier. I had the impression that it's also what gadget2 was doing isn't it?
Correct, it does. I'd missed where that happens in the code. Sorry!
not sure if useful, but this is a document about the use of the Dirichlet distribution to fit the stomach data in SMS
That's exactly the paper I needed, thanks!
The oversimplified summary is "Externally estimated alpha, using the FishStomachs package to bootstrap the stomach content data, fitted to a Dirichlet distribution using Compositional::diri.est, possibly with a SMS-estimated scaling factor".
From a gadget3 point of view, supporting the same process would be fairly straightforward. It'd be a bit weird because we use the observation data to specify likelihood aggregations and we don't have any here, but that's not the end of the world. We could use diri.est
() within gadget3, but the suggestion seems to be that the alpha estimation is not something you'd expect to just work.
https://gadget-framework.github.io/gadget2/userguide/chap-like.html#sec-stomach
Option to convert catch into ratios by prey stock, add sc_simple (specialisation of sumofsquares, depending how ratio is implemented?), Dirichlet distribution comparison functions