*bthe opened this issue 3 years ago.*
How many readers are you likely to define? I'm guessing we're talking a handful rather than tens or hundreds, so it seems to me that it'd be easier to define a separate `catchdistribution` component for each reader rather than add an extra "reader" dimension to a single component. If I'm not being stupid, then the main thing is applying a transformation to the model data to convert the age dimension to age-as-read-by-reader-*i*.
You are right on all counts: the typical number of age readers for a particular species can usually be counted on the fingers of one hand, and effectively you are creating separate likelihoods for each reader, where you would define a reader-specific age or maturity transformation.

Additionally we will need to define a parameter likelihood function where the control-set reads are incorporated. Note that this is also what is needed to define random effects.
Sketching out the changes required for the first half a bit more. This could work with a `g3l_likelihood_agereader()` to modify the output of `g3l_likelihood_distribution()`: insert an additional step between collation of the modelstock vars and comparison to obsstock, converting `modelstock__num` (or `__wgt`) to as-read age. The actual transformation will end up being implemented in a pretty similar manner to migration; it's the same really, just a different dimension.
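A minimal plain-R sketch of that conversion step (not gadget3 internals; the matrix and numbers here are made up for illustration), assuming the reading matrix is row-normalised:

```r
# reading_matrix[true_age, read_age] holds p_i(read_age | true_age),
# each row summing to 1, so total numbers are preserved.
ages <- 1:10
num <- c(100, 90, 75, 60, 45, 30, 20, 12, 6, 2)  # e.g. modelstock__num

reading_matrix <- diag(length(ages))
for (a in 1:9) {  # say 10% of each age is misread one year too high
    reading_matrix[a, a] <- 0.9
    reading_matrix[a, a + 1] <- 0.1
}

# Numbers-at-age as read by reader i: sum over true ages
num_as_read <- as.vector(num %*% reading_matrix)
stopifnot(all.equal(sum(num_as_read), sum(num)))  # no fish lost or gained
```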
The additional likelihood function I'm just about understanding, but I think we'll need to go over it on Monday before it truly sinks in.
@bthe I've had a go implementing the housekeeping for this to work, have a look at the example script here: https://github.com/gadget-framework/gadget3/blob/age-reading-error-40/demo-age-reading-error/run.R I'm pretty sure it works :)
In short:

* `g3l_catchdistribution` has an additional `transform_fs` to transform the age dimension. Identical to migration: you give a formula to say how much goes from `age` --> `destage`. In this case, we just look up in a matrix (`reader1matrix` in the example script); see the sketch below.
* `g3l_controlset` compares against control sets.
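To make the first bullet concrete, a hedged sketch of what such a call could look like. `g3l_catchdistribution` and `transform_fs` are from the branch, but the `age`/`destage` iterator names, passing the matrix via `g3_formula`, and all the data/fleet/stock objects are assumptions based only on this thread, not the branch's confirmed API:

```r
library(gadget3)

# Hypothetical reading-error matrix for reader 1:
# reader1matrix[true_age, read_age], rows summing to 1.
reader1matrix <- diag(10)  # start from a "perfect reader"

g3l_catchdistribution(
    'reader1_adist',
    obs_data = adist_reader1,     # ages as read by reader 1, defined elsewhere
    fleets = list(comm_fleet),    # assumed to exist elsewhere in the model
    stocks = list(st),
    # How much of model age `age` is recorded as `destage`: a straight
    # matrix lookup (assuming ages run 1..10, so values index directly)
    transform_fs = list(age = g3_formula(
        reader1matrix[age, destage],
        reader1matrix = reader1matrix)),
    function_f = g3l_distribution_sumofsquares())
```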
> `g3l_controlset`?

As suspected, `g3l_controlset` is nonsense. The `transform_fs` bit is good, but there needs to be an action that runs once at the beginning of the model that:

1. Takes a `mean` and `stddev` for each reader and forms the transform matrix to use in `transform_fs` (see the sketch below). Default ones from the paper, parametrised as they suggest.
2. Adds to `nll` the result of applying the `mean` & `stddev` to the control-set table through (5) in the paper.

`transform_fs` is useful without this, since it may be the wiggle-room needed to e.g. account for variances between growth as observed by tagging data and age measurement.
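A minimal plain-R sketch of what the once-at-start action could compute; the parameter shapes and values are made up for illustration (in the model they would presumably come from per-reader `mean`/`stddev` parameters):

```r
# Discretize a normal reading density into a transform matrix:
# reading_matrix[true_age, read_age], rows summing to 1.
ages <- 1:10
reader_mean <- ages + 0.2            # b_a: slight positive reading bias
reader_stddev <- 0.5 + 0.05 * ages   # s_a: spread grows with age

reading_matrix <- t(vapply(seq_along(ages), function (a) {
    d <- dnorm(ages, mean = reader_mean[a], sd = reader_stddev[a])
    d / sum(d)  # normalize, so every true-age fish is read as *something*
}, numeric(length(ages))))
# vapply stacks results as [read_age, true_age] columns, hence the t()
stopifnot(all.equal(rowSums(reading_matrix), rep(1, length(ages))))
```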
The above merges `transform_fs`.
For modelling the error matrix, consider implementing a TMB version of `outer` to form the matrix.
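In R terms the matrix is one `outer` call plus a row normalisation, which is the piece that would need a TMB-side equivalent:

```r
# Same matrix as the vapply sketch above, via outer():
# evaluate dnorm over the whole (true_age, read_age) grid at once.
ages <- 1:10
reader_mean <- ages + 0.2
reader_stddev <- 0.5 + 0.05 * ages

m <- outer(ages, ages, function (a, aprime)
    dnorm(aprime, mean = reader_mean[a], sd = reader_stddev[a]))
m <- m / rowSums(m)  # row-normalise: m[true_age, read_age] sums to 1 per row
```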
Looking at what would be necessary to incorporate age reading errors in g3, I "stumbled upon" this paper on the subject. The idea is that, given an age `a`, an age reader `i` will perceive the age to be `a'`. So what is actually observed has a distribution `p_i(a'|a)` which depends on the reader. To estimate this distribution function, age readers are often asked to compare their readings using a control set of otoliths/bones/scales etc. The paper suggests that `p_i(a'|a)` could be a normal density with age-varying mean `b_ai` and error `s_ai` of the form:
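(sketching the shape here, since the exact expression is in the paper; the bounded-linear form of $b_{a,i}$ and the role of the reader-specific shape function $f_i$ are assumptions pieced together from the surrounding text)

$$
p_i(a' \mid a) = \frac{1}{\sqrt{2\pi}\, s_{a,i}} \exp\!\left(-\frac{(a' - b_{a,i})^2}{2 s_{a,i}^2}\right),
\qquad
b_{a,i} = c_{L,i} + \left(c_{H,i} - c_{L,i}\right) f_i(a),
$$

with $s_{a,i}$ bounded in the same way by its own constants,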
where `c_Li` and `c_Hi` are the min (L) and max (H) age values for `b_a` and `s_a`, depending on the context. I guess that there are use cases where we would like to change the age reading function and/or the `f_i` function.

To estimate the density function you use the control set (or sets, if more than one) and minimize the following:
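(again a sketch of the likely shape, marginalising each fish's paired readings over its unknown true age; $j$ indexes the control-set fish and $\pi_a$ is the probability of true age $a$)

$$
\mathrm{nll} = -\sum_{j} \log \sum_{a} \pi_a \prod_{i} p_i\!\left(a'_i \mid a\right),
$$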
where `a` is the model age and `a'_i` the observed age from reader `i`.

So to implement this in g3 we would need to do something like:

1. Transform the model's age dimension through the reading distribution (`p_i(a'|a)` above). This would require an additional reader dimension in the likelihood data.
2. Estimate the distribution's parameters (`nll` above) from age readings of a control set that two or more age readers have read, and add this to the model likelihood (sketched in R after this list).

The second step could be optional: either the user could estimate these parameters outside of the model, and/or the data is simply not available, so you may want to test different structures. Potentially tagging data could inform on this.
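A plain-R sketch of that second step, under the same illustrative assumptions as the matrix sketches above (two readers, ages 1..10, flat prior over true age; none of this is from the branch):

```r
ages <- 1:10

# Build a row-normalised reading matrix for a reader with a given
# constant bias and spread: m[true_age, read_age]
reading_matrix_for <- function (bias, spread) {
    m <- outer(ages, ages, function (a, aprime)
        dnorm(aprime, mean = a + bias, sd = spread))
    m / rowSums(m)
}
reading_matrix_1 <- reading_matrix_for(bias = 0.2, spread = 0.6)
reading_matrix_2 <- reading_matrix_for(bias = -0.1, spread = 0.8)

# Paired readings of the same fish by both readers
control_set <- data.frame(
    reader1 = c(3, 4, 4, 6),
    reader2 = c(3, 5, 4, 7))
prior_age <- rep(1 / length(ages), length(ages))  # flat P(a) for the sketch

nll <- 0
for (j in seq_len(nrow(control_set))) {
    # P(fish j's readings) = sum over true age a of
    #   P(a) * p_1(reading_1 | a) * p_2(reading_2 | a)
    lik_j <- sum(prior_age *
        reading_matrix_1[, control_set$reader1[j]] *
        reading_matrix_2[, control_set$reader2[j]])
    nll <- nll - log(lik_j)
}
nll  # the contribution to add to the model likelihood
```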
I guess the same sort of logic would apply to maturity data.