NOAA-FIMS / FIMS

The repository for development of FIMS
https://noaa-fims.github.io/FIMS/
GNU General Public License v3.0
16 stars 9 forks source link

[Feature]: Sum to 1 constraint on age comps internally or externally? #326

Open ChristineStawitz-NOAA opened 1 year ago

ChristineStawitz-NOAA commented 1 year ago

Is your feature request related to a problem? Please describe.

The Fleet module has a number of derived quantity vectors:

The Population module also has the following vectors:

Where the Population vectors are just their respective Fleet counterparts, summed over all fleets.

We implemented the Fleet NLL (with @Andrea-Havron-NOAA and @msupernaw ) such that catch_numbers_at_age is assumed to be of double or integer types, we sum-to-one within the NLL function, and we never use the age_composition vector.

I have a couple of questions..

  1. @msupernaw do you think there's any performance hit from doing the sum-to-one as part of the NLL?

  2. Does the stock synthesis team (@iantaylor-NOAA @k-doering-NOAA @kellijohnson-NOAA @Rick-Methot-NOAA ) have any opinions on below options based on experience with supporting internal sum-to-one?

Describe the solution you would like.

  1. Delete age_composition since it is never used
  2. rename expected_catch to catch_weight_at_age to improve clarity & consistency
  3. Delete catch_at_age since it is never used

Describe alternatives you have considered

  1. We could do the sum-to-one within the Fleet or Population modules, store it in age_composition, and then use age_composition in the negative log likelihood
  2. We could delete age_composition and do the sum-to-one within the R input functions, such that we know catch_numbers_at_age always sums to one on input

Statistical validity, if applicable

Multinomial distribution doesn't work if the composition data don't sum to one

Describe if this is needed for a management application

No response

Additional context

No response

iantaylor-NOAA commented 1 year ago

A few reasons that occur to me on why it might be better to rescale internally are

Rick-Methot-NOAA commented 1 year ago

I concur with the consolidation within Fleet to just catch_numbers_at_age and catch_weight_at_age, although catch_weight_at_age is numbers * bodywt and need not be stored as an entity. Once you get to length-based models, fleet-specific bodywt will not be an input data type; instead it will be derived from growth and length-selectivity. I also do not see the value in summing catch_at_age across fleets. There are no logL components that depend on it. It is for reporting only. in SS3, sum to zero happens to the data as first step after it is read; per Ian's comment. For the expected composition, the sum to zero is applied one step before passing the vector to the logL fxn. I think the composition vectors must be double, not integer. There are many reasons why the raw composition obs will get weighted in a way that produces fractions of fish.

msupernaw commented 1 year ago

I don't think we'll notice a performance hit for doing the sum-to-one in the NLL.

msupernaw commented 1 year ago

We can always profile it in MQ, but typically it's the life history methods that consume the most CPU cycles.

ChristineStawitz-NOAA commented 1 year ago

From chatting with @Andrea-Havron-NOAA - In TMB Dmultinom the expected values (p) must sum to one and the observations must be integers (x).

Rick-Methot-NOAA commented 1 year ago

obs must be integers seems very limiting. What is rationale? As I noted earlier, the output of raw data processing programs will weight the raw (integer) observations by various scaling factors as the overall composition of a fleet's catch in a season is assembled. Final output of that process can have fractional fish.

ChristineStawitz-NOAA commented 1 year ago

Sorry for lack of explanation, the above was a note I took during our team meeting. The multinomial distribution in TMB is a copy of the R multinomial distribution, neither of which allow non-integer values because the pdf will only integrate to 1 with the normalizing constant when there are integer values.

This doesn't mean we can't find a way to deal with non-integer data; there are transformation functions in the literature that @Andrea-Havron-NOAA is looking into and will update here, and we can also look into what other TMB-based assessment models do.

msupernaw commented 1 year ago

@ChristineStawitz-NOAA I don't think there is any performance hit for summing in the NLL. This calculation will be on the tape and require the same number of operations regardless of where it is calculated.

timjmiller commented 1 year ago

note that TMB requires that data are not "transformed" in any way on the C++ side for the osa residual machinery to work correctly.