Open ChristineStawitz-NOAA opened 1 year ago
A few reasons that occur to me on why it might be better to rescale internally are
I concur with the consolidation within Fleet to just catch_numbers_at_age and catch_weight_at_age, although catch_weight_at_age is numbers * bodywt and need not be stored as an entity. Once you get to length-based models, fleet-specific bodywt will not be an input data type; instead it will be derived from growth and length-selectivity. I also do not see the value in summing catch_at_age across fleets. There are no logL components that depend on it. It is for reporting only. in SS3, sum to zero happens to the data as first step after it is read; per Ian's comment. For the expected composition, the sum to zero is applied one step before passing the vector to the logL fxn. I think the composition vectors must be double, not integer. There are many reasons why the raw composition obs will get weighted in a way that produces fractions of fish.
I don't think we'll notice a performance hit for doing the sum-to-one in the NLL.
We can always profile it in MQ, but typically it's the life history methods that consume the most CPU cycles.
From chatting with @Andrea-Havron-NOAA - In TMB Dmultinom the expected values (p) must sum to one and the observations must be integers (x).
obs must be integers seems very limiting. What is rationale? As I noted earlier, the output of raw data processing programs will weight the raw (integer) observations by various scaling factors as the overall composition of a fleet's catch in a season is assembled. Final output of that process can have fractional fish.
Sorry for lack of explanation, the above was a note I took during our team meeting. The multinomial distribution in TMB is a copy of the R multinomial distribution, neither of which allow non-integer values because the pdf will only integrate to 1 with the normalizing constant when there are integer values.
This doesn't mean we can't find a way to deal with non-integer data; there are transformation functions in the literature that @Andrea-Havron-NOAA is looking into and will update here, and we can also look into what other TMB-based assessment models do.
@ChristineStawitz-NOAA I don't think there is any performance hit for summing in the NLL. This calculation will be on the tape and require the same number of operations regardless of where it is calculated.
note that TMB requires that data are not "transformed" in any way on the C++ side for the osa residual machinery to work correctly.
Is your feature request related to a problem? Please describe.
The Fleet module has a number of derived quantity vectors:
The Population module also has the following vectors:
Where the Population vectors are just their respective Fleet counterparts, summed over all fleets.
We implemented the Fleet NLL (with @Andrea-Havron-NOAA and @msupernaw ) such that catch_numbers_at_age is assumed to be of double or integer types, we sum-to-one within the NLL function, and we never use the age_composition vector.
I have a couple of questions..
@msupernaw do you think there's any performance hit from doing the sum-to-one as part of the NLL?
Does the stock synthesis team (@iantaylor-NOAA @k-doering-NOAA @kellijohnson-NOAA @Rick-Methot-NOAA ) have any opinions on below options based on experience with supporting internal sum-to-one?
Describe the solution you would like.
Describe alternatives you have considered
Statistical validity, if applicable
Multinomial distribution doesn't work if the composition data don't sum to one
Describe if this is needed for a management application
No response
Additional context
No response