NOAA-FIMS / FIMS

The repository for development of FIMS
https://noaa-fims.github.io/FIMS/
GNU General Public License v3.0

[Feature]: add inverse logit transformation function to fims_math #211

Closed ChristineStawitz-NOAA closed 1 year ago

ChristineStawitz-NOAA commented 2 years ago

Is your feature request related to a problem? Please describe.

We need a way to transform parameters in order to run the model comparison.

Describe the solution you would like.

An inverse logit function in fims_math that can be called from a transformation function for parameters.

Describe alternatives you have considered

Using nlminb's bounding (box constraints via its lower/upper arguments).

Statistical validity, if applicable

Transforming parameters is more statistically robust than using bounded optimization algorithms.

Describe if this is needed for a management application

No response

Additional context

No response

ChristineStawitz-NOAA commented 2 years ago

Do we need to explicitly bound the gradient when passing to TMB, or does this happen internally within the TMB software?

  • gradients are based on the parameters in real (unbounded) space
  • likelihoods are based on the parameters in transformed (bounded) space

log/exp transformations should be fine, but we might need to investigate how to implement bounds between two finite values and whether something needs to be changed within the minimizer. Maybe this is a good thing to discuss with Kasper if we do a sync with the modeling team.

@timjmiller how does WHAM implement bounded parameters?

timjmiller commented 2 years ago

In WHAM, all parameters for optimization have no bounds. Transformations (primarily exp or inverse logit) are used to produce the bounded parameters the models need (e.g., standard deviations, probabilities). The inverse logit can accommodate any specified lower and upper bounds, but there may be better transformation functions than the inverse logit?

There is also a "squeeze" function in TMB which constrains values to be in (0,1) rather than [0,1] to avoid NAs when the value is passed to other functions. This helps in certain situations, such as a probability passed to dmultinom().


-- Timothy J. Miller, PhD (he, him, his) Research Fishery Biologist NOAA, Northeast Fisheries Science Center Woods Hole, MA 508-495-2365

ChristineStawitz-NOAA commented 2 years ago

Thanks Tim!

Wondering if @Cole-Monnahan-NOAA has any thoughts on types of transforms, given his research into ADMB bounds?

nathanvaughan-NOAA commented 2 years ago

We were just talking about this in the TMB training with @Andrea-Havron-NOAA.

a <- 5
b <- 15
x <- c(-Inf, -1000000, -10, -5, -1, 0, 1, 5, 10, 10000000, Inf)
y <- ((b - a) * (tanh(x / 2) + 1) / 2) + a
y
[1]  5.000000  5.000000  5.000454  5.066929  7.689414 10.000000 12.310586 14.933071 14.999546 15.000000 15.000000

Is this similar to what you are doing Tim?

Cole-Monnahan-NOAA commented 2 years ago

@ChristineStawitz-NOAA I don't follow your question about bounding the gradient. Could you clarify?

FWIW, ADMB uses a sine transformation by default (see section 2.1 of this old doc: https://www.researchgate.net/publication/282862345_Details_of_AD_Model_Builder's_covariance_calculations). This allows parameters to go right to the bound, which has implications for "convergence" and positive definite Hessians. Namely, it is much more forgiving for parameters stuck on bounds: instead of a non-positive-definite Hessian error and a failure to calculate uncertainties, ADMB will invert the Hessian and report meaningless SEs for those parameters.
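A generic sine-type bound looks like the following (an illustrative form, not necessarily ADMB's exact parameterization; see the linked doc for that). The key property is that sin() attains ±1, so unlike the inverse logit the parameter can sit exactly on a bound:

```cpp
#include <cmath>

// Generic sine-type bounding transform (illustrative; ADMB's exact form
// is documented in the covariance-calculations paper linked above).
// Maps any real x into the closed interval [a, b]; the bounds are
// attained exactly at x = ±pi/2 + 2k*pi because sin() reaches ±1.
double sine_bound(double a, double b, double x) {
  return a + (b - a) * (std::sin(x) + 1.0) / 2.0;
}
```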

Stan uses the same logit approach that WHAM does. I think there are advantages to having both transformations available.

I would also like to see the capability of having one-sided bounds like Stan does (details: https://mc-stan.org/docs/reference-manual/lower-bound-transform.html). It would also be helpful to have Jacobian adjustments added internally and automatically, as both Stan and ADMB do; they're toggled on/off depending on the mode of inference.
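A sketch of the Stan-style one-sided lower-bound transform mentioned above, together with its log-Jacobian (illustrative; the Stan reference manual has the authoritative definitions):

```cpp
#include <cmath>

// Stan-style one-sided lower bound: an unconstrained y maps to x > lb
// via x = lb + exp(y).
double lower_bound_transform(double lb, double y) {
  return lb + std::exp(y);
}

// The Jacobian adjustment for this transform is log|dx/dy| = log(exp(y)) = y,
// which is what gets added to the log density when sampling.
double lower_bound_log_jacobian(double y) {
  return y;
}
```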

I remember talking to @msupernaw about this a while ago and he also explored this in some depth.

msupernaw commented 2 years ago

@Cole-Monnahan-NOAA Christine is referring to the parameter updates, which use the scaled gradient value to update the scaled value before it is transformed to the external value via the transformation function.


-- Matthew Supernaw Scientific Software Developer National Oceanic and Atmospheric Administration Office Of Science and Technology NOAA Fisheries | U.S. Department of Commerce Phone 248 - 396 - 7797

Cole-Monnahan-NOAA commented 2 years ago

The gradient calculations and updates are done in the unbounded parameter space, no? I must be missing something.

msupernaw commented 2 years ago

@Cole-Monnahan-NOAA The updates should be done in bounded space before they are transformed to unbounded parameter space. I believe it's a stability issue.


Cole-Monnahan-NOAA commented 2 years ago

That seems like the opposite to me, and is counter to how Stan and ADMB function, which is to do all of those calculations in unbounded space. If you update in bounded space then you can easily step outside the bounded range. This is true both for the optimizer and for numerical integrators that use MCMC.

For derived quantities we often back-transform a bounded quantity to better meet the asymptotic normality assumptions (e.g., selectivity). But that is a different thing. We're talking about parameters only right?

Andrea-Havron-NOAA commented 2 years ago

I agree with Cole; my intuition is that parameter updates need to be made in real (unbounded) space, after which a transformation is applied to convert values to parameter (bounded) space. Otherwise, parameter updates could step outside the bounds of a parameter. Additionally, my intuition is that when transforming a parameter within the model (e.g., sigma = exp(lnSigma)), the likelihoods and gradients are calculated using the transformed (bounded) parameter. Is this different from constrained optimization? For example, for a simple normal model, y ~ N(mu, sd):

obj <- MakeADFun(data = list(y = rnorm(100, 2, 1)),
                 parameters = list(mu = 2, sd = 1), ...)
nlminb(start = obj$par, obj$fn, obj$gr,
       lower = c(-Inf, 0), upper = c(Inf, Inf))

With respect to bounds, MLE asymptotic theory breaks down when a parameter is on the boundary of its parameter space (e.g., sd = 0), which should be avoided at all costs. Natural transformations (e.g., exp(lnSigma), invlogit(logitPi)) tend to keep the minimizer away from parameter bounds.

Andrea-Havron-NOAA commented 2 years ago

I just pushed a super simple example (https://github.com/NOAA-FIMS/FIMS_statistical_computing_investigations/blob/main/R/transform_sigma.R) comparing parameter transformation vs. constrained optimization with nlminb and TMB to the statistical investigations repo (https://github.com/NOAA-FIMS/FIMS_statistical_computing_investigations). The two approaches produced similar results when sigma = 1. When the true sigma parameter was small, the transformation method had better success at estimating SEs than the constrained optimization method.

Caveats:

  • This only considers parameters on the bounds of parameter space. Bounding parameters to ensure model outputs are ecologically sensible is a different topic (e.g., 0.2 < steepness < 1).
  • This is a super simple example, and results may differ for more complicated models.
  • This example only demonstrates the performance of constrained optimization with nlminb; other minimizers may perform better.
  • As TMB does not have a baked-in minimizer, any optimization constraints we implement for FIMS M1 would happen outside of TMB and be specific to the minimizer we are implementing in R. Not sure we would need FIMS C++-specific code for this task.

msupernaw commented 2 years ago

@Cole-Monnahan-NOAA Yeah, we're talking about parameters with upper and lower limits only. If a parameter is constrained, the minimizer will transform the value as well as its derivative. The result is a "working" parameter and gradient vector. Some values may be transformed and some are not. Updates are applied in transformed space for those that are, and then the new value is transformed back to the external representation prior to the next iteration.


Cole-Monnahan-NOAA commented 2 years ago

Hi Andrea, could you share the plots so we don't have to run it? I also think for it to be comparable you would want to ADREPORT the log of sigma when transform==1. The nlminb code presumably is applying the logistic transform and adjusting as needed, so we should get the same or very similar results. I would be surprised if there were a meaningful difference, but always good to check.

I think there are important pros/cons to each way, so maybe we should try to come up with a table. Let's call them "internal" and "external". Internal is where the parameter is declared in unbounded space (-Inf,Inf) and manually transformed in the model code (transform==1 in this example). External is when the parameter is declared in bounded space and there are no constraints imposed inside the model at all -- it relies completely on external forces to keep the parameter within its bounds.

Internal

pros

  • Parameter SEs, and thus confidence intervals, are automatically calculated in unbounded space where they make more sense. The transformed CIs then comply with the bounds. For MLE uncertainty calculations this is preferred statistically, I think.
  • We avoid complicated upper/lower bound lists on the R side of things. These are needed for both the nlminb and tmbstan functions.
  • There is presumably a little overhead saved by doing the transformations in C++, especially if we hardcode the analytical derivatives.

cons

  • The parameter names are not as recognizable, and an extra step needs to be done on the R side to get a CI for transformed parameters. So, e.g., we'll have parameter logit_h, calculate the CI of that, and then transform to the CI of h. But the sdreport will show logit_h. So I think it's worse from the user perspective, if they're looking at that output.
  • Jacobian adjustments are needed when integrating (not with the LA).

External

pros

  • Parameter estimates and SEs from sdreport are ready to go. Same with posterior samples.
  • Parameter names are easily identifiable.
  • No Jacobian nonsense needed.

cons

  • Need to build and pass lists of bounds in R.
  • Others?

Did I miss anything important? I think I prefer the internal option b/c so much will be hidden from the user and it has some nice statistical properties. It means a little more work on our end to get CI for bounded parameters. ADMB does this by calculating the SE in unbounded space and then using the delta method to get variances and covariances in bounded space (to get the .cor and .std files). But these often lead to CI that violate the bounds.

@msupernaw I understand the conversion between bounded and unbounded. Perhaps we're using the words in the opposite way. To me "transformed" = bounded and "untransformed" = unbounded. All proposed parameter vectors, in both optimization and MCMC, are in unbounded space.
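The two CI styles under discussion can be sketched for a (0,1)-bounded parameter such as steepness h, estimated internally as logit_h (hypothetical numbers, illustrative only): transforming the Wald interval's endpoints keeps the CI inside the bounds, while a delta-method SE on h itself yields a symmetric interval that can violate them.

```cpp
#include <cmath>

double inv_logit(double x) { return 1.0 / (1.0 + std::exp(-x)); }

struct CI { double lo, hi; };

// Style 1: compute the 95% Wald interval in unbounded (logit) space,
// then transform the endpoints. The result is always inside (0, 1).
CI transformed_ci(double logit_h, double se_logit_h) {
  return {inv_logit(logit_h - 1.96 * se_logit_h),
          inv_logit(logit_h + 1.96 * se_logit_h)};
}

// Style 2: delta method on h directly,
// se_h = se_logit_h * |dh/dlogit_h| = se_logit_h * h * (1 - h).
// The symmetric interval around h can step outside (0, 1).
CI delta_method_ci(double logit_h, double se_logit_h) {
  double h = inv_logit(logit_h);
  double se_h = se_logit_h * h * (1.0 - h);
  return {h - 1.96 * se_h, h + 1.96 * se_h};
}
```

With a poorly estimated parameter (large se_logit_h), the delta-method interval spills past the bound while the transformed interval does not, which is the bound-violation issue noted above for the .cor/.std style output.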

Andrea-Havron-NOAA commented 2 years ago

Thanks @Cole-Monnahan-NOAA for this helpful summary! You and @msupernaw have put much more thought into this issue than I have, so I appreciate the detailed explanations. I'll post plots and make the recommended changes to the code.

I would add to the cons of the external approach: increased chance of a non-invertible Hessian when a parameter is near the bound, resulting in NA SEs.

timjmiller commented 2 years ago

On Cole's last point about CIs, I think that is a con of the "external" approach. For asymptotic/delta-method-based inference, I typically construct CIs on the unbounded parameters and transform those for CIs of the bounded parameter, so that they are consistent with the range of the bounded parameter.

If using the "internal" approach, it might be good to check for "very large/small" parameter values after minimization but before sdreport() to flag problematic parameters. Even if the Hessian is invertible, there is often a simpler model that performs equally well, and you save time by not calculating the sdreport unnecessarily. E.g., if the SD parameter for a set of random effects is going to zero (log(SD) is going to a very negative number), the model without the random effects makes the same predicted values, and the delta AIC for these two models is exactly 2. When fitting several models, checking for approximately equal NLL values would be another type of check, which would also catch complete confounding of parameters.


msupernaw commented 2 years ago

I added a boring example to the repo that shows the behavior of the different transformation functions.

https://github.com/NOAA-FIMS/FIMS_statistical_computing_investigations/blob/main/src/transform_functions.cpp


ChristineStawitz-NOAA commented 1 year ago

It seems like we are going with the internal approach? It's easy to put in the bounding functions, but I'm not sure there is an easy way to test the CI transformation until we get the model estimating.

Cole-Monnahan-NOAA commented 1 year ago

I agree with @timjmiller that the CI issue is a con of the external approach, although this is how ADMB does it whenever a bounded parameter is used, so clearly we've been OK with this as a field for a long time. How does Stan do this for bounded parameters? I haven't used their optimizer before. I know you have to do any derived-quantity delta-method stuff on your own.

I don't really understand the point of @msupernaw's code so I have no comment on that.

I vote the internal approach.

ChristineStawitz-NOAA commented 1 year ago

We are agreed on the internal approach but don't yet use a logit transform; this needs to be added for steepness.

timjmiller commented 1 year ago

and certain selectivity parameters?

ChristineStawitz-NOAA commented 1 year ago

do you mean restricting the median age/length to be bounded between min and max age/length bin?

timjmiller commented 1 year ago

maybe, but even simpler, when estimating age-specific parameters between 0 and 1.

ChristineStawitz-NOAA commented 1 year ago

Ah, OK. We went through this this morning and were trying to determine whether we needed to fix this for M1. We came up with steepness as the only parameter that might need a logistic transform for M1, but we will definitely need it for many parameters in M2+ (like age-specific selectivity).

Cole-Monnahan-NOAA commented 1 year ago

Presumably we're using the log transform for most things then? WHAM uses the logit even for things that have no natural upper bound, just setting the upper bound to an arbitrarily large value, which gets you about the same thing as a log transform. The ADMB manual recommends doing this to "stabilize optimization," I think: basically, constrain things to reasonable values so the optimizer doesn't go off the rails. That's one potential advantage of the logit over the log. I'm not sure it makes sense to do, but it is something to think about.

timjmiller commented 1 year ago

I think catchability and selectivity parameters are the only things that use that logit transformation in WHAM. For catchability, the intent was as Cole guessed. FWIW, the upper and lower bounds can be inputs (as in WHAM) so that they can be adjusted as needed. Regarding steepness, that transform is only appropriate for Beverton-Holt (not Ricker).


Cole-Monnahan-NOAA commented 1 year ago

FWIW, everything in SS3 is bounded, so bounds are not unfamiliar to many users.