NOAA-FIMS / FIMS

The repository for development of FIMS
https://noaa-fims.github.io/FIMS/
GNU General Public License v3.0
11 stars 7 forks source link

[Developer Issue]: Why not RTMB? #554

Open Andrea-Havron-NOAA opened 2 months ago

Andrea-Havron-NOAA commented 2 months ago

Issue details

RTMB is a wrapper interface that allows models written in R to be directly passed to TMB. Users no longer need to write models in C++ in order to access the TMB statistical inference engine, but can write them directly in R. Both RTMB models and original TMB models are translated into the same C++ computational graphs for statistical inferece.

The question, why not write FIMS in R and use RTMB? has come up several times. It is important to document this decision point and get feedback from the implementation team. A final version of the decision points should be made available to the community similar to issue #553.

I've come up with the following reasons:

There is a tradeoff between portability and community accessibility, both of which are stated goals of the FIMS project.

nathanvaughan-NOAA commented 2 months ago

I don't have an answer to the question but this made me think is it possible that we could use RTMB to allow user-specified R functions to be incorporated into a FIMS model such as a selective function? That could be a really useful use case for avoiding adding multitudes of function variations to base FIMS while allowing users flexibility?

JaneSullivan-NOAA commented 2 months ago

Pro RTMB:

JaneSullivan-NOAA commented 2 months ago

I don't have an answer to the question but this made me think is it possible that we could use RTMB to allow user-specified R functions to be incorporated into a FIMS model such as a selective function? That could be a really useful use case for avoiding adding multitudes of function variations to base FIMS while allowing users flexibility?

@nathanvaughan-NOAA to me this is the only "easy" thing to actually do in FIMS, so I don't see the benefit of creating a way to do this in R (PS it seems Matthew has already done something similar). The hard part of FIMS is the core architecture, the added dimensions, the Rcpp interface, connecting it to TMB, etc.

JaneSullivan-NOAA commented 2 months ago

It is easier to introduce unintended bugs in R given it is a dynamic language and implements implicit conversion (R changes things on you without you realizing it). @Andrea-Havron-NOAA I think a lot of work has been done to prevent this. Can you provide an example?

nathanvaughan-NOAA commented 2 months ago

Great points @JaneSullivan-NOAA !! and thanks for the link, that is exactly what I was thinking.

msupernaw commented 2 months ago

@Jane Sullivan - NOAA Federal @.> R is a untyped programming language. In a untyped language, a type for a variable can change at any time. For example, a variable defined as a list in R can easily be redefined as any other type, such as a vector, matrix, S4 object, etc and no warning will be given. I believe this is the kind of unintended bug @Andrea Havron - NOAA Federal @.> may have been talking about.

For example:

library(FIMS)a<-new(Parameter)a<-3.1459

This will work in R, but would fail in any typed language because "a" is already defined. In the above example, if "a" was initially an independent variable (Parameter) in a model, it was just redefined as a real value without any warning from R and any use of "a" after the redefinition would result in a derivative chain missing important adjoint derivatives. This would in turn wrongly inform any quasi-newton minimizer and likely result in a wrong solution. This is the kind of unintended bug that can not be prevented in a untyped programming language.

On Mon, Mar 11, 2024 at 8:22 PM Jane Sullivan @.***> wrote:

It is easier to introduce unintended bugs in R given it is a dynamic language and implements implicit conversion (R changes things on you without you realizing it). @Andrea-Havron-NOAA https://github.com/Andrea-Havron-NOAA I think a lot of work has been done to prevent this. Can you provide an example?

— Reply to this email directly, view it on GitHub https://github.com/NOAA-FIMS/FIMS/issues/554#issuecomment-1989685628, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABFUSEFYCQL6PY4J4KEEJYDYXZKEDAVCNFSM6AAAAABERGZOI6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSOBZGY4DKNRSHA . You are receiving this because you are subscribed to this thread.Message ID: @.***>

-- Matthew Supernaw Scientific Software Developer National Oceanic and Atmospheric Administration Office Of Science and Technology NOAA Fisheries | U.S. Department of Commerce Phone 248 - 396 - 7797

kellijohnson-NOAA commented 2 months ago

Thanks @Andrea-Havron-NOAA for starting this. I agree that bullet (1) is true but I do not know if I see that as a limitation to not changing. I would like to think that if there was any kind of software that suited FIMS better than what we are currently using that we would put in the effort to make the switch. I am not saying that RTMB is better just that if it was I would think that especially at this stage of development that we would put in the effort to switch.

JaneSullivan-NOAA commented 2 months ago

@jane Sullivan - NOAA Federal @.***> R is a untyped programming language. In a untyped language, a type for a variable can change at any time. For example, a variable defined as a list in R can easily be redefined as any other type, such as a vector, matrix, S4 object, etc and no warning will be given. I believe this is the kind of unintended bug

@msupernaw my point is that this is an R issue and not an RTMB issue, at least not to my knowledge. RTMB does the typing for the user ("typing" - is that how you say it?). Specifically, before we identify this as a "problem" in the "why not RTMB" column, I think we should have at least one example of it actually being a problem in RTMB.

The closest thing I know of is that RTMB does not currently accept sparse matrix classes, so you have to as.vector() them inside a likelihood (example from Sean Anderson). This is a different issue though.

ChristineStawitz-NOAA commented 2 months ago

One other consideration is if using RTMB reduces portability, increasing our dependence on TMB which is externally developed and has no designated successor to maintain the code base. I know the user base is large enough there is hope the community could generate support, but it's a consideration IMO.

Andrea-Havron-NOAA commented 2 months ago

@jane Sullivan - NOAA Federal @.***> R is a untyped programming language. In a untyped language, a type for a variable can change at any time. For example, a variable defined as a list in R can easily be redefined as any other type, such as a vector, matrix, S4 object, etc and no warning will be given. I believe this is the kind of unintended bug

@msupernaw my point is that this is an R issue and not an RTMB issue, at least not to my knowledge. RTMB does the typing for the user ("typing" - is that how you say it?). Specifically, before we identify this as a "problem" in the "why not RTMB" column, I think we should have at least one example of it actually being a problem in RTMB.

The closest thing I know of is that RTMB does not currently accept sparse matrix classes, so you have to as.vector() them inside a likelihood (example from Sean Anderson). This is a different issue though.

Thanks for raising this point Jane. RTMB, though, is just an R interface for TMB. RTMB specific functions are written in C++ and exposed to R via Rcpp. The objective function gets coded up in R. Any non-RTMB specific function (for example in FIMS, the population loop and code written to calculate expected/derived values) would be written in R.

Here is an example from RTMB's mvrw.R where I introduced an error by dropping the indices on the sd vector when calculating the covariance matrix.

Correct code:

cov <- outer(1:stateDim,
                 1:stateDim,
                 function(i,j) rho^(abs(i-j)) * sds[i] * sds[j])
>  cov 
       [,1]   [,2] [,3]
[1,] 0.2500 0.5625 0.81
[2,] 0.5625 1.5625 2.25
[3,] 0.8100 2.2500 4.00

Code with missing indices:

cov <- outer(1:stateDim,
                 1:stateDim,
                 function(i,j) rho^(abs(i-j)) * sds * sds)
> cov
        [,1]   [,2]    [,3]
[1,] 0.25000 0.2250 0.20250
[2,] 1.40625 1.5625 1.40625
[3,] 3.24000 3.6000 4.00000

I compiled the TMB equivalent example, mvrw.cpp and it failed with 'invalid cast from type' error due to trying to multiply a scalar times a vector,

nathanvaughan-NOAA commented 2 months ago

@Andrea-Havron-NOAA and @JaneSullivan-NOAA. That is a good example of how R can do wild things when it makes incorrect assumptions. Are there any important cases where this could happen with RTMB due to a user error that wouldn't happen if we use C++ base code? I assume that we will just need to ensure this kind of error doesn't occur in developer written FIMS code through tests and case studies. I also understand @ChristineStawitz-NOAA's desire for portability due to unknown future support for TMB but we should also consider the potential for future NOAA support of FIMS if we limit accessibility to only/largely C++ which has already resulted in significant issues within our development team. Coding FIMS in all R would substantially increase current and future participation in development within both our development team and the rest of NOAA. Using SS as an example and almost 100% reliance on @Rick-Methot-NOAA for updates/support it seems that future FIMS development support may be much more precarious/limited than TMB support given the more limited user base? I also agree with @kellijohnson-NOAA that we should not consider porting over to RTMB as a negative at this point given the significant refactoring and ongoing future development that is already planned. That may have been a little all over the place so in summary.

  1. Would moving to RTMB add new user-induced bugs that cannot be caught in our code, but would be with C++? If yes then that is a significant concern as silent errors that give plausible but wrong results are a huge liability in production assessments particularly during on the fly revisions that are often requested in the SE and I assume elsewhere.

  2. If no to 1 then this just comes down to a tradeoff between community/developer accessibility vs future portability with the additional issue that FIMS would only remain portable as long as there was developer support to do the porting. We don't want to end up in an SS type situation where only a small group of people are golden handcuffed to a lifetime of development support.

JaneSullivan-NOAA commented 2 months ago

@Andrea-Havron-NOAA great example! Thank you :) I'll just go shut up now.

Andrea-Havron-NOAA commented 2 months ago

Thanks everyone for all the great feedback. I think any platform we develop needs to be somewhat independent of TMB. I see two main potential future use cases that would not depend on TMB:

JonBrodziak commented 2 months ago

In future assessment modeling situations, Brieman’s two cultures of data modeling and algorithmic modeling may both need to be considered to produce the best predictive accuracy (Breiman 2001). This contrasts with the long-term fisheries assessment modeling practice of deploying data models with parameter sets to be estimated in contrast to machine learning algorithms to make probabilistic classifications of stock status for management action. Optimization methods for assessments formulated as data models typically use gradient descent and assume differentiable objective functions. Other applicable but less common optimization approaches include stochastic search algorithms based on Monte Carlo simulation or generic hill-climbing algorithm based on simulated annealing or particle swarm optimizers for frequentist or Markov Chain Monte Carlo or Hamiltonian Markov Chain algorithms for Bayesian sampling from the target posterior distribution. In comparison, there are few instances of algorithmic or machine learning models that have been applied in marine science to date (Malde et al. 2020). However, more frequent future applications of machine learning for assessment purposes can be expected (Beyan and Brouwer 2020, Sartar et al. 2021) based on predictive accuracies being obtained for species identification (e.g., Smolinski et al. 2019, Allken et al. 2021), forecasting environmental conditions (e.g., Ye et al. 2021) forecasting recruitment or spawning biomass (Smolinksi 2018, Lüdtke and Pierce 2023) or setting prior distributions for probable stock size (Froese et al. 2023). While there is no certainty about the future needs for stock assessment modeling, an important future challenge is to ensure that any next-generation assessment platform be adaptive to potential changes from a data model-based to a more algorithmic modeling culture.

Cole-Monnahan-NOAA commented 2 months ago

@JonBrodziak can you give a little more context here? Are you saying that we should build FIMS to do traditional stats and ML equally well? How does this interact with RTMB vs C++?

FYI the key paper cited is here.

ChristineStawitz-NOAA commented 2 months ago

Happy to have this thread continue our discussion, but if you do have a specific pro or con or suggested action I would like to capture them in this doc.

JonBrodziak commented 2 months ago

Cole, Thanks for posting the link to Brieman (2001). Here is the context.

FIMS is a general modeling system to provide BSIA and decision support for fisheries management. FIMS has been envisioned and set up a priori to build integrated assessment models. My point is that FIMS software infrastructure should be designed to be extensible enough to readily integrate ML-based decision support tools because they are inevitable.

Aloha, Jon

On Wed, Mar 13, 2024 at 1:08 PM Cole Monnahan @.***> wrote:

@JonBrodziak https://github.com/JonBrodziak can you give a little more context here? Are you saying that we should build FIMS to do traditional stats and ML equally well? How does this interact with RTMB vs C++?

FYI the key paper cited is here https://doi.org/10.1214/ss/1009213726.

— Reply to this email directly, view it on GitHub https://github.com/NOAA-FIMS/FIMS/issues/554#issuecomment-1996057891, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACVXIZV6VHUEJUWQXYVLNB3YYDL6HAVCNFSM6AAAAABERGZOI6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSOJWGA2TOOBZGE . You are receiving this because you were mentioned.Message ID: @.***>

-- Jon Brodziak, Ph.D. NOAA Inouye Regional Center Pacific Islands Fisheries Science Center 1845 Wasp Boulevard, Building 176, NMFS/PIFSC/FRMD Mail Room 2247 Honolulu, Hawaii 96818 USA PIFSC Stock Asessment Program https://pifscstockassessments.github.io/ Phone: 808-725-5617 Email: @.***

“Wherever my travels may lead, paradise is where I am.” ~ Voltaire

The views expressed in this message are my own and do not necessarily reflect any position of NOAA.

ChristineStawitz-NOAA commented 1 month ago

@JaneSullivan-NOAA is going to start prototyping an age-length structured approach in RTMB which will help inform this decision

JaneSullivan-NOAA commented 1 month ago

@JaneSullivan-NOAA is going to start prototyping an age-length structured approach in RTMB which will help inform this decision @NOAA-FIMS/fims Nathan and I have some meeting times set May 13-17 on the FIMS calendar if anyone else wants to join.