NOAA-FIMS / FIMS

The repository for development of FIMS
https://noaa-fims.github.io/FIMS/
GNU General Public License v3.0
13 stars 8 forks source link

Create R functionality for passing user input to FIMS #307

Closed k-doering-NOAA closed 9 months ago

k-doering-NOAA commented 1 year ago

Thanks @kellijohnson-NOAA for leading this group and putting together this list of tasks!

We will be in charge of creating extractor functions to go from a FIMS data object to lots of individual objects.

Why not make the user do this like what is done in Stock Synthesis where you have data parts like age composition, length composition, and indices? Because, I had this vision that users would not need to know where their data should go but just have to worry about getting it all together. That way, if users want to run their model with a subset of fleets they can easily do

dplyr::filter(my_data, fleet %in% 3:4) %>% run_FIMS()

and their model would work.

Next Steps

Note: more current tasks are listed at the top of the running notes

Relevant branches

k-doering-NOAA commented 1 year ago

Adding the meeting notes: https://docs.google.com/document/d/1tia17hy7xu1qWKXOEYfEp6WiHu9RnGs2Lq_PR3KNqBg/edit#heading=h.nxx17q9bhg4o

Bai-Li-NOAA commented 1 year ago

Understand S4 classes task: @ChristineStawitz-NOAA mentioned that it would be good to check whether the Rcpp::class is exposed to R as an S3 or S4 object.

sloop::otype(age_frame) #> [1] "S4" base::isS4(age_frame) #> [1] TRUE base::mode(age_frame) #> [1] "S4" base::inherits(age_frame, "envRefClass") #> [1] FALSE

- FIMS Rcpp objects checks show contradictory results
  - The object from calling `fims <- Rcpp::Module("fims", PACKAGE = "FIMS")` is a S4 object.
  - The object from calling fims submodules is a reference class according to `sloop::otype()` and `base::inherits()`, but S4 object according to `base::isS4()` and `base::mode()`. Also, based on the code example below, both recruitment and recruitment$steep are reference classes.

```r
fims <- Rcpp::Module("fims", PACKAGE = "FIMS")

sloop::otype(fims) #> [1] "S4"
base::isS4(fims) #> [1] TRUE
base::mode(fims) #> [1] "S4"
base::inherits(fims, "envRefClass") #> [1] FALSE

recruitment <- new(fims$BevertonHoltRecruitment)
recruitment$steep$value <- 0.75

sloop::otype(recruitment) #> [1] "RC"
base::isS4(recruitment) #> [1] TRUE
base::mode(recruitment) #> [1] "S4"
base::inherits(recruitment, "envRefClass") #> [1] TRUE

sloop::otype(recruitment$steep) #> [1] "RC"
base::isS4(recruitment$steep) #> [1] TRUE
base::mode(recruitment$steep) #> [1] "S4"
base::inherits(recruitment$steep, "envRefClass") #> [1] TRUE

sloop::otype(recruitment$steep$value) #> [1] "base"
base::isS4(recruitment$steep$value) #> [1] FALSE
base::mode(recruitment$steep$value) #> [1] "numeric"
base::inherits(recruitment$steep$value, "envRefClass") #> [1] FALSE

According to Eddelbuettel and François (2017), I think we are using Rcpp modules for exposing C++ classes based on the reference classes introduced in R 2.12.0. If we want to change the Rcpp reference classes to S4 classes later, this example may be helpful. If we want to change the S4 data object to reference class later, the examples here may be helpful. I agree that we can just use S4 for data objects since we started with S4 and it appears that it will work with C++ and TMB.

Additional notes from Kelli

S4 class allows for internal error checking and strict rules with respect to R objects that you do not get without using a class system and it allows for methods to be recycled across classes. For example, if we have a class for age-structured models in FIMS called FIMSA, we can define print.FIMSA() that will allow for users to do the following:

output <- run_FIMS() # or whatever the equivalent to lme4() would be
print(output)
> very pretty output
str(output)
> very complicated list of list structure

Additional notes from @msupernaw

Here is a good comparison of S3, S4, and Reference class

Andrea-Havron-NOAA commented 1 year ago

Potential example R package: unmarked

nathanvaughan-NOAA commented 1 year ago

I just pushed a new branch 307-extract-frame-size with some initial code to pull nyears, ages, and fleets from the data frame. Next step is to determine the dimensions of all the input data objects needed by fims so we can build empty NA S4 objects and then code to pull all the relevant data in to replace NA's where data exists.

k-doering-NOAA commented 1 year ago

Task:

Move weight-at-age data from its own entry in the FIMSFrameAge object to be part of the main data (@k-doering-NOAA)

I took a look and the weight at age is already included in the data slot of the FIMSFrameAge object. There is just an additional slot for weight at age.

In R/fimsframe.R , we have code that defines 2 object types: FIMSFrame and FIMSFrameAge.

I wonder how we want to work with these going forward? Do we want to keep modifying FIMSFrameAge and just leave FIMSFrame as it is (has just one data slot called data with the data fraome)? Note that FIMSFrameAge build upon FIMSFrame - it has the slots FIMSFrame does and then adds more.

Should I remove the weight at age data slot, from FIMSFrameAge?

k-doering-NOAA commented 1 year ago

@nathanvaughan-NOAA , I took a look at your branch! I think we can make nyears, ages, and fleets into new slots, either in FIMSFrame or in FIMSFrameAge depending on where we decide to work. I'm happy to work with you on this, as looking at this code refreshed my memory on where things go to set up the S4 classes!

kellijohnson-NOAA commented 1 year ago

@k-doering-NOAA I see here that weight-at-age object is a filter of the input data, which is 🥳 EXACTLY🥳 what we want. So, I think the next step is fixing the weight-at-age data that is pulled out to be a vector rather than a matrix, which will mean that some other things that are based on FIMSFrameAge will break if they are using this weight-at-age data frame.

kellijohnson-NOAA commented 1 year ago

@k-doering-NOAA regarding FIMSFrame and FIMSFrameAge I think that we want to keep this structure because we are currently not dealing with length data and in the future FIMSFrameLength could be used for a length only model and this small additional structure does not really add that much complication. I could be convinced otherwise though.

k-doering-NOAA commented 1 year ago

@kellijohnson-NOAA Thanks! Nathan and I worked together to add a few new slots in FIMSFrame (nyears, ages, and fleets), still working in the 307-extract-frame-size branch

I think we can refactor to put any slots that deal with age-based models to FIMSFrame instead.

And OK, I can work on changing so that the weight at age data is a vector instead.

k-doering-NOAA commented 1 year ago

I just refactored so that slots that deal only with age-based models are in FIMSFrameAge. So, to create the object:

data(package = "FIMS")
age_frame <- FIMSFrameAge(data_mile1)

I also added accessor "getter" functions for all slots except data, so you can do

ages(age_frame)

instead of

age_frame@ages

which is supposed to be reserved for developers.

I did not add "setter" functions because I don't know if we want people doing

ages(age_frame) <- 11
nathanvaughan-NOAA commented 1 year ago

Hey @k-doering-NOAA I was looking into the generic fleets issue we discussed and it doesn't look like S4 classes are set up to do that well on their own. I was then thinking that maybe we put all of the FIMSFrame code inside a huge wrapper function something like BuildFimsFrames(data,params) and then within that function we can first process the data to see how many Fleets/Ages/etc are needed and then use a big eval(parse()) wrapper to adaptively build the FIMSFrame setClass() and other required functions with the necessary number of slots? That should keep things neat and make everything expandable to generic situations. I'm thinking we could then define slot names such as fleet_1_catch, fleet_2_index, etc so that every data vector gets its own slot which can then be passed to FIMS. I think it's probably also useful for our group to discuss a similar object for the parameters as these are all wrapped up with the data and it's probably easier to process them both at once.

Andrea-Havron-NOAA commented 1 year ago

@k-doering-NOAA I see here that weight-at-age object is a filter of the input data, which is 🥳 EXACTLY🥳 what we want. So, I think the next step is fixing the weight-at-age data that is pulled out to be a vector rather than a matrix, which will mean that some other things that are based on FIMSFrameAge will break if they are using this weight-at-age data frame.

I think the C++ Data Object does the conversion from matrix to vector. In this test case, we only need to pass in the comp data directly from the data frame without converting.

@ChristineStawitz-NOAA, am I correct that the matrix can be passed into the Rcpp directly without converting to a vector or does the input/output group need to work on this conversion?

ChristineStawitz-NOAA commented 1 year ago

You are correct @Andrea-Havron-NOAA with one minor tweak, which is that right now index and age comp data are treated as Data Objects but EWAA data is not. EWAA data is treated differently within the EWAA module.

We could refactor this if needed!

Andrea-Havron-NOAA commented 1 year ago

You are correct @Andrea-Havron-NOAA with one minor tweak, which is that right now index and age comp data are treated as Data Objects but EWAA data is not. EWAA data is treated differently within the EWAA module.

We could refactor this if needed!

Thanks for the clarification @ChristineStawitz-NOAA!

ChristineStawitz-NOAA commented 1 year ago

I just opened a draft pull request for the population-Rcpp-interface branch. I kept it as draft because I think we want to merge in BL-population-Rcpp-interface to that branch first before we merge both into main (can you confirm @Bai-Li-NOAA )?

Bai-Li-NOAA commented 1 year ago

I just opened a draft pull request for the population-Rcpp-interface branch. I kept it as draft because I think we want to merge in BL-population-Rcpp-interface to that branch first before we merge both into main (can you confirm @Bai-Li-NOAA )?

Hi @ChristineStawitz-NOAA , thanks for the reminder! I just opened a pull request #327 to merge the changes from BL-population-Rcpp-interface to population-Rcpp-interface. There are no conflicting files and all checks have passed.

Bai-Li-NOAA commented 1 year ago

@kellijohnson-NOAA and @nathanvaughan-NOAA, I just opened a pull request #329 to merge more R tests to 307-extract-frame-size branch. Let me know if you have any suggestions. A couple of minor question after writing more tests (I think we can address the question later when developing run_fims_age() ):

nathanvaughan-NOAA commented 1 year ago

Excellent @Bai-Li-NOAA I just saw that in my email. Unless I'm mistaken, the fleet object in FIMS does not distinguish between fishing fleets and surveys (i.e. a survey is just a fishing fleet with zero catch and in the future, a fishing fleet may have catch but no survey data though at the moment all fleets need everything) so I think all our methods should be built to operate similarly. I agree that I was told the m_weightatage and m_ages generic methods were just meaning modules. I see now that @kellijohnson-NOAA modified the pull(value) to summarize a mean(value) over I think all years for each age. I could be wrong though as I don't use dplyr/tidyverse much in my stoneage base R world. Was that intentional @kellijohnson-NOAA?

kellijohnson-NOAA commented 1 year ago

@Bai-Li-NOAA

  1. fishing and survey fleets should be the exact same, just as @nathanvaughan-NOAA suggested
  2. m_*() for sure means module; it just so happens that m_weightatage() calculates median weight-at-age. Eventually, when we can pass weight-at-age by fleet rather than forcing it to be a single vector the same length as the number of ages as it is now we can refactor m_weightatage() to pass a long vector rather than this short vector. I just thought using mean weight-at-age data was better than weight-at-age data from the first year as I had it before.
nathanvaughan-NOAA commented 1 year ago

Thanks @kellijohnson-NOAA that makes sense. I keep forgetting that weight at age is the empirical weight at age vector and confounding it with the abundance at age comp data for some reason.

k-doering-NOAA commented 1 year ago

I just learned about the hardhat package that helps developers follow tidymodel conventions. Not sure if it makes sense for FIMS, but maybe it'll provide at least some inspiration!

ChristineStawitz-NOAA commented 1 year ago

I'm doing some branch clean up. Should we leave this open and rebase it once the estimation test is in main? or merge it into main?

nathanvaughan-NOAA commented 1 year ago

I would say we leave it open and rebase.

kellijohnson-NOAA commented 1 year ago

307-data-validate has valid content and we can rebase it. I deleted 307-run.

Andrea-Havron-NOAA commented 1 year ago

Bumping this issue as we will be merging the test branch into main soon.

k-doering-NOAA commented 1 year ago

Today Bai, Nathan, Andrea, and I worked on modifying the FIMS demo vignette to access age composition from FIMSFrameAge rather than hard coding the values. Work was done in branch 307-add_age_comp. We discovered:

We will finish up on Friday and potentially discuss some bigger-picture ideas about how to build out the S4 system.

ChristineStawitz-NOAA commented 11 months ago

Hey all - looking into closing MQ and trying to determine if we will do additional work on this issue during MQ or if it should be moved to M2?

  1. Can I merge in or delete https://github.com/NOAA-FIMS/FIMS/tree/307-data-validate ?
  2. If we decide to close out - should we copy the undone items from the initial issue into an M2 issue?

Thanks!

ChristineStawitz-NOAA commented 10 months ago

Just wanted to follow up on the question above @k-doering-NOAA @kellijohnson-NOAA @Bai-Li-NOAA @MOshima-PIFSC @nathanvaughan-NOAA - I know Kathryn and I discussed it a while ago but I can't find a resolution in my records. Let me know if this needs some time during a seaside chat. Thanks!

k-doering-NOAA commented 10 months ago

I think @kellijohnson-NOAA was going to take a look at https://github.com/NOAA-FIMS/FIMS/tree/307-data-validate

I was ok with just deleting the branch, but if Kelli wants to merge it in instead, that works, too.

kellijohnson-NOAA commented 10 months ago

There is one remaining task from this issue

fill in NA values for missing years (if we add NA handling inside FIMS or mean/default
values with huge sd's if we leave the NA handling for later)?

but I think it should be moved to a new issue because I am not sure where the code base is at with respect to handling NA values and missing data is not currently a problem.

k-doering-NOAA commented 10 months ago

I just opened #502, so I think this issue can be closed once the PR is merged in.

ChristineStawitz-NOAA commented 10 months ago

Thanks @kellijohnson-NOAA - I agree we haven't done anything to deal with NAs on the C++ side yet so that's gotta be an M2 issue. Thanks @k-doering-NOAA for opening one!

k-doering-NOAA commented 10 months ago

@ChristineStawitz-NOAA shouldn't this be closed? Maybe we should continue M2 work in separate issue?