James-Thorson-NOAA / VAST

Spatio-temporal analysis of univariate or multivariate data, e.g., standardizing data for multiple species or stages
http://www.FishStats.org
GNU General Public License v3.0
124 stars 54 forks source link

S3 objects for outputs #167

Closed James-Thorson closed 5 years ago

James-Thorson commented 5 years ago

I would like to start bundling outputs for for low-level and high-level functions as S3 objects, so that there's simple user functions for summarizing output. My current list of user-functions could be:

I'll add to this list while brainstorming. I imagine that these functions will be defined to work for a few different S3 objects, i.e., where x is the output from fit_model, also summary could also be defined to work using the output from make_extrapolation_info and make_spatial_info

colemonnahan commented 5 years ago

I think this is a good idea and is something I've been thinking about a lot. I would suggest two S3 classes. The first being vastdata which is everything that goes into MakeADFun. The user could pass this to plotting functions for a more thoroughly and easy exploration of the data. Then coming out is something like vastfit which does what you state above, plus other new things like a diagnostic check, printing key settings that should be reported in a paper, etc.

One important capability I'd like to suggest is that there should be a simple way to compare different model fits. So you run a model with two different likelihoods, or different numbers of factors or knots, bias correction on and off, etc., it should be easy to compare key output like indices, spatial residuals, tables of fixed effects, etc. Similar to what r4ss does. I don't know whether it'd be better to merge the class objects together and then pass to a plotting function, or have plotting functions take multiple class objects as arguments. Something to think about, and I think it'd be very useful on the user side to have this capability.

I also would suggest that these changes be bundled into other big changes (e.g., moving functions around, high-level wrappers, renaming, code cleanup, dropping backward compatability, etc) and essentially making VAST2. That'd be a nice clean break and it'd be easier for us users to have one big wave of changes than have them rolled out incrementally.

Happy to chat in person with @James-Thorson-NOAA @cstawitz . We're also brainstorming ideas from the user side in our VAST reading group, so input from that group could also be relevant for this topic.

James-Thorson-NOAA commented 5 years ago

Thanks Cole!

Also FWIW, I started a separate thread on spatial objects, #171, and anyone feel free to chime in there.

ericward-noaa commented 5 years ago

Just to keep this in the thread, we'd also talked about including summary functions like

predict( x, newdata ) -- return predictions for new dataframe AIC( x ) -- return AIC of fitted model

I like the way Ben Bolker's set up glmmTMB - might be good to use something like that as a template in deciding which functions are exported v internal, etc.

For the multivariate models, the formula response is a bit challenging. A couple options:

  1. Include a list of formulas (one for each species for example) -- several multivariate packages on CRAN take this approach
  2. do something like MARSS where the user has to create the design matrices by hand (I can't really recommend this option though!)
colemonnahan commented 5 years ago

I think that the most important thing is to fully flush out what exactly you want users to be able to do, and conversely things they should not do. The whole point of OOP is to hide away the internals from the user and only allow them to access to what they should be doing.

I think the complicated thing for VAST is that it's used for a variety of things, and each use will require different user functionality. I've only ever used it for index standardization so there's a whole suite of options I don't even know about. So I suggest maybe taking your Table 1 from this paper and writing down all the functionality you need to accomplish for each of those sub-analyses, plus those proposed as future research.

The things that are onerous (at least IMO) on the user end would be things like, fitting the model then pulling out pieces, massaging it, putting back in as input and refitting the model. Steps like that require the user to know the internal way things are stored, what they mean, etc.

ChristineStawitz-NOAA commented 5 years ago

Going off of @colemonnahan it sounds like print() and plot() could be useful, particularly for the data-type objects.

Looking at the lists that get passed around between the key VAST and FishStatsUtils functions, it seems like good candidates for inputs to object-ize might be the objects you have as lists currently (PolygonList, GridList) as well as Network_sz . Personally I find it onerous to try to navigate the list of slots in each of those lists. This would mean defining methods for common matrix/array operators dim() etc. as in Hadley's book. I am not super familiar with the way R does OOP so the Best Practices on that page seemed helpful in terms of how to best implement constructor and check methods for S3 class types.

Methods like AIC, anova, summary, and coef seem standard for model-output type objects.

colemonnahan commented 5 years ago

I am thinking more broadly. I think those are a start but really people need to dig in to the objects when doing things beyond the canned outputs. For instance if you want to take output from 3 models and put them onto a single fancy figure for a paper. That's when you find yourself digging into the lists and confronting *_gcyl types of stuff.

This is what I mean about trying to write down everything users would reasonably want to do. That helps define the functions you need which in turn helps define the appropriate structure for the classes.

I also think a lot of those returned things would never be needed and thus could/should be dropped. Like the M matrices which are in the Report I think.

jimianelli commented 5 years ago

would it be worthwhile to look at the approach glmmtmb has taken? Seems they have a level of generality that could be useful to draw experience from. Looks they are using class 3 objects.

On Mon, Jul 1, 2019 at 3:05 PM Cole Monnahan notifications@github.com wrote:

I am thinking more broadly. I think those are a start but really people need to dig in to the objects when doing things beyond the canned outputs. For instance if you want to take output from 3 models and put them onto a single fancy figure for a paper. That's when you find yourself digging into the lists and confronting *_gcyl types of stuff.

This is what I mean about trying to write down everything users would reasonably want to do. That helps define the functions you need which in turn helps define the appropriate structure for the classes.

I also think a lot of those returned things would never be needed and thus could/should be dropped. Like the M matrices which are in the Report I think.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/James-Thorson-NOAA/VAST/issues/167?email_source=notifications&email_token=AAUW7YWPA2HUVX2AZU7DULTP5J5TRA5CNFSM4HRFTLEKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODY7PNHA#issuecomment-507442844, or mute the thread https://github.com/notifications/unsubscribe-auth/AAUW7YUI53R3IO5JHI6SOKTP5J5TRANCNFSM4HRFTLEA .

-- Jim Ianelli

James-Thorson-NOAA commented 5 years ago

Thanks everyone for your input, and particularly to the VAST steering committee (Christine Stawitz, Jim Ianelli, and Eric Ward)! I have worked to follow conventions used by TMB and glmmTMB (e.g., that the S3 object has the same name as the function generating it)

As of now, I have added the following S3 objects with associated methods:

I will likely keep working on this slowly, and currently intend to add:

and I plan to update changes here and also list all current methods in the User Manual.

Finally, I realize that this is not the thorough process of documenting analyses and developing a high-level function to implement them as you suggested @colemonnahan, I'd be happy to work with people to find funding to do a fuller job in the future.

PS: these changes are currently available in the development branch of FishStatsUtils and VAST but will obviously be merged to main branch upon next numbered release