Simplify `fit_seromodel()` output

Bisaloo commented 1 year ago

Currently, the output of fit_seromodel() is quite large and contains some information that could easily be removed.

This would allow us to:

Use simpler objects than lists for output. Possibly even an existing S3 class with useful methods. This should help users find the information that they need in the output and manipulate the output more easily.
Use less storage space when saving the output of this function (discussed briefly in https://github.com/epiverse-trace/serofoi/issues/77#issuecomment-1603428324)

Bisaloo commented 1 year ago

We have discussed this today and this is where we landed. Below is the current output of fit_seromodel() and how we want to deal with each element:

n_iters, n_thin, foi_model, delta, m_treed are function input and should not be included in the output. We should instead teach users to always keep the script producing the object alongside the object itself.
n_warmup, exposure_years, exposure_ages, stan_data can be directly computer from inputs with a single line of code each. If we believe users might want to access this data (e.g., for debugging), we should instead create an intermediate exported function to compute them but it shouldn't be part of fit_seromodel() output by default
loo_fit, foi_cent_est, foi_post_s can be directly computed from fit with a single line of code each. Users can either compute them by themselves or they can be included in downstream post-processing functions that act on the output of fit_seromodel()

This leaves us with fit, which is the output of rstan::sampling() and is an object of class stanfit. The seems like a good candidate for output because it means we benefit from stanfit methods out of the box. If we want to provide custom methods, it is also possible to create a subclass inheriting from stanfit.

ben18785 commented 1 year ago

I agree -- returning a stanfit object makes most sense since Stan is a well-used and well-developed package that many people have experience working with.

jpavlich commented 1 year ago

I just merged @ntorresd 's PR #109 in which the object returned by fit_seromodel() is simpler. @ntorresd 's rationale of not returning a plain stanfit object is to reduce the probability that the user provides the wrong serodata to other functions that require both serodata and their corresponding stanfit object (a serodata that was not the input to the generated stanfit object)

@Bisaloo @ben18785 do you agree with the above argument? or would you delegate into the package's user the responsibility to ensure the correct parameters are passed to other functions that use both serodata and the stanfit object.

I do no heavily lean towards any option, so I would like your feedback.

Bisaloo commented 1 year ago

would you delegate into the package's user the responsibility to ensure the correct parameters are passed to other functions that use both serodata and the stanfit object.

I think it's the user responsibility. It's important to teach users to use reproducible scripts, and not carry their data from one session to the other, losing the object provenance. Code workarounds to try and handle this to the user add complexity to the code, thus reducing maintainability, and paradoxically makes the outputs more difficult to manipulate. And even with this, you can't completely protect the user from losing the object provenance. Hence why I believe that teaching good practices is the only way to solve this issue.

If we had a plain stanfit object, users would be able to directly use already implemented methods (plot(), summary()). Having a more complex object as in #109 will require the users to do plot(x$stanfit) or force you to implement your own methods.

epiverse-trace / serofoi

Simplify `fit_seromodel()` output #89