Closed ntorresd closed 2 months ago
This pull request:
(Note that results may be inacurrate if you branched from an outdated version of the target branch.)
This pull request:
(Note that results may be inacurrate if you branched from an outdated version of the target branch.)
This pull request:
(Note that results may be inacurrate if you branched from an outdated version of the target branch.)
This pull request:
(Note that results may be inacurrate if you branched from an outdated version of the target branch.)
This pull request:
(Note that results may be inacurrate if you branched from an outdated version of the target branch.)
Hi @ntorresd,
This is so great. Thank you so so much for this work -- it lays the foundation for a whole lot more work for us all and does so in a really elegant way. (Well done team! Here's to Bogota 2025...)
A few thoughts. These are generally minor and could be discussed and handled in separate PRs if this isn't the right forum:
tsur
-> date_survey
? I can imagine how we might, in the future, use surveys taken at different parts of the year (when looking for seasonal effects). This generality might prove useful. At the least, I would suggest we rename this variable to use full proper names.sample_size
-> n_sample
. We consistently use n_x
to mean a count of x
throughout the package, and I'd suggest we do so here.foi_index
as used in the fit_seromodel
function seems a little unsafe to me. I also wasn't sure what this means from the function documentation, "Integer vector specifying the age-groups for which force-of-infection values will be estimated. It can be specified by means of [get_foi_index]". To illustrate my confusion, I wasn't sure in your example in the above whether the first element of foi_index
corresponded to the oldest-aged individuals or the youngest. How about adding structure to this input? e.g. a data frame with (year, foi_index) for time models and (age, foi_index) for age models?@export
this to users summarise_loo_estimate
? Seems easily doable using just the loo::loo
functionality.covr::package_coverage()
)Hi @ben18785, thanks a lot for your feedback!
I'll address some of your points in this PR and open issues to address the others, as there are some closely related to user's feedback collected during the previous user test:
tsur
->date_survey
? I can imagine how we might, in the future, use surveys taken at different parts of the year (when looking for seasonal effects). This generality might prove useful. At the least, I would suggest we rename this variable to use full proper names.
Right now, this variable is named survey_year
. For the time being I'll leave it as it is, as we don't plan to implement seasonal models in the near future. We can change this easily once we have reached that point (if we decide to implement it in serofoi and not elsewhere anyway).
sample_size
->n_sample
. We consistently usen_x
to mean a count ofx
throughout the package, and I'd suggest we do so here.
I'll implement this one before merging for the sake of names consistency.
I love the seroreversion model example you've given in this PR. Could it form the basis of a package article about fitting models with seroreversion?
I've opened #205 to address this.
I love the example you have about reducing the numbers of parameters being estimated and again I think these should go into one of the package articles.
I think we should address this one in #204 , opened by one of our users @JDConejeros during the user test; it's a good chance to give a compelling explanation of this concept.
With the plotting, is there a way to suppress the written text at the top? I can imagine people wanting to put these into a publication and they may not want the written text.
I added a comment in #202 about this, so we can discuss this further over there.
foi_index
foi_index
as used in thefit_seromodel
function seems a little unsafe to me. I also wasn't sure what this means from the function documentation, "Integer vector specifying the age-groups for which force-of-infection values will be estimated. It can be specified by means of [get_foi_index]". To illustrate my confusion, I wasn't sure in your example in the above whether the first element offoi_index
corresponded to the oldest-aged individuals or the youngest. How about adding structure to this input? e.g. a data frame with (year, foi_index) for time models and (age, foi_index) for age models?
Yes, this is a must. I opened #206 to address this.
summarise_loo_estimate
- Minor thing but do we want to
@export
this to userssummarise_loo_estimate
? Seems easily doable using just theloo::loo
functionality.
I'd rather keep for the time being. During the user test it was useful to have it exported.
[...] * Relatedly, what would be our unit testing coverage for this PR if merged (i.e.
covr::package_coverage()
)
I haven't run it yet, but it should be very low as the only part of the code covered by unit testing right now is the data simulation module. We've discussed unit testing with @jpavlich and we decided to start over as the structure of the packaged has changed so much. After merging this PR to dev, I'll open some issues about this. The idea is to have >90% coverage before submitting to CRAN.
- Silly question but I couldn't find this on a quick look: do we unit-test the custom functions we use in Stan?
Not a silly question at all. As mentioned before, we are not testing for any of this right now. We should implement this before merging to main, so this is something we will discuss soon enough.
This pull request:
(Note that results may be inacurrate if you branched from an outdated version of the target branch.)
This PR closes:
This PR replace the old modelling and visualisation functions for refactored versions agreed on by the development team of the package (@zmcucunuba, @ben18785, @sumalibajaj, @jpavlich, @ekamau) earlier this year.
We designed this new functions for the selection and the specification of the Bayesian models to be more flexible than before, as well as including additional constant/time/age varying models with the possibility to estimate the seroreversion rate $\mu$. In particular, the usage of
fit_seromodel()
went from (e.g.):to
Note that we refer to the serological survey now as
serosurvey
rather thanserodata
, and we restrict it to have the following data:tsur
: Year in which the survey took placeage_min
: Floor value of the average between age_min and age_maxage_max
: The size of the samplesample_size
: Number of samples for each age groupn_seropositive
: Number of positive samples for each age groupThe function
sf_normal
is one of a set of auxiliary functions designed to specify the prior distributions of the parameters to be estimated (FOI or seroreversion rate), as suggested in #193 . Currently available priors aresf_normal()
andsf_uniform()
, corresponding to Gaussian and uniform priors respectively.To illustrate the current pipeline, consider a disease with constant FOI $\lambda = 0.02$ and seroreversion rate $\mu=0.01$:
and a serological survey with the following features:
We can use
simulate_serosurvey()
, introduced in #199, to simulate statistically consistent seropositive countsn_seropositive
for each age group, according to our set of serological models. In this case, I use the time-varying model:which we can visualise using
plot_serosurvey
:Implementing the constant model with and without seroreversion by means of
fit_seromodel()
:Visualizing the results:
Note that the model estimating the seroreversion rate (right panel in the image), no only is a better fit for this serological survey according to the elpd value, but it also accurately estimates both the FOI and seroreversion rate we used to simulate the serosurvey.
Another difference with previous versions lies in how we visualize the estimated parameters. Since the constant model is estimating a single FOI value, we only show the estimated value with its corresponding credible interval (we use to plot it instead). However, for the time and age varying models we estimate several values of the FOI in the time/age span of the survey, which we visualize graphically. Take for instance the time varying model with seroreversion:
Here we plot the estimated values of the FOI as a function of time (blue line and shadow) and the R-hat estimates for each estimated value (black dots). Note that, even though we ran the model for a large number of iterations - 4000, the model did not converged (since there are R-hat values over 1.01). We can improve the convergence of the model by estimating less values of the FOI. To specify the time intervals for which FOI values will be estimated, we index them by means of
get_foi_index()
:This function returns a list of indexes that labels each year (or age, for age-varying models). The idea is that a single FOI value will be estimated for each index, meaning that 10 FOI values will be estimated in this particular case:
Now the model converges. As long as
foi_index
length covers the whole time/age span of the serological survey, less regular indexationsfoi_index
can be specified.