epiverse-trace / serofoi

Estimates the Force-of-Infection of a given pathogen from population-based sero-prevalence studies
https://epiverse-trace.github.io/serofoi/

Modelling workflow #61

Closed ben18785 closed 2 months ago

ben18785 commented 1 year ago

At the moment, a user does the following to fit their model:

  1. They prepare their data into the format required by the prepare_serodata function
  2. They run prepare_serodata to get data in a form required by the modelling (essentially some additional columns are added to the dataset)
  3. They run run_seromodel to fit the model.

I'd suggest that users don't really need / want to see prepare_serodata, so they'd pass the raw data direct to run_seromodel.
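To make the suggestion concrete, here is a minimal, language-agnostic sketch in Python of a single entry point that runs the preparation step internally. It is not serofoi's R API: `prepare_rows`, `fit_model`, the derived `prevalence` column, and the placeholder "fit" are all hypothetical stand-ins chosen for illustration.

```python
def prepare_rows(rows):
    """Hypothetical stand-in for the preparation step: adds one derived column."""
    prepared = []
    for row in rows:
        new_row = dict(row)  # copy, so the caller's raw data is left untouched
        # Assumed derived column, for illustration only
        new_row["prevalence"] = row["number_seropositive"] / row["number_tested"]
        prepared.append(new_row)
    return prepared


def fit_model(rows, prepare=True):
    """Hypothetical single entry point: prepares the data unless prepare=False."""
    if prepare:
        rows = prepare_rows(rows)
    # Placeholder for a real model fit: pooled seroprevalence across all rows
    total_positive = sum(r["number_seropositive"] for r in rows)
    total_tested = sum(r["number_tested"] for r in rows)
    return {"n_rows": len(rows), "overall_prevalence": total_positive / total_tested}


raw = [
    {"number_seropositive": 5, "number_tested": 10},
    {"number_seropositive": 15, "number_tested": 30},
]
result = fit_model(raw)  # the user never calls prepare_rows directly
```

With this shape, a user who still wants the intermediate object can call the preparation function themselves and pass `prepare=False`.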

zmcucunuba commented 1 year ago

Hi Ben, I think prepare_serodata is helpful for the user to provide a step to think about the data, perhaps accompanied by warnings or errors indicating if there are any issues with the data before running the models.

ben18785 commented 1 year ago

Hi @zmcucunuba -- thanks. I agree that users certainly need warnings if their data aren't in the right form. But (sorry, playing Devil's advocate), couldn't that just be done when they do: run_seromodel?

Is there another reason a user would want to have the object returned by prepare_serodata?

zmcucunuba commented 1 year ago

Haha, I guess you're right @ben18785! Perhaps it's just me being extremely step-by-step-oriented.

ekamau commented 1 year ago

Just to add - what is the bare minimum information required for the different models to run, the minimum that should be supplied in the user input data? Then the functions like run_seromodel could indicate that ... in most instances, one needs at a minimum: age, years of survey, number_seropositive, number_tested. I could be wrong!

But I guess it also depends on how much user interaction the workflow requires, or how complex the models are, in which case the user gives more information.
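A check for that minimum set of columns could look roughly like the Python sketch below. It is only an illustration of the idea, not serofoi code: the function name `check_survey_columns` and the column name `survey_year` are assumptions, and the other column names are taken from the list above.

```python
# Assumed minimum columns; "survey_year" is a hypothetical name for "years of survey"
REQUIRED_COLUMNS = ("age", "survey_year", "number_seropositive", "number_tested")


def check_survey_columns(rows, required=REQUIRED_COLUMNS):
    """Raise ValueError listing every row that lacks a required column."""
    problems = []
    for index, row in enumerate(rows):
        missing = [name for name in required if name not in row]
        if missing:
            problems.append(f"row {index}: missing {', '.join(missing)}")
    if problems:
        raise ValueError("; ".join(problems))
    return rows


good = [{"age": 5, "survey_year": 2012, "number_seropositive": 3, "number_tested": 20}]
bad = [{"age": 5, "number_tested": 20}]

check_survey_columns(good)  # passes silently and returns the rows

try:
    check_survey_columns(bad)
    error_message = ""
except ValueError as error:
    error_message = str(error)
```

Collecting all problems before raising means the user sees every missing column in one message rather than fixing them one at a time.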

ben18785 commented 1 year ago

@ntorresd is going to look at letting run_seromodel optionally run the models without the preprocessing step.

ntorresd commented 2 months ago

Closed by #200. From v1.0.1 on, the only preprocessing needed for modelling is to add the age group marker age_group, which is built from age_min and age_max whenever it's missing in the survey. See the discussions in #191 and #193 for further details.
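For readers who want a picture of that remaining step, here is a rough Python sketch. It assumes age_group is the floored midpoint of age_min and age_max; that rule is an assumption made for illustration and is not confirmed in this thread, and `add_age_group` is a hypothetical name, not the package's R function.

```python
import math


def add_age_group(rows):
    """Add age_group to rows that lack it, keeping any value already present."""
    completed = []
    for row in rows:
        new_row = dict(row)  # copy, so the input survey is not modified
        if new_row.get("age_group") is None:
            # ASSUMPTION: floored midpoint of the age interval
            new_row["age_group"] = math.floor((row["age_min"] + row["age_max"]) / 2)
        completed.append(new_row)
    return completed


survey = [
    {"age_min": 5, "age_max": 10},                   # age_group missing
    {"age_min": 11, "age_max": 20, "age_group": 12}, # age_group already supplied
]
completed = add_age_group(survey)
```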