Serodata requirements - Githubissues

epiverse-trace / serofoi

Estimates the Force-of-Infection of a given pathogen from population based sero-prevalence studies

https://epiverse-trace.github.io/serofoi/

Other

17 stars 4 forks source link

Serodata requirements #66

Closed ben18785 closed 6 months ago

ben18785 commented 1 year ago

Do we need the following columns to be supplied by a user in prepare_serodata to fit a serocatalytic model to them?

"survey"
"country"
"test"
"antibody"

If not, I'd suggest removing these requirements from this functionality to reduce demands on a user.

Additionally, the functionality does not explicitly detect whether the required columns are there: I'd suggest we add such functionality which allows these functions to fail "informatively" if the data do not match the required format.

zmcucunuba commented 1 year ago

Hi @ben18785. I agree the use of columns for meta-data info is not necessarily optimal. But perhaps is easy for a new user just to fill in columns. Some of these metadata are later used for visualisations. Again, as a user, I find them helpful when running hundreds of serosurveys simultaneously.

ben18785 commented 1 year ago

Thanks @zmcucunuba How about making them optional? If the data are there, it gets included in the plots, but the simulations run without these. I just think that reducing demands on users as much as possible (i.e. sticking to the essentials needed to fit a model) is generally a good idea.

ntorresd commented 1 year ago

Hi @ben18785 and @zmcucunuba . I think it'd be good for them to be optional indeed, although the package should provide warnings in case the information is missing for the user to be aware.

This is actually related with a development that might get implemented in the following weeks regarding the use of S3 classes to store and access the serosurveys internally in the package. Right now this information is given to the package as redundant data in the input dataframe. The idea is to reduce the memory load on the package by converting them into attributes common to the object (the serosurvey).

This way we could enable the possibility for the user to either include this information as columns in the input dataframe or as optional arguments in a function. This could be done either in prepare_serodata or in run_seromodel, which might be relevant for the discussion in #61 .

ben18785 commented 1 year ago

Thanks @ntorresd I've had mixed results using S3 classes in previous packages I've contributed to, and I think it worth considering whether it's necessary to make this big change to the underlying packages. To me, naively, the memory burden of a few extra variables probably isn't worth the hassle.

Re: warnings, I agree it might be worth warning the user if they have not supplied a survey attribute that the data are being treated as if they belong to a single serosurvey.

ntorresd commented 6 months ago

Fixed by #154