epiverse-trace / epiparameter

R package with a library of epidemiological parameters for infectious diseases, plus functions and classes for working with those parameters
https://epiverse-trace.github.io/epiparameter

Replace {base} subset by dplyr::filter() to keep output consistent in README #111

Closed avallecam closed 7 months ago

avallecam commented 1 year ago

Please include a brief description of the problem with a code example:

This example in the README Quick start is expected to return an entry for influenza:

https://github.com/epiverse-trace/epiparameter/blob/ed0da13a8802e364c1e35fff97323e319838a07b/README.Rmd#L57-L60

However, because the dataset returned by epiparam() has changed, row 12 now corresponds to COVID-19, so the object named influenza_incubation no longer holds an influenza distribution.

library(epiparameter)

eparams <- epiparam()

influenza_incubation <- as_epidist(eparams[12, ])
influenza_incubation
#> Disease: COVID-19
#> Pathogen: SARS-CoV-2
#> Epi Distribution: incubation period
#> Study: Linton et al. (2020) <10.3390/jcm9020538> PMID: 32079150
#> Distribution: lognormal
#> Parameters:
#>   mu: 1.45569556256012
#>   sigma: 0.554513029376191

Created on 2023-03-11 with reprex v2.0.2


The input data will keep changing, and we want the README output to stay consistent over time. I propose using dplyr::filter() instead of a row index, since an epiparam object is a data frame under the hood.

library(epiparameter)
library(tidyverse)

eparams <- epiparam()

influenza_incubation <-
  eparams %>%
  filter(
    disease == "Influenza",
    epi_distribution == "incubation_period",
    prob_distribution == "gamma",
    author == "Ghani_etal"
  ) %>%
  as_epidist()

influenza_incubation
#> Disease: Influenza
#> Pathogen: Influenza-A-H1N1Pdm
#> Epi Distribution: incubation period
#> Study: Ghani et al. (2009) <10.1371/currents.RRN1130> PMID: 20029668
#> Distribution: gamma
#> Parameters:
#>   shape: 17.503123698459
#>   rate: 8.5381091211995

Created on 2023-03-11 with reprex v2.0.2

I prefer {dplyr} here because it can filter on several columns in a single call, which makes the code easier for people to read. It also shows how the package integrates with the {tidyverse}, a set of interoperable packages commonly used for data cleaning.
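
To illustrate the reasoning above without depending on the package, here is a minimal base R sketch on a made-up data frame. It is not the real epiparam() output: the rows are hypothetical, and only the column names are borrowed from the example above. It shows a row index pointing at a different disease after a row is added, while selecting on column values stays stable.

```r
# Toy stand-in for the parameter table. Column names mirror the example above;
# the rows themselves are invented for illustration only.
params_v1 <- data.frame(
  disease = c("Ebola", "Influenza"),
  epi_distribution = c("incubation_period", "incubation_period"),
  prob_distribution = c("gamma", "gamma"),
  stringsAsFactors = FALSE
)

# A later "release" of the table with a new row inserted in the middle.
params_v2 <- rbind(
  params_v1[1, ],
  data.frame(
    disease = "COVID-19",
    epi_distribution = "incubation_period",
    prob_distribution = "lognormal",
    stringsAsFactors = FALSE
  ),
  params_v1[2, ]
)

# Positional subsetting silently changes meaning between versions.
by_index_v1 <- params_v1[2, "disease"]
by_index_v2 <- params_v2[2, "disease"]

# Value-based subsetting (base R) selects the same entry in both versions.
pick_influenza <- function(tbl) {
  keep <- tbl$disease == "Influenza" &
    tbl$epi_distribution == "incubation_period"
  tbl[keep, , drop = FALSE]
}
by_value_v1 <- pick_influenza(params_v1)$disease
by_value_v2 <- pick_influenza(params_v2)$disease
```

The same logical-index pattern is the base R counterpart of the filter() call above. Whether it preserves the epiparam class on the real object is an assumption that would need checking against the package.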


avallecam commented 1 year ago

I just found that epidist_db() could also be an alternative for this issue.

library(epiparameter)
epidist_db(disease = "influenza", 
           epi_dist = "incubation_period",
           author = "Ghani_etal")
#> Disease: Influenza
#> Pathogen: Influenza-A-H1N1Pdm
#> Epi Distribution: incubation period
#> Study: Ghani et al. (2009) <10.1371/currents.RRN1130> PMID: 20029668
#> Distribution: gamma
#> Parameters:
#>   shape: 17.503123698459
#>   rate: 8.5381091211995

Created on 2023-03-11 with reprex v2.0.2

A key feature is that it skips the call to epiparam() altogether. This could be a good quick-start option once the user already knows the database. A beginner, however, would probably prefer to explore the data frame before taking shortcuts. For that reason, I agree with keeping the current approach of epiparam() + as_epidist().

avallecam commented 8 months ago

@joshwlambert Could this issue be outdated? I am happy with the current README display, so feel free to close this whenever you agree.

joshwlambert commented 7 months ago

@avallecam Yes, this issue is now outdated and can be closed. I will still try to implement tabs in the vignettes to offer users base R and tidyverse options for the same operations (where possible), but that can be tracked in issue #94.