Make it possible to operate on columns: `epiparam` -> `data.frame` downgrading

jamesmbaazam commented 1 year ago

I imagine that users will sometimes want to operate on the columns of `epiparam` objects, so consider allowing the `epiparam` class to be downgraded to a plain `data.frame`.

Additional context

Here is an example where I was trying to find the unique diseases in the database, but got an error.

```r
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
a <- epiparameter::epiparam()
a %>% dplyr::distinct(disease)
#> Error in validate_epiparam(NextMethod()): epiparam object does not contain the correct columns
```

Created on 2023-03-01 with reprex v2.0.2
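One possible shape for the downgrade is sketched below, under stated assumptions: a hypothetical `validate_or_downgrade()` returns the object unchanged when the required columns survive, and otherwise warns and returns a plain `data.frame` instead of erroring. It uses a toy object built with `structure()` rather than `epiparam()`, and the column names in `required_cols` are assumptions for illustration, not the package's actual requirements.

```r
# Hypothetical sketch: downgrade instead of erroring.
# `required_cols` is an assumed subset of columns, not epiparameter's real list.
required_cols <- c("disease", "epi_distribution", "prob_distribution")

# Toy constructor (hypothetical): prepend the "epiparam" subclass to a data.frame
new_toy_epiparam <- function(df) {
  structure(df, class = c("epiparam", "data.frame"))
}

# Keep the subclass if the required columns are present; otherwise warn and
# return a plain data.frame (as.data.frame() drops classes before "data.frame")
validate_or_downgrade <- function(x) {
  if (all(required_cols %in% names(x))) {
    return(x)
  }
  warning("required epiparam columns were dropped; returning a data.frame")
  as.data.frame(x)
}

toy <- new_toy_epiparam(data.frame(
  disease = c("Influenza", "Influenza", "COVID-19"),
  epi_distribution = c("generation_time", "incubation_period", "incubation_period"),
  prob_distribution = c("weibull", "gamma", "lnorm")
))

# Row subsetting keeps every column, so the object stays an epiparam
kept <- validate_or_downgrade(toy[toy$disease == "Influenza", ])
class(kept)
#> [1] "epiparam"   "data.frame"

# Selecting one column fails a strict validator; here it is downgraded instead
dropped <- suppressWarnings(validate_or_downgrade(toy[, "disease", drop = FALSE]))
class(dropped)
#> [1] "data.frame"
```

Whether to downgrade silently, warn, or keep erroring is a design choice; warning keeps the safety check visible while still letting exploratory code run.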

avallecam commented 1 year ago

Since an `epiparam` object is a `data.frame` underneath (its class is `c("epiparam", "data.frame")`), we can convert it with `dplyr::as_tibble()` before applying other dplyr verbs:

```r
library(epiparameter)
library(tidyverse)
a <- epiparameter::epiparam()
class(a)
#> [1] "epiparam"   "data.frame"
a %>% 
  # note: this step drops the epiparam class
  as_tibble() %>%  
  dplyr::distinct(disease)
#> # A tibble: 24 × 1
#>    disease                      
#>    <chr>                        
#>  1 Adenovirus                   
#>  2 Chikungunya                  
#>  3 COVID-19                     
#>  4 Dengue                       
#>  5 Ebola Virus Disease          
#>  6 Hantavirus Pulmonary Syndrome
#>  7 Human Coronavirus            
#>  8 Influenza                    
#>  9 Japanese Encephalitis        
#> 10 Marburg Virus Disease        
#> # … with 14 more rows
```

Created on 2023-03-11 with reprex v2.0.2
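The same downgrade also works in base R without a tibble: `as.data.frame()` on a `data.frame` subclass dispatches to `as.data.frame.data.frame()`, which drops the classes listed before `"data.frame"`. The sketch below uses a toy stand-in built with `structure()` instead of calling `epiparam()`, so its rows are illustrative only.

```r
# Toy stand-in for an epiparam object (illustrative rows, not the real database)
a <- structure(
  data.frame(
    disease = c("COVID-19", "Influenza", "Influenza", "Dengue"),
    prob_distribution = c("lnorm", "weibull", "gamma", "lnorm")
  ),
  class = c("epiparam", "data.frame")
)

# as.data.frame() strips "epiparam", leaving a plain data.frame
plain <- as.data.frame(a)
class(plain)
#> [1] "data.frame"

# Base-R counterpart of dplyr::distinct(disease): one row per unique disease
unique(plain["disease"])
```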

avallecam commented 1 year ago

That said, in #111 I applied `dplyr::filter()` directly to the output of `epiparam()` and it worked fine.

`dplyr::select()` fails because the validator requires an `epiparam` object to keep all of its original columns. `dplyr::filter()` only removes rows, so its result is still a valid `epiparam`. I think keeping this check ensures that objects passed on to `epidist()` stay valid.

```r
library(epiparameter)
library(tidyverse)

eparams <- epiparam()

eparams %>% 
  filter(disease == "Influenza")
#> Epiparam object
#> Number of distributions in library: 17
#> Number of diseases: 1
#> Number of delay distributions: 17
#> Number of offspring distributions: 0
#> Number of studies in library: 10
#> <Head of library>
#>     disease  epi_distribution prob_distribution
#> 1 Influenza   generation_time           weibull
#> 2 Influenza incubation_period             gamma
#> 3 Influenza incubation_period             lnorm
#> 4 Influenza incubation_period             lnorm
#> 5 Influenza incubation_period             lnorm
#> 6 Influenza incubation_period             lnorm
#> <11 more rows & 53 more cols not shown>

eparams %>% 
  select(disease)
#> Error in validate_epiparam(NextMethod()): epiparam object does not contain the correct columns
```

Created on 2023-03-11 with reprex v2.0.2

As a user, my take-home message is to drop the `epiparam` class with `dplyr::as_tibble()` so I can explore the data freely. Once I have identified the filters I need, I apply them directly to the `epiparam()` output so the result can still be passed on to `epidist()`.
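That workflow can be sketched in base R, assuming a toy stand-in for `epiparam()`: explore on a downgraded copy, capture the chosen condition as a logical index, then subset the original object by rows only. In this toy there is no `[.epiparam` method, so base `[.data.frame` does the subsetting and keeps the class attribute; the real package's subsetting method may also validate the result. `epidist()` is not called here.

```r
# Toy stand-in for an epiparam object (illustrative rows only)
eparams <- structure(
  data.frame(
    disease = c("Influenza", "Influenza", "COVID-19"),
    epi_distribution = c("generation_time", "incubation_period", "incubation_period"),
    prob_distribution = c("weibull", "gamma", "lnorm")
  ),
  class = c("epiparam", "data.frame")
)

# 1. Explore freely on a plain data.frame copy (dropping columns is fine here)
explore <- as.data.frame(eparams)
print(table(explore$disease, explore$epi_distribution))

# 2. Once the filter is chosen, express it as a logical row index
keep <- eparams$disease == "Influenza" &
  eparams$epi_distribution == "incubation_period"

# 3. Apply it to the original object: rows change, columns and class do not
selected <- eparams[keep, ]
class(selected)
#> [1] "epiparam"   "data.frame"
```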

avallecam commented 1 year ago

Also related: I just came across the Editorial decisions section of the Epi R Handbook.

We could discuss whether these decisions should also apply to package documentation, and record them in the blueprints as a summary table.

For this issue specifically, that would mean making intermediate outputs tidyverse-friendly, or showing tidyverse alternatives prominently in the documentation for tidyverse users. I proposed something similar for {finalsize} (https://github.com/epiverse-trace/finalsize/issues/138), and it could be applied across packages.