epiverse-trace / epiparameter

R package with library of epidemiological parameters for infectious diseases and functions and classes for working with parameters
https://epiverse-trace.github.io/epiparameter

augment `epidist_db()`'s print output to orient new users #302

Closed: papsti closed this issue 3 months ago

papsti commented 4 months ago

When getting oriented to the epidist database using `epidist_db()`, it isn't immediately obvious which diseases, pathogens, and epi distributions are available for selection (i.e., as input to the first three arguments of `epidist_db()`).

Currently, running `epidist_db()` prints a helpful message summarising the contents of the database:

```
Returning 122 results that match the criteria (99 are parameterised).  
Use subset to filter by entry variables or single_epidist to return a single entry.
To retrieve the citation for each use the 'get_citation' function
List of <epidist> objects
  Number of entries in library: 122
  Number of studies in library: 47
  Number of diseases: 23
  Number of delay distributions: 112
  Number of offspring distributions: 10
```

There are two ways I can see to augment this message to help new users get started:

  1. Additionally print the labels available as input to `epidist_db()` for the `disease`, `pathogen` and `epi_dist` fields. Depending on the number of unique labels, this could add too much visual clutter, and it would do so every time the object is printed.

  2. Add a line to the print statement above pointing to the URL of an online vignette exploring the database (suggested in #296, albeit for a different purpose).
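One way to keep option 1 from cluttering the output is to cap how many labels are printed per field and summarise the rest. The sketch below assumes a hypothetical helper `format_labels()` with a `max_show` cap; neither is part of epiparameter, and the disease labels are a subset of those in the database.

```r
# Hypothetical helper (not part of epiparameter): collapse the unique labels
# of one field into a single line, showing at most `max_show` of them.
format_labels <- function(labels, max_show = 5) {
  labels <- sort(unique(labels))
  if (length(labels) <= max_show) {
    return(paste(labels, collapse = ", "))
  }
  n_hidden <- length(labels) - max_show
  paste0(
    paste(labels[seq_len(max_show)], collapse = ", "),
    ", ... and ", n_hidden, " more"
  )
}

# A few disease labels (with a duplicate, as entries share diseases)
diseases <- c("COVID-19", "Ebola Virus Disease", "Influenza", "Measles",
              "Mpox", "SARS", "Zika Virus Disease", "COVID-19")
cat("Diseases: ", format_labels(diseases, max_show = 3), "\n", sep = "")
#> Diseases: COVID-19, Ebola Virus Disease, Influenza, ... and 4 more
```

Truncating keeps the summary to one line per field regardless of how large the database grows, at the cost of hiding some labels, which is where the vignette link from option 2 could complement it.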

PaulC91 commented 4 months ago

This was also my first thought when using the package for the first time, and I can see we are not alone (#277).

I've added the list of diseases to the print method in https://github.com/PaulC91/epiparameter/commit/1ed70469b0ec142ac1438ad82e7b79bdea37f5be, but perhaps this makes the output too long, and a function that prints the available `disease`, `pathogen` and `epi_dist` options would be a better UX. What do you think, @joshwlambert?

```r
db <- epiparameter::epidist_db()
#> Returning 122 results that match the criteria (99 are parameterised). 
#> Use subset to filter by entry variables or single_epidist to return a single entry. 
#> To retrieve the citation for each use the 'get_citation' function
db
#> List of <epidist> objects
#>   Number of entries in library: 122
#>   Number of studies in library: 47
#>   Number of diseases: 23
#>   Number of delay distributions: 112
#>   Number of offspring distributions: 10
#> 
#>   Diseases available:
#>   Adenovirus
#>   Chikungunya
#>   COVID-19
#>   Dengue
#>   Ebola Virus Disease
#>   Hantavirus Pulmonary Syndrome
#>   Human Coronavirus
#>   Influenza
#>   Japanese Encephalitis
#>   Marburg Virus Disease
#>   Measles
#>   MERS
#>   Mpox
#>   Parainfluenza
#>   Pneumonic Plague
#>   Rhinovirus
#>   Rift Valley Fever
#>   RSV
#>   SARS
#>   Smallpox
#>   West Nile Fever
#>   Yellow Fever
#>   Zika Virus Disease
```

Created on 2024-05-16 with reprex v2.0.2
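The alternative suggested above, a function that lists the available `disease`, `pathogen` and `epi_dist` options instead of growing the print output, could look roughly like this. It is a sketch only: `list_options()` is hypothetical, and the small data frame is a toy stand-in for the database whose label combinations are made up for illustration.

```r
# Toy stand-in for the database: one row per entry. The labels appear in
# this thread, but the combinations are made up for illustration.
entries <- data.frame(
  disease  = c("Adenovirus", "Human Coronavirus", "SARS", "SARS"),
  pathogen = c("Adenovirus", "Human_Cov", "SARS-Cov-1", "SARS-Cov-1"),
  epi_dist = c("incubation period", "incubation period",
               "incubation period", "serial interval"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of epiparameter): for each field that
# epidist_db() can filter on, return its sorted unique labels.
list_options <- function(entries,
                         fields = c("disease", "pathogen", "epi_dist")) {
  lapply(setNames(fields, fields), function(f) sort(unique(entries[[f]])))
}

# Print one line per field
opts <- list_options(entries)
for (field in names(opts)) {
  cat(field, ": ", paste(opts[[field]], collapse = ", "), "\n", sep = "")
}
```

Keeping this in a separate function means the default print stays short, and users who need the exact spelling of a label (e.g. "COVID-19") can ask for it explicitly.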

jfunction commented 4 months ago

Pretty much the same thought here: I wondered how I would know to look for "COVID-19" instead of "COVID19" or some other name.

Maybe something with tab completion would be nice, perhaps via attributes on the object returned by `epidist_db()`?

```r
db <- epidist_db()
# <tab> marks where tab completion would list the available diseases
db$diseases$<tab>
```

and/or something like Paul recommended:

```r
db <- epidist_db()
# pull_diseases() and pull_pathogen() are proposed helpers, not existing functions
dis <- pull_diseases(db)
pat <- pull_pathogen(db)
# Then print these or use tab completion to repeat the call, subsetting with:
dbCov <- epidist_db(disease = dis$<tab>)
```

This would allow you to filter on the options available within the subset you pass in, which I think is rather nice.
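A rough sketch of the `pull_*()` idea, assuming a hypothetical `pull_field()` helper (not an epiparameter function) and a toy data frame in place of the database. Returning a named list rather than a character vector is what makes `x$` usable at all, since `$` errors on atomic vectors; how names containing spaces are completed (e.g. whether backticks are inserted) depends on the front end, RStudio or the plain R console.

```r
# Toy stand-in for the database (made-up combinations of labels from this thread)
entries <- data.frame(
  disease  = c("Adenovirus", "Human Coronavirus", "SARS", "SARS"),
  epi_dist = c("incubation period", "incubation period",
               "incubation period", "serial interval"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of epiparameter), in the spirit of the
# proposed pull_diseases()/pull_pathogen(): return one field's labels as a
# named list so that `x$` can offer them via tab completion.
pull_field <- function(entries, field) {
  labels <- sort(unique(entries[[field]]))
  as.list(setNames(labels, labels))
}

# Subset first, then pull: only options present in the subset are offered
sars <- entries[entries$disease == "SARS", ]
dists <- pull_field(sars, "epi_dist")
names(dists)
dists$`serial interval`
```

Pulling from a subset rather than from the full database is what gives the "filter within the subset" behaviour described above: the Adenovirus and Human Coronavirus entries no longer contribute labels.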

joshwlambert commented 4 months ago

Thank you for all the suggestions! I've taken them on board and updated how the output of `epidist_db()` is printed.

Some specific points that have been incorporated from your comments:

Some additional details: the header states how many elements are returned by `epidist_db()`, and the footer shows how many more elements are not shown in the preview. The footer also hints to use `print(n = ...)` to show more or fewer elements in the preview, and to use the `parameter_tbl()` function to see the results in a tabular format.

```r
library(epiparameter)
ed <- epidist_db()
#> Returning 122 results that match the criteria (99 are parameterised). 
#> Use subset to filter by entry variables or single_epidist to return a single entry. 
#> To retrieve the citation for each use the 'get_citation' function

ed
#> # List of <epidist> objects 
#> # A list:  122 elements
#> 
#> Number of diseases: 23
#> ❯ Adenovirus ❯ Chikungunya ❯ COVID-19 ❯ Dengue ❯ Ebola Virus Disease ❯ Hantavirus Pulmonary Syndrome ❯ Human Coronavirus ❯ Influenza ❯ Japanese Encephalitis ❯ Marburg Virus Disease ❯ Measles ❯ MERS ❯ Mpox ❯ Parainfluenza ❯ Pneumonic Plague ❯ Rhinovirus ❯ Rift Valley Fever ❯ RSV ❯ SARS ❯ Smallpox ❯ West Nile Fever ❯ Yellow Fever ❯ Zika Virus Disease
#> 
#> 
#> Number of epi distributions: 12
#> ❯ generation time ❯ hospitalisation to death ❯ hospitalisation to discharge ❯ incubation period ❯ notification to death ❯ notification to discharge ❯ offspring distribution ❯ onset to death ❯ onset to discharge ❯ onset to hospitalisation ❯ onset to ventilation ❯ serial interval
#> 
#> 
#> [[1]]
#> Disease: Adenovirus
#> Pathogen: Adenovirus
#> Epi Distribution: incubation period
#> Study: Lessler J, Reich N, Brookmeyer R, Perl T, Nelson K, Cummings D (2009).
#> "Incubation periods of acute respiratory viral infections: a systematic
#> review." _The Lancet Infectious Diseases_.
#> doi:10.1016/S1473-3099(09)70069-6
#> <https://doi.org/10.1016/S1473-3099%2809%2970069-6>.
#> Distribution: lnorm
#> Parameters:
#>   meanlog: 1.247
#>   sdlog: 0.975
#> 
#> [[2]]
#> Disease: Human Coronavirus
#> Pathogen: Human_Cov
#> Epi Distribution: incubation period
#> Study: Lessler J, Reich N, Brookmeyer R, Perl T, Nelson K, Cummings D (2009).
#> "Incubation periods of acute respiratory viral infections: a systematic
#> review." _The Lancet Infectious Diseases_.
#> doi:10.1016/S1473-3099(09)70069-7
#> <https://doi.org/10.1016/S1473-3099%2809%2970069-7>.
#> Distribution: lnorm
#> Parameters:
#>   meanlog: 0.742
#>   sdlog: 0.918
#> 
#> [[3]]
#> Disease: SARS
#> Pathogen: SARS-Cov-1
#> Epi Distribution: incubation period
#> Study: Lessler J, Reich N, Brookmeyer R, Perl T, Nelson K, Cummings D (2009).
#> "Incubation periods of acute respiratory viral infections: a systematic
#> review." _The Lancet Infectious Diseases_.
#> doi:10.1016/S1473-3099(09)70069-8
#> <https://doi.org/10.1016/S1473-3099%2809%2970069-8>.
#> Distribution: lnorm
#> Parameters:
#>   meanlog: 0.660
#>   sdlog: 1.205
#> 
#> # ℹ 119 more elements
#> # ℹ Use `print(n = ...)` to see more elements.
#> # ℹ Use `parameter_tbl()` to see a summary table of the parameters.
#> Explore database online at
#> <https://epiverse-trace.github.io/epiparameter/dev/articles/database.html>
```

Created on 2024-06-03 with reprex v2.1.0
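The header/footer pattern described in this comment (state the total, preview the first `n` elements, then report how many were omitted and hint at `print(n = ...)`) can be sketched with a small S3 print method. This is not epiparameter's implementation: the `toy_db` class, its constructor and the entry fields are hypothetical, and the toy entries (including the Mpox/serial interval pairing) are made up for illustration.

```r
# Hypothetical constructor: a plain list of entries tagged with a toy class
new_toy_db <- function(entries) {
  structure(entries, class = "toy_db")
}

# S3 print method: header with the total, a preview of n entries, and a
# footer that only appears when some entries were left out
print.toy_db <- function(x, n = 3, ...) {
  total <- length(x)
  shown <- min(n, total)
  cat("# A list: ", total, " elements\n", sep = "")
  for (i in seq_len(shown)) {
    cat("[[", i, "]] ", x[[i]]$disease, ": ", x[[i]]$epi_dist, "\n", sep = "")
  }
  if (total > shown) {
    cat("# ", total - shown, " more elements\n", sep = "")
    cat("# Use `print(n = ...)` to see more elements.\n")
  }
  invisible(x)
}

db <- new_toy_db(list(
  list(disease = "Adenovirus", epi_dist = "incubation period"),
  list(disease = "Human Coronavirus", epi_dist = "incubation period"),
  list(disease = "SARS", epi_dist = "incubation period"),
  list(disease = "Mpox", epi_dist = "serial interval")
))
print(db, n = 2)
```

Returning `x` invisibly follows the usual convention for print methods, so `print(db, n = 2)` can sit inside a pipeline without printing the object twice.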

This new functionality is implemented in PR #326. Please feel free to provide feedback; I will leave this PR open until the end of the week.

I will tackle the suggestions that have not been addressed by this PR separately.

joshwlambert commented 3 months ago

Moving this issue to Done in the v0.2.0 project as I don't plan to make any more changes with respect to these discussions before the next release.

I'm leaving the issue open because, although PR #326 addresses the bulk of it, some good points were raised about enabling autocomplete to list diseases and about helpers to pull the diseases and pathogens. These can be implemented in an upcoming version.