As described in #1, we have the full list of packages (= potential search results) in https://cran.r-project.org/view=Epidemiology. But this doesn't fully settle where the data used to describe these packages to the LLM should come from. As far as I can tell, we have several options.
Package Description
Every R package includes a description of what it does, optionally with references (the Description field of its DESCRIPTION file). Its length varies widely, but it averages around 3-4 sentences.
Example for linelist:
Provides tools to help storing and handling case line list data. The 'linelist' class adds a tagging system to classical 'data.frame' objects to identify key epidemiological data such as dates of symptom onset, epidemiological case definition, age, gender or disease outcome. Once tagged, these variables can be seamlessly used in downstream analyses, making data pipelines more robust and reliable.
Example for epicontacts:
A collection of tools for representing epidemiological contact data, composed of case line lists and contacts between cases. Also contains procedures for data handling, interactive graphics, and statistics.
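As a rough illustration of how this field could be pulled out for the LLM, here is a minimal sketch in Python. It assumes the downstream pipeline is written in Python (nothing in this issue says so) and that we have the raw text of each package's DESCRIPTION file at hand. `parse_dcf` is a hypothetical helper, and `SAMPLE` is an abridged stand-in built from the epicontacts description quoted above, not the real file.

```python
def parse_dcf(text: str) -> dict[str, str]:
    """Parse a single-record DCF text (the format of an R DESCRIPTION file).

    Simplified on purpose: continuation lines (those starting with
    whitespace) are folded into the previous field, separated by a space.
    """
    fields: dict[str, str] = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        if line[0] in " \t":
            # Continuation of the previous field's value.
            if current is not None:
                fields[current] += " " + line.strip()
        else:
            key, _, value = line.partition(":")
            current = key.strip()
            fields[current] = value.strip()
    return fields


# Abridged, illustrative DESCRIPTION content (assumption: not the real file).
SAMPLE = """\
Package: epicontacts
Description: A collection of tools for representing epidemiological
    contact data, composed of case line lists and contacts between cases.
    Also contains procedures for data handling, interactive graphics, and
    statistics.
"""

description = parse_dcf(SAMPLE)["Description"]
print(description)
```

In R itself, the same information is available without any custom parsing (for instance via `read.dcf()`), so this is only relevant if the indexing step lives outside R.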
Package vignettes
Package vignettes are a longer form of documentation that introduces the concepts and usage of a package via literate programming.
Example for linelist:
Examples for epicontacts:
Package manual
The package manual (pdf or html (https://github.com/epiverse-connect/epiverse-search/issues/2#issuecomment-2097618994)) contains a list of functions, their goal, usage, inputs and outputs, with examples. It is also somewhat more standardized by CRAN than the previously mentioned data sources. Optionally, we could explore using the newly released tools::pkg2HTML function (R 4.4.0) to generate the documentation for the relevant packages as input.
Example for linelist: https://cran.r-project.org/web/packages/linelist/linelist.pdf
Examples for epicontacts: https://cran.r-project.org/web/packages/epicontacts/epicontacts.pdf
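If we go with the HTML version of the manual, it would probably need to be reduced to plain text before being passed to the LLM. The sketch below shows one way to do that with Python's standard library only. `TextExtractor`, `html_to_text` and `SAMPLE_HTML` are hypothetical names and content made up for illustration; the real manual pages will have a different and richer structure.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            # Collapse runs of whitespace inside each text fragment.
            self.parts.append(" ".join(data.split()))


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML document, one fragment per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


# Made-up HTML standing in for one manual entry (assumption, not real output).
SAMPLE_HTML = """
<html><head><style>h2 {color: red;}</style></head>
<body>
<h2>example_function</h2>
<p>Describes what   the function does.</p>
<pre>example_function(x)</pre>
</body></html>
"""

manual_text = html_to_text(SAMPLE_HTML)
print(manual_text)
```

This deliberately throws away all structure (headings, code vs. prose). Whether that is acceptable depends on how the manual text ends up being chunked for the LLM.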
pkgdown website
The pkgdown website is a one-stop shop for R package documentation, putting in one place the package description, README, vignettes, manual, release notes, etc. Unfortunately, not all packages have a pkgdown website.
Example for linelist: https://epiverse-trace.github.io/linelist/
Example for epicontacts: https://www.repidemicsconsortium.org/epicontacts/
A mix of different sources
It may also be possible to use all the available sources or a mix of them.
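To make the "mix" option a bit more concrete, here is a minimal sketch of how several sources could be merged into one text per package. The priority order, the character budget, and the names `build_package_document` and `SOURCE_PRIORITY` are all assumptions for illustration; nothing here has been decided.

```python
# Assumed priority: shortest and most standardized sources first.
SOURCE_PRIORITY = ["description", "readme", "vignettes", "manual"]


def build_package_document(
    name: str, sources: dict[str, str], max_chars: int = 2000
) -> str:
    """Concatenate the available sources in priority order.

    Sources that are missing or empty are skipped, and the combined source
    text is truncated so that it never exceeds `max_chars` characters
    (section labels are not counted against the budget).
    """
    sections = []
    remaining = max_chars
    for key in SOURCE_PRIORITY:
        text = sources.get(key, "").strip()
        if not text or remaining <= 0:
            continue
        snippet = text[:remaining]
        sections.append(f"[{key}]\n{snippet}")
        remaining -= len(snippet)
    return f"# {name}\n" + "\n\n".join(sections)


# Toy input: only two of the four sources are available for this package.
doc = build_package_document(
    "pkg",
    {"description": "abc", "manual": "x" * 10},
    max_chars=5,
)
print(doc)
```

A fixed character budget is a crude stand-in for a token limit. The point is only that packages without vignettes or a pkgdown website would still get a usable entry from whatever sources they do have.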