eth-mds / ricu

🏥 ICU data with R 🏥
https://eth-mds.github.io/ricu/
GNU General Public License v3.0

ricu

Working with ICU datasets in R, especially publicly available ones such as those provided by PhysioNet, is facilitated by ricu. The package provides data access, a level of abstraction for encoding clinical concepts in a data-source-agnostic way, and classes and utilities for working with the resulting types of time series datasets.

Citation

To cite ricu, please use the following:

@article{bennett2023ricu,
  title={ricu: R’s interface to intensive care data},
  author={Bennett, Nicolas and Ple{\v{c}}ko, Drago and Ukor, Ida-Fong and Meinshausen, Nicolai and B{\"u}hlmann, Peter},
  journal={GigaScience},
  volume={12},
  pages={giad041},
  year={2023},
  publisher={Oxford University Press}
}

Installation

Currently, installation is only possible directly from GitHub, using the remotes package if installed

remotes::install_github("eth-mds/ricu")

or by sourcing the required installation code from GitHub by running

rem <- source(
  paste0("https://raw.githubusercontent.com/r-lib/remotes/main/",
         "install-github.R")
)
rem$value("eth-mds/ricu")

To make sure that some useful utility packages are also installed, consider installing the packages marked as Suggests by running

remotes::install_github("eth-mds/ricu", dependencies = TRUE)

instead, or by installing some of the utility packages (relevant for downloading and preprocessing PhysioNet datasets)

install.packages("xml2")

and demo dataset packages

install.packages(c("mimic.demo", "eicu.demo"),
                 repos = "https://eth-mds.github.io/physionet-demo")

explicitly.
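
To see which of the optional packages mentioned above are already present before installing anything, a minimal sketch along the following lines may help. It uses only base R; the package list is copied from the installation notes here rather than read from the package's DESCRIPTION file, and the variable names optional_pkgs, available and missing_pkgs are introduced purely for illustration.

```r
# Packages named in the installation notes; this list is illustrative and
# is not derived from ricu's DESCRIPTION file.
optional_pkgs <- c("xml2", "mimic.demo", "eicu.demo")

# requireNamespace() reports whether a package can be loaded, without
# attaching it to the search path. vapply() keeps the package names as
# names of the resulting logical vector.
available <- vapply(optional_pkgs, requireNamespace, logical(1L),
                    quietly = TRUE)

missing_pkgs <- names(available)[!available]

if (length(missing_pkgs) > 0L) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All optional packages are available.")
}
```

Any package reported as missing can then be installed with the corresponding command shown above.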

Data access

Out of the box (provided the two data packages mimic.demo and eicu.demo are available), ricu provides access to the demo datasets corresponding to the PhysioNet Clinical Databases eICU and MIMIC-III. Tables are available as

mimic_demo$admissions
#> # <mimic_tbl>: [129 ✖ 19]
#> # ID options:  subject_id (patient) < hadm_id (hadm) < icustay_id (icustay)
#> # Defaults:    `admission_type` (val)
#> # Time vars:   `admittime`, `dischtime`, `deathtime`, `edregtime`, `edouttime`
#>     row_id subject_id hadm_id admittime           dischtime
#>      <int>      <int>   <int> <dttm>              <dttm>
#> 1    12258      10006  142345 2164-10-23 21:09:00 2164-11-01 17:15:00
#> 2    12263      10011  105331 2126-08-14 22:32:00 2126-08-28 18:59:00
#> 3    12265      10013  165520 2125-10-04 23:36:00 2125-10-07 15:13:00
#> 4    12269      10017  199207 2149-05-26 17:19:00 2149-06-03 18:42:00
#> 5    12270      10019  177759 2163-05-14 20:43:00 2163-05-15 12:00:00
#> 
#> 125  41055      44083  198330 2112-05-28 15:45:00 2112-06-07 16:50:00
#> 126  41070      44154  174245 2178-05-14 20:29:00 2178-05-15 09:45:00
#> 127  41087      44212  163189 2123-11-24 14:14:00 2123-12-30 14:31:00
#> 128  41090      44222  192189 2180-07-19 06:55:00 2180-07-20 13:00:00
#> 129  41092      44228  103379 2170-12-15 03:14:00 2170-12-24 18:00:00
#> # ℹ 124 more rows
#> # ℹ 14 more variables: deathtime <dttm>, admission_type <chr>,
#> #   admission_location <chr>, discharge_location <chr>, insurance <chr>,
#> #   language <chr>, religion <chr>, marital_status <chr>, ethnicity <chr>,
#> #   edregtime <dttm>, edouttime <dttm>, diagnosis <chr>,
#> #   hospital_expire_flag <int>, has_chartevents_data <int>

and data can be loaded into an R session, for example, using

load_ts("labevents", "mimic_demo", itemid == 50862L,
        cols = c("valuenum", "valueuom"))
#> # A `ts_tbl`: 299 ✖ 4
#> # Id var:     `icustay_id`
#> # Index var:  `charttime` (1 hours)
#>     icustay_id charttime valuenum valueuom
#>          <int> <drtn>       <dbl> <chr>
#> 1       201006   0 hours      2.4 g/dL
#> 2       203766 -18 hours      2   g/dL
#> 3       203766   4 hours      1.7 g/dL
#> 4       204132   7 hours      3.6 g/dL
#> 5       204201   9 hours      2.3 g/dL
#> 
#> 295     298685 130 hours      1.9 g/dL
#> 296     298685 154 hours      2   g/dL
#> 297     298685 203 hours      2   g/dL
#> 298     298685 272 hours      2.2 g/dL
#> 299     298685 299 hours      2.5 g/dL
#> # ℹ 294 more rows

which returns time series data as a ts_tbl object.
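
As a rough sketch of how such an object might be inspected, the following repeats the load_ts() call from above and queries some of its metadata. It assumes that ricu and mimic.demo are installed and that the accessors id_vars(), index_var() and interval() are available in the installed ricu version; the names have_demo and lab are introduced here for illustration only.

```r
# Guard so that the snippet does nothing when ricu or the demo data is
# missing; `have_demo` is a helper name introduced for this example.
have_demo <- requireNamespace("ricu", quietly = TRUE) &&
  requireNamespace("mimic.demo", quietly = TRUE)

if (have_demo) {
  library(ricu)

  # Same call as shown above, with the result stored for inspection.
  lab <- load_ts("labevents", "mimic_demo", itemid == 50862L,
                 cols = c("valuenum", "valueuom"))

  # Assumed ricu accessors for ts_tbl metadata.
  print(id_vars(lab))    # name of the ID column
  print(index_var(lab))  # name of the time index column
  print(interval(lab))   # time resolution of the index

  # Numeric summary of the loaded values.
  print(summary(lab$valuenum))
}
```

The ID variable, index variable and interval reported this way should correspond to the header lines of the printed ts_tbl shown above.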

Acknowledgments

This work was supported by grant #2017-110 of the Strategic Focal Area “Personalized Health and Related Technologies (PHRT)” of the ETH Domain for the SPHN/PHRT Driver Project “Personalized Swiss Sepsis Study”.