eth-mds / ricu

🏥 ICU data with R 🏥
https://eth-mds.github.io/ricu/
GNU General Public License v3.0

`expand` on `win_tbl` arbitrarily sets negative end times to zero #63

Open prockenschaub opened 2 months ago

prockenschaub commented 2 months ago

When `expand` is called on a `win_tbl`, values are repeated for all time steps that fall into the interval starting at `index_var` and lasting until `index_var + dur_var`. This works when `index_var + dur_var` is positive, but whenever it is negative, the end time is simply set to 0 by the following code.

https://github.com/eth-mds/ricu/blob/7f2cc42503e003f4aea388847232e4157b7fc8ea/R/utils-ts.R#L140-L148
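To make the effect concrete, here is a minimal base R sketch of the window arithmetic. It is not the ricu implementation: `expand_window()` and its `clamp_end` argument are hypothetical, times are plain numbers of hours rather than `difftime` values, and `floor()` stands in for rounding the end time to the hourly grid. The clamping branch reflects my reading of the linked lines.

```r
# Hypothetical helper, not part of ricu: lists the hourly time steps covered
# by one window, with start and duration given as plain numbers of hours.
expand_window <- function(start, dur, clamp_end = FALSE) {
  end <- floor(start + dur)           # round the window end down to the hourly grid
  if (clamp_end && end < 0) end <- 0  # assumed behaviour of the linked lines
  seq(start, end, by = 1)
}

one_min <- 1 / 60

expand_window(-2, one_min)                    # only hour -2
expand_window(-2, one_min, clamp_end = TRUE)  # hours -2, -1 and 0
expand_window(7, one_min, clamp_end = TRUE)   # only hour 7, clamping has no effect
```

With clamping, a one-minute window that starts at hour -2 is stretched to cover hours -2 through 0, which matches the extra rows in the example below.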

The effect can be seen in the following example from `gcs`. Here, `ett_gcs` is processed with `expand`, but this leads to false results for some patients.

sed = load_concepts("ett_gcs", src = "mimic_demo")
#> ── Loading 1 concept ───────────────────────────────────────────────────────────
#> • ett_gcs
#> ────────────────────────────────────────────────────────────────────────────────

sed[icustay_id == 234989]
#> # A `win_tbl`:  11 ✖ 4
#> # Id var:       `icustay_id`
#> # Index var:    `charttime` (1 hours)
#> # Duration var: `dur_var`
#>    icustay_id charttime dur_var ett_gcs
#>         <int> <drtn>    <drtn>  <lgl>
#> 1      234989 -2 hours  1 mins  TRUE       <---- negative `index_var + dur_var`
#> 2      234989  7 hours  1 mins  TRUE
#> 3      234989 14 hours  1 mins  TRUE
#> 4      234989 18 hours  1 mins  TRUE
#> 5      234989 24 hours  1 mins  TRUE
#> 6      234989 35 hours  1 mins  TRUE
#> 7      234989 39 hours  1 mins  TRUE
#> 8      234989 43 hours  1 mins  TRUE
#> 9      234989 47 hours  1 mins  TRUE
#> 10     234989 52 hours  1 mins  TRUE
#> 11     234989 55 hours  1 mins  TRUE

sed = expand(sed, aggregate = "any")
sed[icustay_id == 234989]
#> # A `ts_tbl`: 13 ✖ 3
#> # Id var:     `icustay_id`
#> # Index var:  `charttime` (1 hours)
#>    icustay_id charttime ett_gcs
#>         <int> <drtn>    <lgl>
#> 1      234989 -2 hours  TRUE
#> 2      234989 -1 hours  TRUE    <--- artificially added
#> 3      234989  0 hours  TRUE    <--- artificially added
#> 4      234989  7 hours  TRUE
#> 5      234989 14 hours  TRUE
#> 6      234989 18 hours  TRUE
#> 7      234989 24 hours  TRUE
#> 8      234989 35 hours  TRUE
#> 9      234989 39 hours  TRUE
#> 10     234989 43 hours  TRUE
#> 11     234989 47 hours  TRUE
#> 12     234989 52 hours  TRUE
#> 13     234989 55 hours  TRUE

Created on 2024-04-12 with reprex v2.1.0

It is not entirely clear to me why `end_var` would need to be set to zero in the code below.

https://github.com/eth-mds/ricu/blob/7f2cc42503e003f4aea388847232e4157b7fc8ea/R/utils-ts.R#L146-L147

Maybe the intent was to prevent negative `dur_var`s? In that case, the following code would be needed instead.

x <- x[get(dura_var) < 0, c(dura_var) := as.difftime(0, units = time_unit)]
x <- x[, c(end_var) := re_time(get(start_var) + get(dura_var), interval)]
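As a rough illustration of what this alternative would do, here is a base R sketch on a toy data frame. It is a stand-in rather than ricu code: the column names `start`, `dur` and `end` are made up, times are plain numbers of hours, `pmax()` replaces the data.table assignment, and `floor()` stands in for `re_time()`. The second row uses a deliberately negative duration.

```r
# Toy windows in hours; the second row has a (hypothetical) negative duration.
win <- data.frame(start = c(-2, 5), dur = c(1 / 60, -3))

# Clamp negative durations to zero instead of clamping the end time.
win$dur <- pmax(win$dur, 0)

# Compute the end of each window on the hourly grid.
win$end <- floor(win$start + win$dur)

win
```

Under these assumptions, the window starting at hour -2 keeps a negative end time and is no longer stretched to hour 0, while the negative duration collapses to a window that ends where it starts.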