eth-mds / ricu

🏥 ICU data with R 🏥
https://eth-mds.github.io/ricu/
GNU General Public License v3.0
33 stars 11 forks source link

Bug in `merge_ranges` #26

Closed prockenschaub closed 1 year ago

prockenschaub commented 1 year ago

Problem

vaso60 uses merge_ranges to look for overlaps in vasopressor times of the same ICU stay, which in turn uses data.table's foverlaps. The relevant line of code is here:

https://github.com/eth-mds/ricu/blob/9e76c074250046da93870063c3b7ed57d274b790/R/utils-ts.R#L577

I noticed that the output for dobu60 on the mimic demo data wasn't quite was I was expecting.


reclass_tbl(data.table::foverlaps(x, x, mult = "first"), as_ptype(x))
#> # A `ts_tbl`: 7 ✖ 5
#> # Id var:     `icustay_id`
#> # Index var:  `charttime` (1 mins)
#>   icustay_id charttime dobu_dur  i.charttime i.dobu_dur
#>        <int> <drtn>    <drtn>    <drtn>      <drtn>
#> 1     203766 1543 mins 1578 mins 1543 mins   1578 mins
#> 2     217724  590 mins  765 mins  590 mins    765 mins
#> 3     251573 1069 mins 1099 mins 1069 mins   1099 mins
#> 4     264446   22 mins  327 mins   22 mins    327 mins
#> 5     298685  482 mins  517 mins  482 mins    517 mins
#> 6     298685  482 mins  517 mins  582 mins   1548 mins
#> 7     298685  482 mins  517 mins 1527 mins   2422 mins

Not how in line 6 the ranges aren't actually overlapping. Instead, charttime and dobu_dur in lines 6 and 7 should read 582 mins and 1548 mins. This appears to be a weird interaction due to the ts_tbl class. If I downcast it to a vanilla data.table, everything works as expected.

reclass_tbl(data.table::foverlaps(as.data.table(x), as.data.table(x), mult = "first"), as_ptype(x))
#> # A `ts_tbl`: 7 ✖ 5
#> # Id var:     `icustay_id`
#> # Index var:  `charttime` (1 mins)
#>   icustay_id charttime dobu_dur  i.charttime i.dobu_dur
#>        <int> <drtn>    <drtn>    <drtn>      <drtn>
#> 1     203766 1543 mins 1578 mins 1543 mins   1578 mins
#> 2     217724  590 mins  765 mins  590 mins    765 mins
#> 3     251573 1069 mins 1099 mins 1069 mins   1099 mins
#> 4     264446   22 mins  327 mins   22 mins    327 mins
#> 5     298685  482 mins  517 mins  482 mins    517 mins
#> 6     298685  582 mins 1548 mins  582 mins   1548 mins
#> 7     298685  582 mins 1548 mins 1527 mins   2422 mins

Problem

Change to reclass_tbl(data.table::foverlaps(as.data.table(x), as.data.table(x), mult = "first"), as_ptype(x)) or identify the underlying root cause in ts_tbl.

nbenn commented 1 year ago

Thanks for raising this. You're right, the results are off and your fix gives us correct results. It happens due to id_tbl and data.table having slightly different semantics (default arguments that is) for unique(). If the argument passed to foverlaps() as y inherits from ts_tbl, an id_tbl object will result as uy here, as the index_var is lost along the way. Uniqueness for id_tbl by default is determined by the id_var column whereas for data.table is is for all columns, which in this case includes the pos column.

I will fix this specific issue accordingly.