eth-mds / ricu

🏥 ICU data with R 🏥
https://eth-mds.github.io/ricu/
GNU General Public License v3.0
33 stars 11 forks source link

Rounding of time variables #25

Closed prockenschaub closed 6 months ago

prockenschaub commented 1 year ago

When trying to calculate some of the event times by hand, I noticed that my results were sometimes ever so slightly off. For example, instead of -52 minutes I would get -51 minutes. I did some digging and think it is due to an inconsistent use of rounding when changing time precision.

round is for example used when calculating minutes in in id_win_helper for MIMIC.

https://github.com/eth-mds/ricu/blob/9e76c074250046da93870063c3b7ed57d274b790/R/data-utils.R#L347

On the other hand, trunc is used when loading MIMIC or when re_timeing.

https://github.com/eth-mds/ricu/blob/9e76c074250046da93870063c3b7ed57d274b790/R/utils-misc.R#L78-L80

I think all time rounding in ricu should use the same rounding rule. I would further argue that neither round nor trunc are the optimal choice. Both group events that happened before the nominal time into this bucket. For example, round will put both -0.5 hours and 0.5 hours into bucket 0. However, if the reference date is ICU admission, this pools pre-admission and post-admission info. Of course, this often may not matter because of imprecision in recording but nevertheless. trunc is even worse, as it would put both -0.99 hours and 0.99 hours into bucket 0, effectively creating a two-hour bucket.

I would instead opt for the use of floor throughout ricu. This will never include pre-admission information in a non-negative bucket number and keeps all buckets at the same duration.

nbenn commented 1 year ago

@dplecko, do you have any thoughts on this? I remember, at some point we had some discussions on how to do this.

  1. We should absolutely do the same everywhere.
  2. Both round() and trunc() are symmetric around the origin. At some point, I believe, we thought this was a good thing. But looking at it from the point of changing points of reference (i.e. moving from ICU stay IDs to Hospital admission IDs), this does not feel like a good choice.

(An illustration)

dplecko commented 1 year ago

yes, I agree that floor is the best choice to use throughout.

dplecko commented 6 months ago

Thanks @prockenschaub; this is indeed a good point. round() and trunc() have been replaced by floor() throughout, in the new version ricu 0.6.0.