MIT-LCP / mimic-code

MIMIC Code Repository: Code shared by the research community for the MIMIC family of databases
https://mimic.mit.edu
MIT License

paused vasopressor durations #1808

Open amaletzk opened 1 month ago

amaletzk commented 1 month ago

Description

Hi,

In lines 264 to 271 of https://github.com/MIT-LCP/mimic-code/blob/main/mimic-iii/concepts/durations/vasopressor_durations.sql, vasopressor administrations from inputevents_mv are grouped by linkorderid, and the minimum starttime and maximum endtime are calculated for each group. This makes sense if the starttime of each entry equals the endtime of the previous entry, but not if there is a gap between them. According to https://github.com/MIT-LCP/mimic-code/issues/47, such gaps should never occur once "Rewritten" entries are removed, but I found at least one counterexample, namely linkorderid = 8781494:

SELECT
    icustay_id, linkorderid, starttime, endtime, statusdescription, itemid
FROM inputevents_mv
WHERE linkorderid = 8781494
ORDER BY starttime

In the result, there is a gap of roughly 6 h 45 min between 2167-08-05 21:16:00 and 2167-08-06 04:01:00. As I understand it, such gaps should not be included in the final vasopressor durations.

Note: The status of the entry before the gap is "Paused". I did not investigate whether "Paused" entries always lead to gaps, but can do that if it helps.
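To make the effect concrete, here is a minimal Python sketch of the difference between taking the minimum starttime and maximum endtime per linkorderid and merging only rows that actually touch. Only the two timestamps bounding the gap come from the query result above; the outer start and end times, and the helper names `grouped_span` and `merge_contiguous`, are invented for illustration and are not part of the repository.

```python
from datetime import datetime, timedelta

# Hypothetical rows for one linkorderid: (starttime, endtime).
# Only the two timestamps bounding the gap come from the issue text;
# the outer start and end times are invented for illustration.
rows = [
    (datetime(2167, 8, 5, 18, 0), datetime(2167, 8, 5, 21, 16)),
    (datetime(2167, 8, 6, 4, 1), datetime(2167, 8, 6, 8, 0)),
]


def grouped_span(rows):
    """Mimic MIN(starttime) / MAX(endtime) over one linkorderid group."""
    return min(s for s, _ in rows), max(e for _, e in rows)


def merge_contiguous(rows, tolerance=timedelta(0)):
    """Merge rows only when the next start is within `tolerance` of the running end."""
    merged = []
    for start, end in sorted(rows):
        if merged and start - merged[-1][1] <= tolerance:
            # Row touches or overlaps the current interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            # Gap larger than the tolerance: start a new interval.
            merged.append((start, end))
    return merged


span_start, span_end = grouped_span(rows)
grouped_total = span_end - span_start
merged_total = sum((e - s for s, e in merge_contiguous(rows)), timedelta(0))
overcount = grouped_total - merged_total

print("grouped by linkorderid:", grouped_total)
print("contiguous rows only:  ", merged_total)
print("time wrongly included: ", overcount)
```

With these made-up outer bounds, the grouped span exceeds the summed row durations by exactly the length of the gap. The `tolerance` parameter is an assumption of this sketch; it would let short gaps be bridged if that turns out to be desirable.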

alistairewj commented 1 month ago

Interesting. Yes, I don't think the code handles paused rows well. Maybe you could look at a histogram of durations for the rows following a paused row? If the median is somewhere above 2 hours, that hints that the code could be improved by incorporating paused.
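One possible way to run that check, sketched in Python under the assumption that "duration" means the gap between a row's endtime and the starttime of the next row with the same linkorderid. The rows, the status values other than "Paused", and the helper name `gaps_by_status` are hypothetical and only serve to illustrate the calculation.

```python
from datetime import datetime, timedelta
from statistics import median

# Hypothetical rows: (linkorderid, starttime, endtime, statusdescription).
rows = [
    (1, datetime(2167, 8, 5, 18, 0), datetime(2167, 8, 5, 21, 16), "Paused"),
    (1, datetime(2167, 8, 6, 4, 1), datetime(2167, 8, 6, 8, 0), "FinishedRunning"),
    (2, datetime(2167, 8, 7, 9, 0), datetime(2167, 8, 7, 10, 0), "Changed"),
    (2, datetime(2167, 8, 7, 10, 0), datetime(2167, 8, 7, 12, 0), "Paused"),
    (2, datetime(2167, 8, 7, 12, 30), datetime(2167, 8, 7, 15, 0), "FinishedRunning"),
]


def gaps_by_status(rows):
    """Return (gaps after paused rows, gaps after all other rows)."""
    by_order = {}
    for linkorderid, start, end, status in rows:
        by_order.setdefault(linkorderid, []).append((start, end, status))

    paused, other = [], []
    for events in by_order.values():
        events.sort(key=lambda r: r[0])  # order by starttime
        # Pair each row with its successor within the same linkorderid.
        for (_, end, status), (next_start, _, _) in zip(events, events[1:]):
            gap = next_start - end
            (paused if status == "Paused" else other).append(gap)
    return paused, other


paused_gaps, other_gaps = gaps_by_status(rows)
print("median gap after paused rows:", median(paused_gaps))
print("median gap after other rows: ", median(other_gaps))
```

The last row of each linkorderid has no successor and therefore contributes no gap.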

amaletzk commented 1 month ago

Thanks for your reply! I did a bit of research and found that there is indeed almost always a gap between a paused row and its successor. The median is 1 hour 11 minutes; detailed statistics below:

count                        30168
mean     0 days 02:10:04.685759745
std      0 days 03:02:01.875933766
min                0 days 00:00:00
25%                0 days 00:30:00
50%                0 days 01:11:00
75%                0 days 02:45:00
max                3 days 10:26:00

Comparing this to non-paused rows reveals a significant difference: there are almost never gaps, and when there are, they are short:

count                         351286
mean     -1 days +23:59:59.448483572
std        0 days 00:02:07.398201506
min                -1 days +11:45:00
25%                  0 days 00:00:00
50%                  0 days 00:00:00
75%                  0 days 00:00:00
max                  0 days 00:18:00
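A note on reading the negative values: the statistics appear to use pandas-style timedelta formatting, in which a negative duration is written as a negative day count plus a positive time of day. The short Python sketch below, using only the standard library, converts the mean and minimum from the table into plainer terms. The mean is rounded to microseconds because `datetime.timedelta` does not store nanoseconds.

```python
from datetime import timedelta

# Values copied from the non-paused table, rounded to microseconds.
mean_gap = timedelta(days=-1, hours=23, minutes=59, seconds=59, microseconds=448484)
min_gap = timedelta(days=-1, hours=11, minutes=45)

# A negative gap means the next row starts before the current one ends.
print("mean gap in seconds:", mean_gap.total_seconds())
print("largest overlap:    ", -min_gap)
```

Read this way, the mean gap after a non-paused row is about half a second of overlap, and the minimum corresponds to a single overlap of 12 hours 15 minutes.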

Notes:

My suggestion would be to simply remove lines 264 to 271, i.e., don't group by linkorderid. I don't know if the treatment of inputevents_cv must be changed accordingly, though.