MIT-LCP / mimic-code

MIMIC Code Repository: Code shared by the research community for the MIMIC family of databases
https://mimic.mit.edu
MIT License
2.58k stars 1.52k forks source link

multiple stay_id mapped to the same hadm_id in edstays #1338

Closed VoyagerWSH closed 2 years ago

VoyagerWSH commented 2 years ago

Prerequisites

Description

In the edstays table of MIMIC-IV-ED_v2.0, we find that, for a given patient, multiple unique stay_id map to the same hadm_id. Precisely, there are 600 unique hadm_id that had more than one unique stay_id associated with it.

As an example, hadm_id 21436543 has the following three stay_id linked to it: 30523998, 32093485, and 33998754. We have inspected hadm_id 21436543 in transfers.csv.gz, and it shows that this patient has transferred three times in ED before being admitted, creating three unique transfer_id that are exactly the same as the three stay_id. Here are some code that one can use to reproduce this result:

# read in the data tables of edstays and transfer
ed_stays = pd.read_csv(mimic_iv_ed_v2_path + 'edstays.csv.gz')
transfer = pd.read_csv(mimic_iv_v2_path+ 'transfers.csv.gz')

# show the data frame of three stay_ids associated with the same hadm_id
ed_stays[ed_stays['hadm_id'] == 21436543.0] 

# show the transfer records associated with this hadm_id
transfer[transfer['hadm_id'] == 21436543.0].sort_values('intime')

We have one idea for what might be going on. The documentation of the transfer table states that “the stay_id present in the icustays and edstays tables is derived from transfer_id. For example, three contiguous ICU stays will have three separate transfer_id for each distinct physical location (e.g. a patient could move from one bed to another). The entire stay will have a single stay_id, which will be equal to the transfer_id of the first physical location.”

Two questions: Should the stay_id for these examples be updated to be the very first transfer_id? More broadly is our intuition --- that a hadm_id should be associated with at most one stay_id, which can then correspond to multiple transfer_ids --- correct?

Thanks!

alistairewj commented 2 years ago

There are definitely a few inconsistencies in the ED stay data which are just due to the nature of the environment. It's hard to come up with a fixed reason as it varies: sometimes patients are "admitted" to the ED for observation, sometimes they leave the ED and come back, sometimes there are two registrations, etc etc. Never found a consistent reason for multiple ED stays corresponding to the same hospitalization (if we did, we would have made it simpler!).

Two questions: Should the stay_id for these examples be updated to be the very first transfer_id?

It would be possible but I'm not sure you would fix more problems than you introduce. We did experiment with this before releasing the data. It's much much more straightforward with ICU patients, because (1) they move units far less frequently, and (2) their transfers are often simply bed changes rather than room changes.

More broadly is our intuition --- that a hadm_id should be associated with at most one stay_id, which can then correspond to multiple transfer_ids --- correct?

Yes, that's correct. stay_id will only correspond to multiple transfer_id for ICU stays. I can actually give you the logic that was used in the icustays table:

...
    -- create a flag indicating if this is a new stay
    -- flag == 1 if it's the first transfer into an ICU
    , CASE
        -- if this row follows an ICU stay, *and* it's also an icu stay
        -- then set it to 0, so we do not flag it as a new ICU stay
        -- ultimately, we will group this row with the prior rows into a single stay
        WHEN s.is_icu = 1 AND LAG(s.is_icu) OVER fn = 1 THEN 0
        -- otherwise, set the flag to 1 if it's an icu (is_icu),
        -- *or* if the row is directly following an ICU stay (LAG()) - this ends the prior ICU stay
        ELSE s.is_icu + COALESCE(LAG(s.is_icu) OVER fn, 0)
    END AS icu_flag

...

, SUM(icu_flag) OVER (PARTITION BY hadm_id ORDER BY intime) AS icu_partition

...

  , FIRST_VALUE(transfer_id) OVER w AS stay_id
...
WINDOW w AS (
  PARTITION BY hadm_id, icu_partition
  ORDER BY intime
  ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)

We don't use this logic in the edstays table, so there's no grouping of stay_id there.

VoyagerWSH commented 2 years ago

I see. We might just drop all the hadm_id that are mapped to multiple stay_id for our analysis. Thanks for your explanation!