MIT-LCP / mimic-code

MIMIC Code Repository: Code shared by the research community for the MIMIC family of databases
https://mimic.mit.edu
MIT License

Unique number of subject_ids does not match the number stated in documentation #1606

Open MichalWeisman opened 11 months ago

MichalWeisman commented 11 months ago

Hello,

In the dataset's documentation, it is mentioned that the data contains information for 40,000 patients. However, when computing the number of unique values in the subject_id column I receive a much greater number. For example, I would like to know the number of patients who had a blood glucose test:

glucose_df = labevents[labevents['itemid'] == 50931]  # 50931 is the itemid of the glucose test
print(glucose_df['subject_id'].nunique())

The output is 247,005 which is much greater than 40,000. Thanks

heisenbug-1 commented 11 months ago

Hi! Correct me if I'm wrong, but I think there are 2 modules: hosp and icu. hosp contains information about all patients admitted to the hospital, and icu is a subset of hosp (patients who were admitted to the ICU, so all ICU patients have a hadm_id for the hospital admission and a stay_id for the ICU stay).

Unique number of subject ids in hosp is 180733, and icu has 50920 unique subject ids. The documentation says "over 40000 ICU patients", so this checks out :)
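If it helps, here is a minimal sketch of how per-module counts like these could be checked in pandas. The two small frames are made-up stand-ins (not MIMIC data), and the names `hosp_df` and `icustays_df` are hypothetical; in practice they would be loaded from a hosp-module table and from icu.icustays.

```python
import pandas as pd

# Made-up stand-ins, NOT MIMIC data. hosp_df plays the role of a hosp-module
# table, icustays_df the role of icu.icustays (both names are hypothetical).
hosp_df = pd.DataFrame({"subject_id": [1, 2, 3, 4, 5]})
icustays_df = pd.DataFrame({
    "subject_id": [2, 2, 4],   # subject 2 has two ICU stays
    "stay_id": [10, 11, 12],
})

# Count distinct patients in each table.
n_hosp = hosp_df["subject_id"].nunique()
n_icu = icustays_df["subject_id"].nunique()

# Check that every ICU patient also appears in the hosp table.
icu_is_subset = bool(icustays_df["subject_id"].isin(hosp_df["subject_id"]).all())

print(n_hosp, n_icu, icu_is_subset)
```

Counting with `nunique()` rather than `len()` matters here because one patient can have several ICU stays, so rows and patients are not the same thing.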

The lab events table includes patients that weren't necessarily admitted to the hospital/icu, and there are 255876 unique subject ids in the lab events table. If you only need glucose values for ICU patients, I'd recommend filtering by subject ids from the icu.icustays table:

SELECT DISTINCT lab.*
FROM  mimiciv_hosp.labevents as lab, mimiciv_icu.icustays as icu
WHERE lab.subject_id = icu.subject_id
AND lab.itemid = 50931

This query returns 50738 unique subject_ids. You can rewrite it for pandas like:

glucose_df = labevents[(labevents['itemid'] == 50931) & (labevents['subject_id'].isin(icustays_df['subject_id']))]
print(glucose_df['subject_id'].nunique())
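The pandas filter can be tried on a small made-up example first. This is only a sketch: the rows below are invented for illustration (not MIMIC data), and it assumes `labevents` and `icustays_df` are DataFrames with a `subject_id` column, as in the snippet above.

```python
import pandas as pd

# Invented rows for illustration only, NOT MIMIC data.
labevents = pd.DataFrame({
    "subject_id": [1, 1, 2, 3, 4, 4],
    "itemid": [50931, 50912, 50931, 50931, 50912, 50931],
})
icustays_df = pd.DataFrame({
    "subject_id": [2, 4, 4],   # subject 4 has two ICU stays
    "stay_id": [10, 11, 12],
})

# Keep glucose rows (itemid 50931) for patients who have at least one ICU stay.
is_glucose = labevents["itemid"] == 50931
in_icu = labevents["subject_id"].isin(icustays_df["subject_id"])
glucose_icu = labevents[is_glucose & in_icu]

n_patients = glucose_icu["subject_id"].nunique()
print(n_patients)
```

Using `isin` keeps each lab row at most once, even when a patient has several ICU stays. A plain join on subject_id would repeat lab rows once per stay, which is why the SQL version needs DISTINCT.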

Hope this provides you with more insight into the schema :)

MichalWeisman commented 11 months ago

@heisenbug-1 Thank you very much for the clarification!