MIT-LCP / physionet

A collection of tools for working with the PhysioNet repository.
http://physionet.org/
MIT License

Matching numeric files with ICU admission IDs using timestamps from the numeric files #123

Open · phaniparsa opened this issue 4 years ago

phaniparsa commented 4 years ago

Hello Team,

I'm currently working on a project using the MIMIC-III matched subset, and I'm trying to link the numeric record files to ICU stays in the clinical data.

Below is the logic I'm using to combine numeric records with ICU stays:
Step 1: Extract the SUBJECT_ID and DateTime stamp from each numeric file.
Step 2: Join the [SUBJECT_ID, DateTime] pairs with the ICUSTAYS clinical table on SUBJECT_ID.
Step 3: For each row in the merged data, check whether the file's DateTime falls between the ICU stay's INTIME and OUTTIME. If it does, keep the row; if not, discard it.
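A minimal Python sketch of these three steps, using only the standard library. It assumes numerics record names follow the pattern `pNNNNNN-YYYY-MM-DD-hh-mm` with a trailing `n` (for example `p000020-2183-04-28-17-47n`), and that ICUSTAYS rows have already been loaded as dicts with `INTIME` and `OUTTIME` parsed into `datetime` objects. The helper names `parse_record_name` and `match_records` are hypothetical.

```python
import re
from datetime import datetime

# Assumed naming pattern for matched-subset numerics records, e.g.
# "p000020-2183-04-28-17-47n": subject ID, then the record start time.
RECORD_RE = re.compile(r"p(\d{6})-(\d{4})-(\d{2})-(\d{2})-(\d{2})-(\d{2})n?$")


def parse_record_name(name):
    """Step 1: return (SUBJECT_ID, start datetime) parsed from a record name."""
    m = RECORD_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognised record name: {name}")
    subject_id = int(m.group(1))
    year, month, day, hour, minute = (int(g) for g in m.groups()[1:])
    return subject_id, datetime(year, month, day, hour, minute)


def match_records(record_names, icustays):
    """Steps 2 and 3: join on SUBJECT_ID, then keep only the stays whose
    closed [INTIME, OUTTIME] window contains the record's start time.

    icustays: iterable of dicts with keys SUBJECT_ID, ICUSTAY_ID,
    INTIME and OUTTIME (the last two already datetime objects).
    Returns (matched, unmatched): matched maps record name -> list of
    ICUSTAY_IDs; unmatched lists record names with no containing stay.
    """
    # Index stays by subject so each record only scans its own stays.
    stays_by_subject = {}
    for stay in icustays:
        stays_by_subject.setdefault(stay["SUBJECT_ID"], []).append(stay)

    matched, unmatched = {}, []
    for name in record_names:
        subject_id, start = parse_record_name(name)
        hits = [
            s["ICUSTAY_ID"]
            for s in stays_by_subject.get(subject_id, [])
            if s["INTIME"] <= start <= s["OUTTIME"]
        ]
        if hits:
            matched[name] = hits
        else:
            unmatched.append(name)
    return matched, unmatched
```

Keeping every matching ICUSTAY_ID, rather than only the first hit, makes it easy to spot files whose timestamp falls inside more than one stay window, in case any stays overlap in the data.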

After completing this process, I end up with 15,860 matched files out of 22,247. I'm wondering whether this approach is valid, since about 6,400 files are left unmatched.
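For reference, the totals reported here work out as follows:

```latex
\begin{aligned}
22247 - 15860 &= 6387 \quad \text{unmatched files},\\
\frac{15860}{22247} &\approx 0.713 \quad (\approx 71\%\ \text{matched}).
\end{aligned}
```

A match rate of roughly 71% is within the 70-75% range mentioned in the reply below.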

Also, according to https://archive.physionet.org/mimic2/mimic2_matching.shtml, some records may not find a match. So I'd like to know whether it is reasonable to validate the numeric files against the ICUSTAYS INTIME and OUTTIME values and keep only the files that fall within the [INTIME, OUTTIME] interval. Could you please confirm whether this approach is sound, and whether it is acceptable for this many files to go unmatched?

alistairewj commented 4 years ago

That's the approach I would take, and I think I similarly found that around 70-75% of icustay_id values could be matched. You may want to add some fuzziness to the intime/outtime windows (perhaps ±6 hours), as those times aren't exact (sometimes patients are put on monitoring half an hour before they are "admitted" to the ICU).
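A minimal sketch of the suggested fuzziness, assuming `stays` holds one subject's ICU stays as dicts with hypothetical keys `ICUSTAY_ID`, `INTIME` and `OUTTIME` (already parsed into `datetime` objects). The ±6 hour tolerance comes from the comment above and is not a validated value; choosing the stay whose window is nearest when several widened windows overlap is an assumption added here, not part of the original matching process.

```python
from datetime import datetime, timedelta

# Tolerance suggested in the comment above; an assumption, tune as needed.
TOLERANCE = timedelta(hours=6)


def distance_to_stay(start, stay):
    """Zero if start lies inside [INTIME, OUTTIME], else the gap to the nearest edge."""
    return max(timedelta(0), stay["INTIME"] - start, start - stay["OUTTIME"])


def match_with_tolerance(start, stays, tolerance=TOLERANCE):
    """Return the ICUSTAY_ID of the stay nearest to `start`, provided
    `start` lies within `tolerance` of that stay's [INTIME, OUTTIME]
    window; return None if no stay is close enough.

    A stay that contains `start` outright has distance zero, so it always
    wins over stays that only match thanks to the tolerance.
    """
    candidates = [s for s in stays if distance_to_stay(start, s) <= tolerance]
    if not candidates:
        return None
    # min() keeps the first stay on ties, i.e. input order breaks ties.
    best = min(candidates, key=lambda s: distance_to_stay(start, s))
    return best["ICUSTAY_ID"]
```

Setting `tolerance=timedelta(0)` reproduces the strict [INTIME, OUTTIME] check, which makes it straightforward to compare how many extra files the widened windows recover.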

The waveform matching process was done retrospectively, so it was quite difficult to do and unfortunately isn't perfect for all patients. I don't actually know why so many files aren't matched, but at least what you have found is roughly consistent with what I found.