MIT-LCP / mimic-code

MIMIC Code Repository: Code shared by the research community for the MIMIC family of databases
https://mimic.mit.edu
MIT License

Inconsistent subject/hadm IDs in MIMIC-IV #1630

Closed amaletzk closed 10 months ago

amaletzk commented 10 months ago

Description

I think some of the subject/hadm IDs are inconsistent between the "admissions" and "labevents" tables (and maybe others, but I've only checked these two so far).

Specifically, this concerns the following (exhaustive, as far as I can tell) list of hadm_ids:

hadm_id subject_id_lab subject_id_adm
23722350 18338095 15814863
26518091 14800009 15623678
28786366 14800009 15623678
26967600 19518697 19724547
27888708 12592838 15677328
26016638 13529335 14170556
27145598 13529335 14170556
26959192 14233350 10750164
29781340 14143633 14995832
28313923 15100573 10564147
21469456 15656675 15958656
22653302 10628682 11147539
25055415 13135593 18614569
26799855 16269697 10821441
27302667 16786482 13992569
21289895 19416727 17159870
26049594 11665270 13409745
27570734 18383499 14361933
25502517 14691183 10426612
27777792 14691183 10426612
20215264 18189742 14957669
24106521 10901700 10977178
29230978 11057055 10744839
27740734 14125406 18235677
29522271 19776395 12848429
28570390 12322361 13212171
26387145 14971926 16294879

subject_id_lab refers to the subject_id in "labevents", subject_id_adm to the subject_id in "admissions". Moreover, the two hadm_ids 27570734 and 24106521 are each associated with at least two distinct subject_ids in "labevents".

I've checked this with MIMIC-IV 1.0. The changelog does not mention that the issue has been fixed in a more recent version of MIMIC-IV.

heisenbug-1 commented 10 months ago

Hi! I just checked hadm_ids 27570734 and 24106521 in labevents and admissions in MIMIC-IV v2.2, and each of them has a unique subject_id. Moreover, hadm_id 27570734 has a subject_id that ends with 933 (not on your list), and hadm_id 24106521 has a subject_id that ends with 178 (also not on your list).

Also, the rest of the hadm_ids from your table are not in labevents in MIMIC-IV v2.2, but all of them are in admissions, also with unique subject_ids, i.e., no hadm_id is shared by two different subject_ids.

Are you using a join in your query or how did you get this result, if I may ask?

amaletzk commented 10 months ago

Hi! Thanks for your quick response.

I grouped the entries in labevents by hadm_id and selected the minimum and maximum subject_id per group. Then I joined the resulting table with admissions and compared the subject_ids thus obtained.
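
For anyone who wants to repeat the check, the steps above can be sketched roughly as follows. This is only a minimal illustration using Python's built-in sqlite3 module on toy tables with made-up IDs, not the exact query that was run against MIMIC-IV; the function name find_mismatches and the output column order are assumptions made for this example.

```python
import sqlite3


def find_mismatches(conn):
    """Return (hadm_id, min_subject_id_lab, max_subject_id_lab, subject_id_adm)
    for every hadm_id whose subject_id in labevents disagrees with admissions."""
    query = """
        SELECT l.hadm_id, l.min_subject_id, l.max_subject_id, a.subject_id
        FROM (
            SELECT hadm_id,
                   MIN(subject_id) AS min_subject_id,
                   MAX(subject_id) AS max_subject_id
            FROM labevents
            WHERE hadm_id IS NOT NULL
            GROUP BY hadm_id
        ) AS l
        JOIN admissions AS a ON a.hadm_id = l.hadm_id
        WHERE l.min_subject_id <> a.subject_id
           OR l.max_subject_id <> a.subject_id
        ORDER BY l.hadm_id
    """
    return conn.execute(query).fetchall()


# Toy in-memory stand-ins for the two tables; every ID here is made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE admissions (subject_id INTEGER, hadm_id INTEGER)")
conn.execute("CREATE TABLE labevents (subject_id INTEGER, hadm_id INTEGER)")
conn.executemany(
    "INSERT INTO admissions VALUES (?, ?)",
    [(1, 100), (2, 200), (3, 300)],
)
conn.executemany(
    "INSERT INTO labevents VALUES (?, ?)",
    [
        (1, 100), (1, 100),  # consistent with admissions
        (9, 200),            # subject_id differs from admissions
        (3, 300), (8, 300),  # two distinct subject_ids for one hadm_id
        (4, None),           # lab event not linked to any admission
    ],
)

mismatches = find_mismatches(conn)
for row in mismatches:
    print(row)
```

A row where the minimum and maximum differ corresponds to a hadm_id that is associated with more than one subject_id in labevents.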

Actually, the changelog of v2.0 mentions:

"The mechanism for determining patients included in MIMIC was changed. For the most part this has resulted in an improvement, particularly regarding the logic for merging patients who had distinct medical record numbers."

which could be the reason for the differences between v1.0 and v2.2. I'll check it on more recent versions - maybe the issue has been addressed in the meantime :)

amaletzk commented 10 months ago

Yep, the problem disappeared in v2.0. I'll therefore close this issue.