MIT-LCP / mimic-code

MIMIC Code Repository: Code shared by the research community for the MIMIC family of databases
https://mimic.mit.edu
MIT License
2.56k stars 1.51k forks source link

problem in obtaining waveforms of a specific ICD code #1569

Closed mahdiehhhh closed 1 year ago

mahdiehhhh commented 1 year ago

Prerequisites

Description

Hello I have encountered a problem regarding the connection between waveforms and the information on patient's ICD codes.

Using ICD 9 code 53501, for example, I am going to extract it's waveforms. In order to accomplish this, I used the DIAGNOSES_ICD table and obtained the corresponding subject_id and HADM_ID. I know that I should then get the corresponding waveform with the help of the intime column from the ICUSTAYS table, but I have a problem because there are multiple ICD codes for each subject, and I must have the ICUSTAY_ID column in the DIAGNOSES_ICD dataframe in order to get the waveform for each code.

In other words, I must follow these steps, but I do not have the ICUSTAY_ID. 1- Finding the ICUSTAY_ID associated with the desired ICD code 2- Finding the waveform based on the ICUSTAY_ID obtained in step

I'm hoping you see the issue. Is there a solution to this problem?

Inked2

mahdiehhhh commented 1 year ago

@alistairewj

alistairewj commented 1 year ago

There are potentially:

Technically, this can be accomplished by aggregating in each step.

This would end up giving you a query something like this:

SELECT i.icustay_id, i.hadm_id, MAX(CASE WHEN icd_code = '53501' THEN 1 ELSE 0 END) AS has_icd_code
FROM mimiciii.icustays i
INNER JOIN mimiciii.admissions a
ON i.hadm_id = a.hadm_id
INNER JOIN mimiciii.diagnoses_icd dx
ON a.hadm_id = dx.hadm_id
GROUP BY i.icustay_id, i.hadm_id

You'd then need to align the waveforms based on the intime, outtime, and time of the matched waveform.

However, this is making some assumptions. It's assuming that the billed ICD code for the hospitalization is valid for each ICU stay. This may be OK in some cases, but you can very easily imagine situations where an ICD code is not valid for an entire ICU stay. For example, the code for acute kidney injury is 584.9. This means that the patient had acute kidney injury sometime during their hospital stay, but you don't know when. Since the waveform occurs only for a subset of their hospital stay, it very well may be that you extract a waveform "associated" with the ICD code for 584.9, but that doesn't have anything related to acute kidney injury. This is a fundamental limitation of hospital based ICD codes: there's no time component for them.

Hope that helps!

mahdiehhhh commented 1 year ago

Thank you so much for taking the time to give me advice. Definitely, I will read what you said carefully and act accordingly regards

‫‪Alistair Johnson‬‏ @.***‬‏> در تاریخ یکشنبه ۲۵ ژوئن ۲۰۲۳ ساعت ۲۲:۰۶ نوشت:‬

There are potentially:

  • many icd_code per hadm_id
  • many icustay_id per hadm_id
  • many waveforms for each icustay_id

Technically, this can be accomplished by aggregating in each step.

  • for icd_code to hadm_id, you could check for any presence of the icd code (very often done with MAX(CASE WHEN icd_code = '53501' THEN 1 ELSE 0 END) AS has_icd_code)
  • for icustay_id to hadm_id, see below, but you could simply assign all icustay_id within an hadm_id as having the code
  • for waveforms, you would then do the same

This would end up giving you a query something like this:

SELECT i.icustay_id, i.hadm_id, MAX(CASE WHEN icd_code = '53501' THEN 1 ELSE 0 END) AS has_icd_code FROM mimiciii.icustays i INNER JOIN mimiciii.admissions a ON i.hadm_id = a.hadm_id INNER JOIN mimiciii.diagnoses_icd dx ON a.hadm_id = dx.hadm_id GROUP BY i.icustay_id, i.hadm_id

You'd then need to align the waveforms based on the intime, outtime, and time of the matched waveform.

However, this is making some assumptions. It's assuming that the billed ICD code for the hospitalization is valid for each ICU stay. This may be OK in some cases, but you can very easily imagine situations where an ICD code is not valid for an entire ICU stay. For example, the code for acute kidney injury is 584.9. This means that the patient had acute kidney injury sometime during their hospital stay, but you don't know when. Since the waveform occurs only for a subset of their hospital stay, it very well may be that you extract a waveform "associated" with the ICD code for 584.9, but that doesn't have anything related to acute kidney injury. This is a fundamental limitation of hospital based ICD codes: there's no time component for them.

Hope that helps!

— Reply to this email directly, view it on GitHub https://github.com/MIT-LCP/mimic-code/issues/1569#issuecomment-1606261381, or unsubscribe https://github.com/notifications/unsubscribe-auth/AVJ3QDUXNSYPX32TFINWEQ3XNCR5RANCNFSM6AAAAAAZGEWMVI . You are receiving this because you authored the thread.Message ID: @.***>

mahdiehhhh commented 1 year ago

Hello again @alistairewj I tried to carefully read the steps you mentioned and get their tips, although I almost did not understand the whole code that you wrote in query language. As far as I understand, you used "icustays" , "admissions" , "diagnoses_icd" tables , but in none of the tables, there are two columns "_icustayid" and "_hadmid" next to each other so that I can match them.