MIT-LCP / mimic-code

MIMIC Code Repository: Code shared by the research community for the MIMIC family of databases
https://mimic.mit.edu
MIT License

Zeroes in "valuenum" #1540

Open marinaperezs opened 1 year ago

marinaperezs commented 1 year ago

Hi everyone!

I'm working with the MIMIC-IV database and I need to handle missing data. To do so, I'm checking whether a value of 0 already means something in the database, or whether I can use 0 to flag missing values for my neural network. I found that some items have valuenum=0, for instance heart rate and respiratory rate. Does this mean that the patient died? If so, would these 0 values be the last values recorded for the patient? In that case I'll need to find another imputation method, because 0 already means something in the database.

I don't know if I made myself clear, but thank you in advance!

alistairewj commented 1 year ago

It could be that, or it could be that they took the leads off the patient, and the heart rate of the bed is 0. :)

Usually, I've found that a heart rate of 0 is bad data. Same for respiratory rate. But, you'll find itemids for spontaneous respiratory rate where the value is 0. In that case, the patient is breathing, but supported by the ventilator. My point being that interpreting a value of 0 is dependent on the itemid (i.e. the context of the measure).
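As a rough illustration of the itemid-dependent reading of 0 described above, here is a minimal pandas sketch that turns zeros into missing values only for selected items. The itemids in `ZERO_IS_MISSING` (220045 for heart rate, 220210 for respiratory rate) and the helper name `mask_implausible_zeros` are assumptions made for this example; check the itemids against `d_items` in your own copy of the data before relying on them.

```python
import numpy as np
import pandas as pd

# Assumption: itemids for which a recorded 0 is treated as bad data.
# 220045 (heart rate) and 220210 (respiratory rate) are illustrative;
# verify them against d_items before use.
ZERO_IS_MISSING = {220045, 220210}


def mask_implausible_zeros(chartevents: pd.DataFrame) -> pd.DataFrame:
    """Return a copy where valuenum == 0 becomes NaN for the selected itemids only."""
    out = chartevents.copy()
    out["valuenum"] = out["valuenum"].astype(float)
    is_bad_zero = out["itemid"].isin(ZERO_IS_MISSING) & (out["valuenum"] == 0)
    out.loc[is_bad_zero, "valuenum"] = np.nan
    return out


# Toy rows: 224689 stands in for a spontaneous respiratory rate item,
# where a 0 is kept because it can be a legitimate reading.
events = pd.DataFrame(
    {
        "itemid": [220045, 220045, 224689, 220210],
        "valuenum": [0.0, 82.0, 0.0, 0.0],
    }
)
cleaned = mask_implausible_zeros(events)
print(cleaned)
```

Keeping the list of affected itemids explicit makes the decision reviewable: zeros for any item not in the set pass through unchanged.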

There are some issues on thresholding data to remove implausible values that you might want to take a look at as well.

marinaperezs commented 1 year ago

What do you mean by "there are some issues on thresholding data"? And yeah, if it depends on the itemid, do you recommend any other imputation method, or any other way to deal with missing data?

alistairewj commented 1 year ago

Oh I meant there are other GitHub issues where people discuss different thresholds they use to decide "good" and "bad" data. Though now that I've said that I can't seem to find them with a cursory search D:

I don't have any particular recommendation for missing value imputation. I usually try to impute from another source in the dataset if I can (another itemid, or carry forward if appropriate). If that fails and I'm doing ML, it's either mean value imputation for linear models or having the model learn the missing data contribution (xgboost allows for this).
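The carry-forward and mean imputation steps mentioned above could be sketched as follows. This is only one way to do it: the function name `impute_carry_forward_then_mean`, the column names `stay_id` and `heart_rate`, and the assumption that rows are already sorted by time within each stay are all made up for the example.

```python
import numpy as np
import pandas as pd


def impute_carry_forward_then_mean(df, group_col, value_cols, train_means=None):
    """Carry the last observation forward within each group, then mean-fill the rest.

    Assumes rows are already sorted by time within each group.
    Pass train_means when imputing validation/test data to avoid leakage.
    """
    out = df.copy()
    if train_means is None:
        # Means are taken over the originally observed values only.
        train_means = df[value_cols].mean()
    # Forward fill never crosses group boundaries.
    out[value_cols] = out.groupby(group_col)[value_cols].ffill()
    # Whatever is still missing (e.g. before the first observation) gets the mean.
    out[value_cols] = out[value_cols].fillna(train_means)
    return out, train_means


# Toy data: two stays, with gaps at the start and in the middle.
vitals = pd.DataFrame(
    {
        "stay_id": [1, 1, 1, 2, 2],
        "heart_rate": [np.nan, 80.0, np.nan, np.nan, 100.0],
    }
)
imputed, means = impute_carry_forward_then_mean(vitals, "stay_id", ["heart_rate"])
print(imputed)
```

For tree-based models such as xgboost, the mean-fill step can be skipped and the remaining values left as NaN, since xgboost treats NaN as missing by default and learns a default split direction for it.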