pszolovits opened this issue 2 months ago
Even Paul Bunyan is not 68,000 feet tall!
😂
There's also the patient with a blood pressure higher than the pressure (in atmospheres) at the bottom of the Mariana Trench. Forgot to take their lisinopril, I assume :)
Since unit conversions are easy, I wonder if it might be OK to drop either the inch or cm measurements, since they are almost totally equivalent. I don't know if BIDMC records the original data in one or the other unit, but that might be the right one to keep. Perhaps the recording convention changed during the data collection period.
Probably. I didn't realize it, but Metavision stores a computed value for the other unit. I think dropping the computed value makes sense. I think I've done this once before and forgotten, but I meant to look for heaping in the data to determine which one was the "original" measurement (e.g., if inches heap on whole numbers and cm values have decimals, then the original unit is inches). Still, I'm pretty sure the unit of documentation is inches.
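The heaping check described above can be sketched roughly as follows. This is a crude proxy that assumes the hand-entered unit clusters on whole numbers while the converted unit (cm = inch × 2.54) mostly carries decimals. The function names and sample values are hypothetical, not MIMIC data.

```python
def whole_number_fraction(values, tol=1e-6):
    """Fraction of values that sit on a whole number (within tol)."""
    if not values:
        return 0.0
    whole = sum(1 for v in values if abs(v - round(v)) < tol)
    return whole / len(values)


def guess_original_unit(inches, cms):
    """Guess the hand-entered unit: the one whose values heap on whole numbers.

    Assumes the other unit was computed by conversion, so its values
    mostly have decimals. Returns "inconclusive" on a tie.
    """
    frac_in = whole_number_fraction(inches)
    frac_cm = whole_number_fraction(cms)
    if frac_in == frac_cm:
        return "inconclusive"
    return "inches" if frac_in > frac_cm else "cm"


# Hypothetical heights for illustration only
inches = [70.0, 65.0, 72.0, 68.0]
cms = [round(x * 2.54, 1) for x in inches]  # converted values carry decimals
print(guess_original_unit(inches, cms))  # → inches
```

Real data could also heap on half inches, so a finer-grained check (e.g., multiples of 0.5) might be worth trying as well.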
Issues with weights from inputevents
Presumably, the individual outliers with huge recorded weight changes during a stay are artifacts. Surprisingly, however, looking just at the more plausible end of this spectrum, we see many weight spans (max minus min weight) of up to 100 kg (or more), most of which are still almost certainly artifacts. I am not sure what to make of this. For now, I am leaving these data as recorded, except for having eliminated the low- and high-weight outliers. There is so much data that the bad data may be washed out.
Interesting, I never noticed that. A swing of 10 kg is certainly reasonable given fluid balance changes, but 100 kg seems... less reasonable?
Correlations among average weights per stay
If we look at the mean of weights recorded during each ICU stay, they are highly correlated except for "Feeding Weight". The daily weights are "only" at ~0.95, whereas the inputs and admission weights are nearly at 1.00.
Ah, I had always wondered whether the daily weight was populated from the admit or the daily value, and I guess you've found it's from admit. All this does seem to justify the choice to include only admit/daily weights in the weight durations query, which tries to assign start/stop times of weight to patients throughout their ICU stay: https://github.com/MIT-LCP/mimic-code/blob/main/mimic-iv/concepts/demographics/weight_durations.sql
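The idea of assigning start/stop times to point weight measurements can be illustrated with a simplified sketch. The sketch makes one assumption: each weight holds from its charttime until the next measurement, and the last one holds until the end of the stay. This is not a translation of weight_durations.sql, whose exact rules may differ. The function name and sample times are hypothetical.

```python
from datetime import datetime


def weight_intervals(measurements, stay_end):
    """Turn (charttime, weight) points into (start, stop, weight) intervals.

    Simplified assumption: each weight is valid until the next measurement,
    and the last weight is valid until stay_end.
    """
    ordered = sorted(measurements)  # sort by charttime
    intervals = []
    for i, (start, weight) in enumerate(ordered):
        stop = ordered[i + 1][0] if i + 1 < len(ordered) else stay_end
        intervals.append((start, stop, weight))
    return intervals


# Hypothetical stay with two weights recorded out of order
meas = [(datetime(2150, 1, 2, 6), 81.0), (datetime(2150, 1, 1, 8), 80.0)]
ivs = weight_intervals(meas, datetime(2150, 1, 3, 12))
for start, stop, w in ivs:
    print(start, stop, w)
```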
While pulling data about ICU patients from MIMIC-IV v2.2, I noticed some data errors and peculiarities in the height and weight data.
Heights
One extreme height
Even Paul Bunyan is not 68,000 feet tall!
Most heights are duplicated between itemids 226707 (inches) and 226730 (cm). For 33,700 of the 33,707 height measurements recorded in chartevents, each pair of inch and cm entries has identical metadata (subject_id, hadm_id, stay_id, charttime), and all but 69 of those values fall within 1 cm of each other when we convert inches to cm (i.e., inch * 2.54). None (except the extreme value noted above) differ by more than 2 cm.
For the 7 pairs of height measurements where the metadata don't match, the difference is only in stay_id, which appears to be assigned differently between the inch and cm data. The storetimes also differ, though I did not check whether they ever differ in other cases. The inch and cm values are compatible even for these cases.
Since unit conversions are easy, I wonder if it might be OK to drop either the inch or cm measurements, since they are almost totally equivalent. I don't know if BIDMC records the original data in one or the other unit, but that might be the right one to keep. Perhaps the recording convention changed during the data collection period.
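The inch/cm comparison above can be sketched as follows. The sketch matches rows for itemids 226707 and 226730 on (subject_id, hadm_id, stay_id, charttime), converts inches to cm, and reports the absolute differences. The Row layout is a hypothetical stand-in for chartevents rows, and if a key repeats within one itemid, only the last value is kept.

```python
from collections import namedtuple

# Hypothetical row layout; the real chartevents table has more columns.
Row = namedtuple("Row", "subject_id hadm_id stay_id charttime itemid valuenum")

INCH_ITEMID = 226707
CM_ITEMID = 226730


def pair_heights(rows):
    """Match inch and cm rows on identical metadata; return diffs and unmatched keys."""
    key = lambda r: (r.subject_id, r.hadm_id, r.stay_id, r.charttime)
    inch = {key(r): r.valuenum for r in rows if r.itemid == INCH_ITEMID}
    cm = {key(r): r.valuenum for r in rows if r.itemid == CM_ITEMID}
    matched = inch.keys() & cm.keys()
    # Absolute difference in cm after converting inches (1 inch = 2.54 cm)
    diffs = {k: abs(inch[k] * 2.54 - cm[k]) for k in matched}
    unmatched = (inch.keys() | cm.keys()) - matched
    return diffs, unmatched


# Hypothetical rows: the second pair differs only in stay_id
rows = [
    Row(1, 10, 100, "2150-01-01 08:00", INCH_ITEMID, 70.0),
    Row(1, 10, 100, "2150-01-01 08:00", CM_ITEMID, 178.0),
    Row(2, 20, 200, "2150-02-01 09:00", INCH_ITEMID, 65.0),
    Row(2, 20, 201, "2150-02-01 09:00", CM_ITEMID, 165.0),
]
diffs, unmatched = pair_heights(rows)
within_1cm = sum(d <= 1.0 for d in diffs.values())
print(len(diffs), within_1cm, len(unmatched))  # → 1 1 2
```

Pairs that fail to match can then be re-examined with stay_id dropped from the key, which would surface cases like the 7 pairs described above.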
Weights
Many different sources of weights
Weights come from several sources: four itemids in chartevents and the patientweight column in inputevents. In fact, because every input event has an associated weight, inputevents provides the vast majority of the data:
[Table: weight data by source; the final row is inputevents' patientweight, de-duplicated]
Extreme weight values
Although there are extreme variations in human weight, some of the recorded weights are highly likely to be artifacts. For example, each source of weight data contains maximum weights over 5,000 kg and minimum weights of 1 kg or less (or negative!).
It seems reasonable simply to drop such extreme weight values.
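Dropping extreme values could be as simple as a range filter. The bounds below are illustrative assumptions taken from the thresholds this issue calls implausible (1 kg or less, more than 300 kg). They are not clinical standards, and a very few real patients do exceed 300 kg.

```python
# Illustrative bounds (assumptions): weights <= 1 kg or > 300 kg are treated as artifacts.
MIN_KG = 1.0
MAX_KG = 300.0


def drop_extreme_weights(weights, lo=MIN_KG, hi=MAX_KG):
    """Keep weights strictly above lo and at most hi (kg)."""
    return [w for w in weights if lo < w <= hi]


kept = drop_extreme_weights([-5.0, 0.5, 72.3, 150.0, 310.0, 5400.0])
print(kept)  # → [72.3, 150.0]
```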
Issues with weights from inputevents
A very large number of weights derived from inputevents (8,978,893) are duplicated (same person, same time), probably because multiple inputs were ordered simultaneously, each recorded with a weight. Eliminating such duplicates leaves 5,159,302 weights for 72,690 ICU stays. A significant number (2,836) are recorded as implausibly high, namely > 300 kg. A very few patients do have such weights (> 660 lb), but I suspect that nearly all such recordings are artifacts.
When I look at the consistency of these weights, most are consistent within an ICU stay, but some are highly inconsistent. Of these 72,690 stays, in 69,171 the minimum and maximum weights from inputevents are identical. However, for 3,088 stays the weight differences exceed 1 kg, and some are quite large.
Presumably, the individual outliers with huge recorded weight changes during a stay are artifacts. Surprisingly, however, looking just at the more plausible end of this spectrum, we see many weight spans (max minus min weight) of up to 100 kg (or more), most of which are still almost certainly artifacts. I am not sure what to make of this. For now, I am leaving these data as recorded, except for having eliminated the low- and high-weight outliers. There is so much data that the bad data may be washed out.
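The de-duplication and per-stay span counts above can be sketched on a toy example. The sketch assumes that a duplicate means the same stay, the same charttime, and the same weight, which is one possible reading of "same person, same time". The record layout and values are hypothetical.

```python
from collections import defaultdict


def dedupe(records):
    """Drop repeated (stay_id, charttime, weight) records (assumed duplicate definition)."""
    return sorted(set(records))


def weight_spans(records):
    """Max minus min weight for each stay_id."""
    by_stay = defaultdict(list)
    for stay_id, _charttime, weight in records:
        by_stay[stay_id].append(weight)
    return {s: max(ws) - min(ws) for s, ws in by_stay.items()}


# Hypothetical inputevents-style records: (stay_id, charttime, patientweight)
records = [
    (100, "2150-01-01 08:00", 80.0),
    (100, "2150-01-01 08:00", 80.0),  # two inputs started together
    (100, "2150-01-01 12:00", 80.0),
    (200, "2150-02-01 09:00", 70.0),
    (200, "2150-02-01 10:00", 170.0),  # implausible 100 kg jump
]
unique = dedupe(records)
spans = weight_spans(unique)
identical = sum(1 for s in spans.values() if s == 0)
large = sum(1 for s in spans.values() if s > 1)
print(len(unique), identical, large)  # → 4 1 1
```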
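Correlations among average weights per stay
If we look at the mean of weights recorded during each ICU stay, they are highly correlated except for "Feeding Weight". The daily weights are "only" at ~0.95, whereas the inputs and admission weights are nearly at 1.00.
Or, visually:
[Figure: correlations among average weights per stay]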