MIT-LCP / mimic-code

MIMIC Code Repository: Code shared by the research community for the MIMIC family of databases
https://mimic.mit.edu
MIT License
2.59k stars 1.52k forks source link

time interval used in kdigo-uo.sql #476

Open JackieMium opened 6 years ago

JackieMium commented 6 years ago

To calculate urine ouput over 3 periods, that is:

  -- three sums:
  -- 1) over a 6 hour period
  -- 2) over a 12 hour period
  -- 3) over a 24 hour period

The code in mimic-code/concepts/organfailure/kdigo-uo.sql is:

  , sum(case when iosum.charttime <= io.charttime + interval '5' hour
      then iosum.VALUE
    else null end) as UrineOutput_6hr
  , sum(case when iosum.charttime <= io.charttime + interval '11' hour
      then iosum.VALUE
    else null end) as UrineOutput_12hr
  , sum(iosum.VALUE) as UrineOutput_24hr
....
and iosum.charttime <= (io.charttime + interval '23' hour)
....

The 3 intervals used were 5/11/23 hrs. And later in kdigo-stages-48hr.sql and kdigo-stages-7days.sql:

..
min(uo.urineoutput_6hr / uo.weight / 6.0)::numeric as uo6
...
, min(uo.urineoutput_12hr / uo.weight / 12.0)::numeric as uo12
...
, min(uo.urineoutput_24hr / uo.weight / 24.0)::numeric as uo24
...

so, shouldn't the above intervals be 6/12/24 hrs?

And in the comments from kdigo-stages-48hr.sql and kdigo-stages-7days.sql:

-- For urine output: the highest UO in hours 0-48 is used

Aren't we using the lowest UO as seen from the code above min()?

Correct me if I am missing something here, thanks.

alistairewj commented 6 years ago

Sorry for the late reply. The kdigo-uo query is making the assumption that data measured at charttime = 03:00 corresponds to urine output during the last hour, i.e. hours 02:00-03:00. So if we take 23 instances where charttime = 03:00, charttime = 04:00, and so on, we should get UO for 24 hours. I will concede that this will not be perfect, e.g. they may document 2 hours of UO at the first charttime and so you will overestimate the UO during the time period. However I think the majority of documentation is hourly - and the information of start/end time for UO isn't known/documented (just "end" time == charttime). One possible approach is to assume that the previous charttime is a start time for the current UO measure - worth exploration.

The 2nd question, if you don't mind me restating, is whether for a 6 hour period we should calculate the lowest UO and use that as the value to check for AKI, or calculate the highest UO and use that as the value as the value to check for AKI. The comment and the code are definitely inconsistent, but I see logic in both approaches. I haven't reviewed the KDIGO guidelines in a while but perhaps they shed some light? I'll keep this in mind when I find time to the review the other KDIGO issue (#484), but in the mean time if you have any insight that would be welcome!

JackieMium commented 6 years ago

Thanks for your explanation! I now get it why the time interval is like that.

But I've read the guildline and didn't find anything helpful for the 2n question. Maybe I am still missing something here, but I think we should change the the script to avoid misunderstanding. Lowest UO during a 6hr period seems to be more reasonable. Or maybe it's better we clearify this in the comment and keep it open.

JackieMium commented 5 years ago

@alistairewj I looked again into the kdigo_stages_7day and kdigo_stages_48hr data today, it occurred to me that the code to sum urine outputs over time intervals and diagnose AKI according to KDIGO might be flawed.

The code used :

  , sum(case when iosum.charttime <= io.charttime + interval '5' hour
      then iosum.VALUE
    else null end) as UrineOutput_6hr
  , sum(case when iosum.charttime <= io.charttime + interval '11' hour
      then iosum.VALUE
    else null end) as UrineOutput_12hr
  , sum(iosum.VALUE) as UrineOutput_24hr
....
and iosum.charttime <= (io.charttime + interval '23' hour)
....

to first get sums over time intervals and then divide sums by the time intervals and body weight:

..
min(uo.urineoutput_6hr / uo.weight / 6.0)::numeric as uo6
...
, min(uo.urineoutput_12hr / uo.weight / 12.0)::numeric as uo12
...
, min(uo.urineoutput_24hr / uo.weight / 24.0)::numeric as uo24
...

However, it ASSUMES that urine outputs for patients are hourly recorded and there are no abnormal gaps. When I looked at the urineoutput data and calculate sums for every 6hr as the code above, here icustay_id = 200003 as an example:

Screenshot-2019-05-16-10-19-16

I saw there are many records not precisely at expected hourly point, resulting in wrongly calcuted sums. The 3 yellow cells are supposed to be minimum 6hr urine output during the first 48hr, while unfortunately the third 235 was wrongly calculated since it's actually the sum of 5hr from 2019-08-04 07:00:00 to 2019-08-04 12:00:00. For this patient it doesn't matter because there are 3 235, but from this we might speculate that there are many wrongly calculated sums from the above code due to time gaps that are not exactly 1hr.

I then set out to find some papers. AKI incidence has been reported to be 18.3% in MIMIC II data [1], 21.1% in MIMIC III data[2] both based only on serum creatinine levels accroding to KDIGO (the two papers came from the same team I believe), and 31% in MIMIC II data based on serum creatinine levels according to AKIN in another paper[3] However, when I construct a cohort as:

SELECT ic.icustay_id
, ic.gender
, ic.admission_age AS age
, w.weight
, h.height
, los_icu
, first_hosp_stay
, aki_48hr
, aki_7day
FROM icustay_detail ic
LEFT JOIN weightfirstday w USING(icustay_id)
LEFT JOIN heightfirstday h USING(icustay_id)
LEFT JOIN kdigo_stages_48hr k4 USING(icustay_id)
LEFT JOIN kdigo_stages_7day k7 USING(icustay_id);

and then select records with age 18~89, first hospital stay, los > 24h and with valid height and weight data, I got a cohort of 20404 cases with AKI incidences of 45% and 66% as indicatd by aki_48hr and aki_7day, which is much higher then those mentioned in above papers. This result gave me the impression that the current code might over diagnose AKI a lot due to wrongly calculated urine output sums. I am thinking about changing the code to use sums of more precisely 6hr (or 6h30min for some fuzzy), but this will definitely drop many redords and result in false negtive diagnosis. What's your opnion on this matter? Thanks.

[1] Chen, K. P., Cavender, S., Lee, J., Feng, M., Mark, R. G., Celi, L. A., … Danziger, J. (2016). Peripheral edema, central venous pressure, and risk of AKI in critical illness. Clinical Journal of the American Society of Nephrology, 11(4), 602–608. https://doi.org/10.2215/CJN.08080715 [2] Danziger, J., Chen, K. P., Lee, J., Feng, M., Mark, R. G., Celi, L. A., & Mukamal, K. J. (2016). Obesity, Acute Kidney Injury, and Mortality in Critical Illness. Critical Care Medicine, 44(2), 328–334. https://doi.org/10.1097/CCM.0000000000001398 [3] Ghosh, S., Feng, M., Nguyen, H., & Li, J. (2016). Hypotension Risk Prediction via Sequential Contrast Patterns of ICU Blood Pressure. IEEE Journal of Biomedical and Health Informatics, 20(5), 1416–1426. https://doi.org/10.1109/JBHI.2015.2453478

alistairewj commented 5 years ago

Really glad you're digging in to this as it's pretty hard to infer AKI from atypically documented UO.

I looked at this and I think the real issue is I did the aggregation in the wrong direction. The query is looking ahead 6 hours and aggregating, but really it should look backward 6 hours and aggregate. This would result in (1) the "time" of AKI being more realistic (i.e. we can only diagnose them with AKI if we have observed 6 hours of oliguria) and (2) the issue you pointed out would be flipped; instead of underestimating UO when atypical documentation is present, we would overestimate UO (because we would collect UO documented for 5 hours, and the 5th hour would actually cover >1 hour of UO). Consequently we would underestimate AKI, but I would prefer to go conservative (happy to hear your thoughts here). I'll implement this quickly then release a version so we can start to track changes better.

This does not solve the issue of documentation happening every 12 hours, and thus we never flag the patient as having oliguria because the query doesn't try to estimate the duration over which the UO was calculated. I could see solving this by calculating a time since the last UO, and maybe adding in information about the type of catheterization of the patient (foley, etc), but I probably won't have time to do this myself.

In terms of the rate of AKI, I definitely agree that it's too high. That being said I wouldn't trust any citation unless you can see the code used to generate the concept - call me a skeptic ;)

alistairewj commented 5 years ago

See above and also https://github.com/MIT-LCP/mimic-code/commit/cbfdd96c50dcdc01a450f34ee455ddac531a1a7a

That certainly made a difference:

with t1 as
(
    SELECT ic.icustay_id
, ic.gender
, ic.admission_age AS age
, w.weight
, h.height
, los_icu
, first_hosp_stay
, aki_48hr
, aki_7day
FROM icustay_detail ic
LEFT JOIN weightfirstday w USING(icustay_id)
LEFT JOIN heightfirstday h USING(icustay_id)
LEFT JOIN kdigo_stages_48hr k4 USING(icustay_id)
LEFT JOIN kdigo_stages_7day k7 USING(icustay_id)
)
, t2 as
(
select aki_48hr
, count(*) as n
from t1
where age > 18
and first_hosp_stay
and los_icu > 1
group by aki_48hr
)
select aki_48hr, n, round(cast(n AS NUMERIC)*100/SUM(n) OVER (), 1) as frac
from t2
order by 1;
aki_48hr n frac
0 19974 57.2
1 14867 42.6
(NULL) 54 0.2

But, if you look across categories, seems there's still more work to do - ~21% of the AKI cases are from the UO alone:

aki_48hr aki_stage_48hr_creat aki_stage_48hr_uo n frac
0 0 0 4532 13.0
0 0 1 2448 7.0
0 0 2 5112 14.6
0 0 3 736 2.1
0 0 (NULL) 7137 20.5
0 (NULL) 0 9 0.0
1 1 0 1981 5.7
1 1 1 1382 4.0
1 1 2 3489 10.0
1 1 3 1516 4.3
1 1 (NULL) 3620 10.4
1 2 0 374 1.1
1 2 1 185 0.5
1 2 2 467 1.3
1 2 3 238 0.7
1 2 466 1.3
1 3 0 171 0.5
1 3 1 72 0.2
1 3 2 161 0.5
1 3 3 376 1.1
1 3 (NULL) 357 1.0
1 (NULL) 2 7 0.0
1 (NULL) 3 5 0.0
(NULL) (NULL) (NULL) 54 0.2

SQL for the above:

with t1 as
(
SELECT ic.icustay_id
, ic.gender
, ic.admission_age AS age
, w.weight
, h.height
, los_icu
, first_hosp_stay
, aki_48hr
, AKI_stage_48hr_creat
, AKI_stage_48hr_uo
, aki_7day
FROM icustay_detail ic
LEFT JOIN weightfirstday w USING(icustay_id)
LEFT JOIN heightfirstday h USING(icustay_id)
LEFT JOIN kdigo_stages_48hr k4 USING(icustay_id)
LEFT JOIN kdigo_stages_7day k7 USING(icustay_id)
)
, t2 as
(
select aki_48hr, AKI_stage_48hr_creat, AKI_stage_48hr_uo
, count(*) as n
from t1
where age > 18
and first_hosp_stay
and los_icu > 1
group by aki_48hr, AKI_stage_48hr_creat, AKI_stage_48hr_uo
)
select aki_48hr, AKI_stage_48hr_creat, AKI_stage_48hr_uo, n
, round(cast(n AS NUMERIC)*100/SUM(n) OVER (), 1) as frac
from t2
order by 1, 2, 3;
JackieMium commented 5 years ago

Thanks for the quick solution. I've checked the results again after a git pull of revised code. Now the time points will be when AKI diagnose is made and for abnormal records of time points, we're overestimating UO now, which means AKI will be underestimated. The issue I raised is only partly solved, yes. But still, if we're suming UO backward, should the ending time point should be _first_ charttime + 5hr interval for UO_6hr ? otherwise the first few UO sums after ICU admission will always be underestimated because there won't be a 6hr interval backward. The same goes for UO_12hr (sum should starts from _first_ charttime + 11hr interval) and UO_24hr (_first_ charttime + 23hr interval).

alistairewj commented 5 years ago

Yes I agree the issue is only partly solved - ideally we'd have each UO measurement and a duration of measurement associated with it but implementing "what time period does this UO correspond to?" requires some thought as there are many ways to do it wrong!

But still, if we're suming UO backward, should the ending time point should be first charttime + 5hr interval for UO_6hr ? otherwise the first few UO sums after ICU admission will always be underestimated because there won't be a 6hr interval backward. The same goes for UO_12hr (sum should starts from first charttime + 11hr interval) and UO_24hr (first charttime + 23hr interval).

Yes, I handled this in the staging queries: https://github.com/MIT-LCP/mimic-code/commit/01ee15aac33e70f21eeeac9feacb106b025f4661

Not ideal to not include it in kdigo-uo.sql - but adding some kind of "ignore observations < 6 hours" felt awkward in the kdigo-uo.sql query and more naturally fit into the staging queries. Happy for suggested alternatives/PRs of course.

JackieMium commented 5 years ago

@alistairewj Oh yes I saw it in https://github.com/MIT-LCP/mimic-code/commit/01ee15aac33e70f21eeeac9feacb106b025f4661, but I think it's better to do this in kdigo-uo.sql. I've been thinking about how to do this and didn't get anything yet, sorry. I am quite new to Postgres and not good at writing complex queries. Thanks for your patience and your time. I'll let you know if I come up with something, though it might not be likely.

alistairewj commented 5 years ago

No worries. If you come up with logic that was solid in whatever programming language you prefer I could write it in SQL - but no pressure.

JackieMium commented 5 years ago

@alistairewj
So how about this: for every given record of urine output in urineoutput we divede patitent's weight and the time interval(charttime - last charttime) the record is for and flag all < 0.5 ml/(kg.hr). Then we only consider continous records with the <0.5 flags and cumulate the time intervals. When there is a continous <0.5 event for longer than 6hr, we flag it as AKI. Code is here:

https://gist.github.com/JackieMium/18a980b37bcf3eff251d83fa4d74862f

With this, I get:

 aki_05_6hr | count  
------------+------- 
          0 | 22303  
          1 |  9119  

which is about 29% ( 9119/(9119+22303) ) of all included cases.

I know neither the code nor the logic behind it is perfect, because we only consider continous time intervals, which seems to underestimate AKI a lot. But I decided to share it here just in case it might shed some light on this tricky issue for you or anybody who sees it. Let me know what 's on your mind.

alistairewj commented 5 years ago

Yeah it's a viable approach. I think it's worth investigating cases where they are only charting UO every 12-24 hours as I'm unsure how well it would work in those situations.

Kshine000 commented 2 years ago

@alistairewj So how about this: for every given record of urine output in urineoutput we divede patitent's weight and the time interval(charttime - last charttime) the record is for and flag all < 0.5 ml/(kg.hr). Then we only consider continous records with the <0.5 flags and cumulate the time intervals. When there is a continous <0.5 event for longer than 6hr, we flag it as AKI. Code is here:

https://gist.github.com/JackieMium/18a980b37bcf3eff251d83fa4d74862f

With this, I get:

 aki_05_6hr | count  
------------+------- 
          0 | 22303  
          1 |  9119  

which is about 29% ( 9119/(9119+22303) ) of all included cases.

I know neither the code nor the logic behind it is perfect, because we only consider continous time intervals, which seems to underestimate AKI a lot. But I decided to share it here just in case it might shed some light on this tricky issue for you or anybody who sees it. Let me know what 's on your mind.

Thank you, that is great. Could u please write more code to distinguish each stage of AKI?

JackieMium commented 2 years ago

@Kshine000 It's the same logic for all, just change the time intervals and threshold values according to AKI staging thresholds, code may be redundently long though.

Kshine000 commented 2 years ago

@JackieMium Yeah, but I'm a cardiac surgeon, and writing code is really hard for me. You're really excellent. Could you please do that for our team? Thank you.