eth-mds / ricu

🏥 ICU data with R 🏥
https://eth-mds.github.io/ricu/
GNU General Public License v3.0

Odd Sepsis Labels with eICU #66

Open DanielBrkr opened 2 months ago

DanielBrkr commented 2 months ago

Hi there,

First of all, thanks for the great work. I just noticed some oddities in the Sepsis-3 labels generated by

sepsis_data <- sep3(sofa_data, si_data, si_window = "any", si_lwr = hours(48L),
                    si_upr = hours(24L), keep_components = TRUE,
                    interval = mins(15L)
)

for the eICU dataset.

There are some assigned labels for which the SI window seems to be violated, e.g. with 144 hours between the last SI event and the SOFA score increase that determines the label, which does not match the Sepsis-3 requirements in the documentation.

Is this an expected artefact of the way the labels are generated for eICU under the hood, or is something else off that might need to be taken care of?

See the plot below for an example:

[Image: odd_sepsis_label]

Thanks again!

dplecko commented 1 month ago

Hi,

Thanks for the question. It is difficult to say exactly what is going on without the full code replicating this issue. My guess would be that you are plotting only the first time of suspected infection, and there is possibly a later one which is closer to the SOFA increase that triggers the Sepsis-3 label.

If the SI time you indicated is the only SI time for this individual, then there is something unusual happening. If you share the full code, I am happy to take a look.

DanielBrkr commented 1 month ago

Sure, no problem. Here's the corresponding R code, extracted from my Python code. I hope it's not too unidiomatic; this is basically my first time using R.

library(ricu)  
library(units)  
library(ggplot2)  
library(dplyr)  

ricu::import_src("eicu")  
ricu::attach_src("eicu")  
ricu::src_data_avail()  

sofa_data <- ricu::load_concepts("sofa",  
                                 "eicu",  
                                 keep_components = TRUE,  
                                 interval = mins(15L)  
)  

si_data <- ricu::load_concepts("susp_inf", "eicu",  
                               abx_min_count = 2L,  
                               positive_cultures = TRUE,  
                               si_mode = "or",  
                               keep_components = TRUE,  
                               interval = mins(15L)  
)  

sepsis_data <- sep3(sofa_data, si_data, si_window = "any", si_lwr = hours(48L),  
                    si_upr = hours(24L), keep_components = TRUE,  
                    interval = mins(15L)  
)  

id_sample <- 3166218  

si_sample <- si_data %>% filter(patientunitstayid == id_sample)  
sofa_sample <- sofa_data %>% filter(patientunitstayid == id_sample)  
sepsis_sample <- sepsis_data %>% filter(patientunitstayid == id_sample)  

si_sample$susp_inf <- as.numeric(si_sample$susp_inf)  
sepsis_sample$sep3 <- as.numeric(sepsis_sample$sep3)  

ggplot() +  
  geom_point(data = sofa_sample, aes(x = labresultoffset, y = sofa), color = "blue", size = 2, shape = 16, alpha = 0.6) +  
  geom_point(data = si_sample, aes(x = infusionoffset, y = susp_inf), color = "black", size = 4, shape = 15) +  
  geom_point(data = sepsis_sample, aes(x = labresultoffset, y = sep3), color = "magenta", size = 6, shape = 17) +  
  labs(x = "Time (hours)", y = "Values") +  
  scale_x_continuous(labels = function(x) paste0(x / 60, "h")) + # offsets are provided as minutes afaik  
  theme_minimal()

There is in fact only one data point related to the suspected infection, at least for this patient ID. This is the data frame:

  patientunitstayid  infusionoffset  abx_time  samp_time  susp_inf
1 3166218            -225            -225      NA         1.00000

Is there anything else you need?

dplecko commented 1 month ago

Thanks for raising this issue. There is indeed a bug in ricu. However, note that if you work with hourly intervals, the issue does not appear (patient 3166218 does not have a sepsis event). This may be a preferred solution for now.

For concreteness (and for discussion with @nbenn), here is a reproducible example that runs with a reasonable amount of RAM:

si <- load_concepts("susp_inf", "eicu",  
                    abx_min_count = 2L,  
                    positive_cultures = TRUE,  
                    si_mode = "or",  
                    keep_components = TRUE,
                    interval = mins(15L)
                    )
pids <- c(unique(id_col(si))[1:200], 3166218)
sofa <- load_concepts("sofa", "eicu", keep_components = TRUE,
                      interval = mins(15L),
                      patient_ids = pids)

sep3(sofa[patientunitstayid == 3166218], si[patientunitstayid == 3166218], 
     si_window = "any", keep_components = TRUE,
     interval = mins(15L))

The issue arises in lines 162-165 of callback-sep3.R. There, the difference get(index_var(susp)) - si_lwr is taken; since the index variable is in minutes and si_lwr is in hours, the result is cast to seconds. In the non-equi join in lines 173-175, the comparison join_time1 >= si_lwr then makes no sense because the two sides are in different units.

It seems that converting si_lwr and si_upr to minutes resolves the issue (since a subtraction of quantities with equal units does not result in casting to seconds).
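
The unit casting described above can be reproduced with plain base R difftime objects. The following is a minimal sketch, not ricu's actual code: the helper in_window_raw is hypothetical, the SOFA time is an illustrative value placed 144 hours after the SI time from this thread, and the sketch assumes that the join compares the underlying numbers without regard to their units.

```r
# Sketch only: base R difftime arithmetic, not the code in callback-sep3.R.
si_time <- as.difftime(-225, units = "mins")  # SI time of patient 3166218
si_lwr  <- as.difftime(48, units = "hours")
si_upr  <- as.difftime(24, units = "hours")

# Mixed units (mins and hours): the result is expressed in seconds.
lwr_mixed <- si_time - si_lwr
upr_mixed <- si_time + si_upr
units(lwr_mixed)  # "secs"

# Equal units (mins and mins): the result stays in minutes.
lwr_mins <- si_time - as.difftime(2880, units = "mins")
upr_mins <- si_time + as.difftime(1440, units = "mins")
units(lwr_mins)   # "mins"

# Illustrative SOFA increase 144 hours after the SI time, stored in minutes.
sofa_time <- si_time + as.difftime(144 * 60, units = "mins")

# Hypothetical helper: compares bare numbers and ignores the units,
# which is the assumed behaviour of the non-equi join.
in_window_raw <- function(x, lwr, upr) {
  as.numeric(x) >= as.numeric(lwr) && as.numeric(x) <= as.numeric(upr)
}

in_window_raw(sofa_time, lwr_mixed, upr_mixed)  # TRUE: minutes vs. seconds
in_window_raw(sofa_time, lwr_mins, upr_mins)    # FALSE: everything in minutes
```

With the bounds in seconds, the upper limit of 72900 is read as if it were minutes, so the window appears far wider than 24 hours and a SOFA increase 144 hours after the SI time is accepted. With all quantities in minutes the same time point is rejected.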

DanielBrkr commented 2 weeks ago

Thanks a lot for the clarification and the suggested workaround!

I gave it a try just now and noticed something else in a patient with a valid sepsis label according to the sep3 function with an hourly interval.

It seems like there's a rounding error when comparing labresultoffset and abx_time from the hourly interval result with the result of the workaround (in minutes). At least intuitively, I would expect the label to be "forward filled" instead of "back filled". Could this be related to truncation in an unsafe cast somewhere else? Or is this intended behaviour?

Results with the 15 min workaround

patientunitstayid  delta_sofa  labresultoffset  abx_time  samp_time  sep3
141436             3           45 mins          585 mins  NA mins    TRUE

Result with hourly interval

patientunitstayid  delta_sofa  labresultoffset  abx_time  samp_time  sep3
141436             2           0 hours          9 hours   NA hours   TRUE
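
One way to relate the two outputs, under the assumption (not confirmed in this thread) that time stamps are rounded down to the start of the enclosing interval, is to floor the minute values to whole hours. The helper floor_to_hours below is hypothetical and only illustrates the arithmetic:

```r
# Sketch only: assumes offsets are rounded down to the start of the interval.
floor_to_hours <- function(minutes) {
  floor(minutes / 60)
}

floor_to_hours(45)   # 0, matches the labresultoffset of 0 hours
floor_to_hours(585)  # 9, matches the abx_time of 9 hours
```

Under that assumption the hourly values are consistent with the 15-minute ones, and the apparent "back fill" is the effect of rounding down.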

Here's the code I adapted from you to reproduce it:

15 Min Interval Workaround

si <- load_concepts("susp_inf", "eicu",
                     abx_min_count = 2L,
                     positive_cultures = TRUE,
                     si_mode = "or",
                     keep_components = TRUE,
                     interval = mins(15L))

pids <- c(unique(id_col(si))[1:200], 141436)

sofa <- load_concepts("sofa", "eicu", keep_components = TRUE,
                       interval = mins(15L),
                       patient_ids = pids)

sep3(sofa[patientunitstayid == 141436], si[patientunitstayid == 141436],
     si_window = "any",
     si_lwr = mins(2880L), # 60 * 48 = 2880
     si_upr = mins(1440L), # 60 * 24 = 1440
     keep_components = TRUE,
     interval = mins(15L)
)

Hourly Interval

si <- load_concepts("susp_inf", "eicu",
                     abx_min_count = 2L,
                     positive_cultures = TRUE,
                     si_mode = "or",
                     keep_components = TRUE,
                     interval = hours(1L))

pids <- c(unique(id_col(si))[1:200], 141436)

sofa <- load_concepts("sofa", "eicu", keep_components = TRUE,
                       interval = hours(1L),
                       patient_ids = pids)

sep3(sofa[patientunitstayid == 141436], si[patientunitstayid == 141436],
      si_window = "any", keep_components = TRUE,
      interval = hours(1L))

Thanks again!