Open DanielBrkr opened 2 months ago
Hi,
Thanks for the question. It is difficult to say exactly what is going on without the full code replicating this issue. My guess would be that you are plotting only the first time of suspected infection, and there is possibly a later one which is closer to the SOFA increase that triggers the Sepsis-3 label.
If the SI time you indicated is the only SI time for this individual, then there is something unusual happening. If you share the full code, I am happy to take a look.
Sure, no problem. Here's the R code corresponding to my Python code. I hope it's not too unidiomatic; this is basically my first time using R.
```r
library(ricu)
library(units)
library(ggplot2)
library(dplyr)

ricu::import_src("eicu")
ricu::attach_src("eicu")
ricu::src_data_avail()

sofa_data <- ricu::load_concepts("sofa", "eicu",
  keep_components = TRUE,
  interval = mins(15L)
)

si_data <- ricu::load_concepts("susp_inf", "eicu",
  abx_min_count = 2L,
  positive_cultures = TRUE,
  si_mode = "or",
  keep_components = TRUE,
  interval = mins(15L)
)

sepsis_data <- sep3(sofa_data, si_data,
  si_window = "any",
  si_lwr = hours(48L),
  si_upr = hours(24L),
  keep_components = TRUE,
  interval = mins(15L)
)

id_sample <- 3166218
si_sample <- si_data %>% filter(patientunitstayid == id_sample)
sofa_sample <- sofa_data %>% filter(patientunitstayid == id_sample)
sepsis_sample <- sepsis_data %>% filter(patientunitstayid == id_sample)

# Convert the logical flags to numeric so they can share the y axis with SOFA
si_sample$susp_inf <- as.numeric(si_sample$susp_inf)
sepsis_sample$sep3 <- as.numeric(sepsis_sample$sep3)

ggplot() +
  geom_point(data = sofa_sample, aes(x = labresultoffset, y = sofa), color = "blue", size = 2, shape = 16, alpha = 0.6) +
  geom_point(data = si_sample, aes(x = infusionoffset, y = susp_inf), color = "black", size = 4, shape = 15) +
  geom_point(data = sepsis_sample, aes(x = labresultoffset, y = sep3), color = "magenta", size = 6, shape = 17) +
  labs(x = "Time (hours)", y = "Values") +
  scale_x_continuous(labels = function(x) paste0(x / 60, "h")) + # offsets are provided as minutes afaik
  theme_minimal()
```
There is in fact only one data point related to the suspected infection, at least for this patient ID. This is the data frame:

| | patientunitstayid | infusionoffset | abx_time | samp_time | susp_inf |
|---|---|---|---|---|---|
| 1 | 3166218 | -225 | -225 | NA | 1,00000 |
Is there anything else you need?
Thanks for raising this issue. There is indeed a bug in `ricu`. However, note that if you work with hourly intervals, the issue does not appear (patient 3166218 does not have a sepsis event). This may be the preferred solution for now.
For concreteness (and discussion with @nbenn), here is a reproducible example that runs with a reasonable amount of RAM:
```r
si <- load_concepts("susp_inf", "eicu",
  abx_min_count = 2L,
  positive_cultures = TRUE,
  si_mode = "or",
  keep_components = TRUE,
  interval = mins(15L)
)

pids <- c(unique(id_col(si))[1:200], 3166218)

sofa <- load_concepts("sofa", "eicu",
  keep_components = TRUE,
  interval = mins(15L),
  patient_ids = pids
)

sep3(sofa[patientunitstayid == 3166218], si[patientunitstayid == 3166218],
  si_window = "any",
  keep_components = TRUE,
  interval = mins(15L)
)
```
The issue arises from L162-165 in `callback-sep3.R`. Here, a difference `get(index_var(susp)) - si_lwr` is taken, and since `index_var` is in minutes and `si_lwr` is in hours, the difference is cast to seconds. Then, in the non-equi join in L173-175, the comparison `join_time1 >= si_lwr` makes no sense because the units are different.
It seems that converting `si_lwr` and `si_upr` to minutes resolves the issue (since a subtraction of quantities with equal units does not result in casting to seconds).
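To illustrate the unit casting described above, here is a minimal base R sketch. It assumes that the time values involved behave like ordinary `difftime` objects, and it uses `as.difftime()` directly. The variable names are illustrative only and are not taken from `callback-sep3.R`.

```r
# Illustrative values: an SI time of -225 minutes and a 48-hour lower window
si_time <- as.difftime(-225, units = "mins")
si_lwr  <- as.difftime(48, units = "hours")

# Subtracting difftimes with different units gives a result in seconds
mixed <- si_time - si_lwr
units(mixed)       # "secs"
as.numeric(mixed)  # -186300

# After converting the window to minutes, the result stays in minutes
units(si_lwr) <- "mins"
same <- si_time - si_lwr
units(same)        # "mins"
as.numeric(same)   # -3105
```

A join that compares only the underlying numbers would then be comparing a value in seconds against one in hours, which is presumably why converting `si_lwr` and `si_upr` to minutes up front avoids the problem.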
Thanks a lot for the clarification and the suggested workaround!
I gave it a try just now and noticed something else in a patient with a valid sepsis label according to the sep3 function with an hourly interval.
It seems like there's a rounding error when comparing `labresultoffset` and `abx_time` from the hourly-interval result with the result of the workaround (in minutes). Intuitively, I would expect the label to be "forward filled" instead of "back filled". Could this be related to truncation in an unsafe cast somewhere else, or is this intended behaviour?
With 15-minute intervals (workaround):

patientunitstayid | delta_sofa | labresultoffset | abx_time | samp_time | sep3 |
---|---|---|---|---|---|
141436 | 3 | 45 mins | 585 mins | NA mins | TRUE |

With hourly intervals:

patientunitstayid | delta_sofa | labresultoffset | abx_time | samp_time | sep3 |
---|---|---|---|---|---|
141436 | 2 | 0 hours | 9 hours | NA hours | TRUE |
Here's the code I adapted from you to reproduce it:
```r
# 15-minute intervals, with si_lwr and si_upr given in minutes (workaround)
si <- load_concepts("susp_inf", "eicu",
  abx_min_count = 2L,
  positive_cultures = TRUE,
  si_mode = "or",
  keep_components = TRUE,
  interval = mins(15L)
)

pids <- c(unique(id_col(si))[1:200], 141436)

sofa <- load_concepts("sofa", "eicu",
  keep_components = TRUE,
  interval = mins(15L),
  patient_ids = pids
)

sep3(sofa[patientunitstayid == 141436], si[patientunitstayid == 141436],
  si_window = "any",
  si_lwr = mins(2880L), # 60 * 48 = 2880
  si_upr = mins(1440L), # 60 * 24 = 1440
  keep_components = TRUE,
  interval = mins(15L)
)
```
```r
# Hourly intervals, with the default si_lwr and si_upr
si <- load_concepts("susp_inf", "eicu",
  abx_min_count = 2L,
  positive_cultures = TRUE,
  si_mode = "or",
  keep_components = TRUE,
  interval = hours(1L)
)

pids <- c(unique(id_col(si))[1:200], 141436)

sofa <- load_concepts("sofa", "eicu",
  keep_components = TRUE,
  interval = hours(1L),
  patient_ids = pids
)

sep3(sofa[patientunitstayid == 141436], si[patientunitstayid == 141436],
  si_window = "any",
  keep_components = TRUE,
  interval = hours(1L)
)
```
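As a rough check on the two result rows for patient 141436, flooring the 15-minute offsets to whole hours reproduces the hourly values, whereas rounding to the nearest hour does not. Whether ricu floors internally when it changes the interval is an assumption here. The snippet only shows the arithmetic.

```r
# Offsets from the 15-minute result, in minutes (labresultoffset, abx_time)
offsets_min <- c(45, 585)

# Flooring to whole hours (assumed behaviour, not confirmed from the ricu source)
floored <- floor(offsets_min / 60)   # 0 9

# Rounding to the nearest hour would instead give different values
rounded <- round(offsets_min / 60)   # 1 10
```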
Thanks again!
Hi there,
First of all, thanks for the great work. I just noticed some oddities with regard to the Sepsis-3 labels generated for the eICU dataset.
There are some assigned labels for which the SI window seems to be violated, e.g. 144 hours between the last SI event and the determining SOFA score increase, given the Sepsis-3 requirements in the documentation.
Is this an expected artefact due to the way the labels are generated for eICU under the hood, or is there something else off that might need to be taken care of?
See the plot below for an example:
Thanks again!