antranttu opened this issue 3 years ago
@antranttu:
From reading the paper, my impression is as follows: for cases, `2` marks a patient who has recovered from sepsis. Regarding controls, it helps to understand the procedure by which they were generated. First, 10 controls are matched to 1 sepsis case. Then the sepsis onset time of that case, say 12 hours after ICU admission, is used as the onset of the `pseudo_target` for its matched controls. Conceptually, a `pseudo_target` value of `1` represents the time at which the control patient would have developed sepsis, had they developed it at all. That sounds weird, but I think it is necessary to prevent the classifier from exploiting, e.g., the difference between the end of the ICU stay for a control versus the beginning of the ICU stay for a sepsis case, so that it learns from the physiology rather than from where in the stay a window was taken.
In other words, if you are training a classifier that uses the last 8 hours before sepsis onset to predict sepsis, which 8 hours do you use for control cases that never developed sepsis? That is what you use the pseudo_target label for.
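The procedure described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the repository's code: the `Stay` class, its fields, and the helpers `assign_pseudo_onset` and `pre_onset_window` are hypothetical; each stay is assumed to hold one value per hour since ICU admission; and the matching criteria that select the 10 controls are not shown. The point is that each control's window is taken relative to the case's onset hour, not relative to the end of the control's own stay.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Stay:
    stay_id: int                      # hypothetical ICU stay identifier
    is_case: bool                     # True for a sepsis case, False for a control
    vitals: List[float]               # assumed layout: one value per hour since ICU admission
    onset_hour: Optional[int] = None  # sepsis onset (cases) or pseudo-onset (controls)


def assign_pseudo_onset(case: Stay, controls: List[Stay]) -> None:
    """Copy the case's onset hour onto each of its matched controls (in place)."""
    if not case.is_case or case.onset_hour is None:
        raise ValueError("expected a sepsis case with a known onset hour")
    for control in controls:
        control.onset_hour = case.onset_hour


def pre_onset_window(stay: Stay, width: int = 8) -> List[float]:
    """Return up to `width` hourly values strictly before the (pseudo-)onset hour."""
    if stay.onset_hour is None:
        raise ValueError("stay has no onset or pseudo-onset assigned")
    start = max(0, stay.onset_hour - width)
    return stay.vitals[start:stay.onset_hour]


# Usage: one case with onset 12 h after admission and 10 hypothetical matched controls.
# Each hourly value equals its hour index, so the windows are easy to read.
case = Stay(stay_id=1, is_case=True, vitals=[float(h) for h in range(20)], onset_hour=12)
controls = [
    Stay(stay_id=100 + i, is_case=False, vitals=[float(h) for h in range(30)])
    for i in range(10)
]
assign_pseudo_onset(case, controls)
case_window = pre_onset_window(case)             # hours 4..11
control_window = pre_onset_window(controls[0])   # also hours 4..11, not the last 8 h of the stay
```

Here the control's stay is 30 hours long, but its window still covers hours 4 to 11, the same position relative to admission as the case's window, rather than hours 22 to 29.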
All of this is just my understanding of the paper and the code; I welcome any corrections.
Hello,
First of all, thank you very much for this repo! Open-sourcing something like this has been extremely helpful and practical. I do have a couple of questions that I couldn't find answered in the paper, and I hope they can be addressed here.
I was able to extract the sepsis cohort and the necessary files by running the scripts. In the `cases_55h_hourly_vitals` table, I can see that for sepsis patients (cases), `sepsis_target` starts at `0`, becomes `1` once onset is identified, and eventually becomes `2`. What does `2` represent in this case?

Lastly, for patients without sepsis (controls), there is a column named `pseudo_target` that also takes the values `0, 1, 2`. What does this column represent for the control group, and why does it use the same labels as the case group? Please shed some light! Thank you very much!