colabobio / Lassa_randomized_data


Calculate number of days a pre-admission symptom is present #1

Closed: codeanticode closed this issue 3 years ago

codeanticode commented 3 years ago

In the current version of the binary analysis script, 1 is assigned when the symptom variable is not empty, 0 otherwise:

https://github.com/colabobio/Lassa_randomized_data/blob/main/binary_variable_analysis.Rmd#L44

However, this does not differentiate between a symptom that was present for only one or a few days and one that was present for many days. This could be done easily by counting how many distinct day tokens (Dx) appear in the string.
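A minimal sketch of that counting idea, written in Python for illustration rather than in the repository's R Markdown. It assumes each day is recorded as a token of the form `D` followed by a number (for example `D1 D3 D4`); the function name `count_symptom_days` and the token pattern are hypothetical and would need to match the actual data.

```python
import re

# Assumed token format: the letter D followed by a day number, e.g. "D1 D3 D4".
DAY_TOKEN = re.compile(r"\bD(\d+)\b")


def count_symptom_days(value):
    """Return how many distinct day tokens appear in a symptom string."""
    if value is None:
        return 0
    # A set drops repeated tokens, so "D1 D1 D4" counts as two days.
    return len(set(DAY_TOKEN.findall(str(value))))
```

An empty or missing value gives 0, so the result can sit alongside the existing binary variable without special handling.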

KelseyB-code commented 3 years ago

I think this would also change all subsequent code, because the variable would be analyzed as a number of days instead of as a binary variable. Should I put this code in a new file?

codeanticode commented 3 years ago

Yes, because I think we have two different pieces of information here: one is whether the patient had the symptom by the time of admission, and the other is for how long. The former is what we did in the previous analysis, so we should keep the binary variables for comparison. But now we can do a more detailed study and look at how much knowing the duration of symptoms helps make better predictions.
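To illustrate keeping both pieces of information side by side, here is a small Python sketch (not the repository's R code). It derives the binary indicator as described earlier in the thread (1 when the field is not empty) and a day count from the same string. The `D<number>` token format, the function name `symptom_features`, and the output keys are assumptions made for this example.

```python
import re

# Assumed token format for a recorded day, e.g. "D2".
DAY_TOKEN = re.compile(r"\bD(\d+)\b")


def symptom_features(value):
    """Derive the binary indicator and the day count from one symptom field."""
    text = "" if value is None else str(value).strip()
    return {
        # Binary variable: 1 when the field is not empty, 0 otherwise.
        "present": int(text != ""),
        # Duration variable: number of distinct day tokens in the field.
        "days": len(set(DAY_TOKEN.findall(text))),
    }
```

Under these assumptions a non-empty field with no recognizable day tokens yields `present = 1` and `days = 0`, which is one case where the two variables carry different information.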

KelseyB-code commented 3 years ago

I addressed this issue in the new file "parse_duration_variables.Rmd"

codeanticode commented 3 years ago

We should calculate the significance of the difference in the number of days a symptom is present between patient groups (survived/died) outside ggplot, to make sure it's what we think it is.
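The thread does not say which test was used, so the following is only one possible way to compute a p-value outside the plotting layer: a two-sided exact permutation test on the difference in group means, sketched in Python with the standard library. The function name `permutation_p_value` is hypothetical, and enumerating every split is only practical for small groups; larger groups would need random resampling or a rank-based test instead.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation test on the difference in group means."""
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)
    observed = abs(mean(group_a) - mean(group_b))

    extreme = 0
    splits = 0
    # Enumerate every way of relabelling n_a of the pooled values as group A.
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        splits += 1
        # Small tolerance guards against floating-point ties.
        if diff >= observed - 1e-12:
            extreme += 1
    return extreme / splits


# Made-up toy values for illustration only, not study data.
p = permutation_p_value([1, 1, 2], [5, 6, 7])
```

Comparing a value computed this way with the one shown on the plot is a quick check that the plotting layer is testing the intended groups.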

KelseyB-code commented 3 years ago

I added this code to the most recent version of parse_duration_variables. I calculated p-values and saved the output to a CSV. I'm finding it tricky to add the p-values to the graph, but will try again in the future.