RobinDenz1 / simDAG

An R-Package to Simulate Simple and Complex (longitudinal) Data from a DAG and Associated Node Information
https://robindenz1.github.io/simDAG/
GNU General Public License v3.0

Time_since_last evaluates improperly in an fifelse or an ifelse #2

Closed: zterner-mitre closed this issue 8 months ago

zterner-mitre commented 8 months ago

simDAG - Git - waning immunity file.txt

Hi Robin,

In the attached file, I have built a simulation using simDAG that is intended to have waning vaccine protection over the course of the 90 days that the vaccine is in effect. All of the functions work well except for this one, prob_covid_f, which uses data$vaccination_time_since_last outside of the condition statement of an fifelse. (All the other times data$vaccination_time_since_last is used are within the condition statement of an fifelse.) I am copying a screenshot of that function to the bottom of the post for your reference. (I tried copying/pasting code, but the comments in the code caused the code to format poorly.)

Different errors occur depending on how I have tried to make this work. In the uncommented iteration of the function (seen below), the output gives NA for every covid status when data$vaccination_time_since_last < vacc_duration. Even though the fifelse should always return a probability value (since na is always specified), it returns NAs whenever data$vaccination_time_since_last < vacc_duration. You can see this in the output from this code:

sim.dat %>% filter(.id == 5) %>% filter(.time %in% c(40:135))

I have also tried indexing data$vaccination_time_since_last by sim_time (and by adding sim_time as an argument to the function definition) but that has also yielded results which appear buggy.

These results appear to indicate that there is a bug in how data$vaccination_time_since_last is evaluated, and although I have tried a few different workarounds, none have proven to be correct and effective. I apologize if this is not a bug but is just a simple error, though I doubt that to be the case since I have experimented with this in numerous different ways.

In addition (though not as important as the above issue), I have found that R's traditional if { } else { } or ifelse() framework does not work as well as fifelse() with this package, since if { } else { } seems to pull the whole vector data$vaccination_time_since_last, whereas fifelse() appears not to pull the whole vector if data$vaccination_time_since_last is called in the condition statement of the fifelse(). I imagine you are aware that ifelse() does not work as well as fifelse(), but thought I would bring it to your attention just in case.

Thank you sincerely for your time and consideration.

[Image: screenshot of the prob_covid_f function]

RobinDenz1 commented 8 months ago

First of all I want to say that I admire that you haven't given up on this package yet!

However, this is not a bug in the package itself; it is merely a small bug in your prob_covid_f() function that is probably the result of a small misunderstanding. When calculating the probability of getting covid at time t, you have to use something like fifelse() or ifelse() because the probability is calculated in a vectorized fashion for all individuals simultaneously. The standard if () statement can only handle a single value, not a vector, and as such is not suited for this task.
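As a minimal sketch of what "vectorized" means here (the values and probabilities are made up for illustration and are not taken from the attached file), a vectorized conditional evaluates one condition per individual and returns one probability per individual:

```r
# Hypothetical data: days since last vaccination for four individuals
# (NA = never vaccinated). All numbers are illustrative only.
time_since_last <- c(10, 95, NA, 40)
vacc_duration <- 90

# Build a logical vector with one TRUE/FALSE per individual,
# treating "never vaccinated" (NA) as not protected.
protected <- !is.na(time_since_last) & time_since_last < vacc_duration

# ifelse() tests every element and returns one result per individual.
# data.table::fifelse() works the same way here and additionally offers
# an `na` argument for conditions that evaluate to NA.
p <- ifelse(protected, 0.01, 0.05)
print(p)
```

A plain if () would only be appropriate if the condition were a single TRUE or FALSE, which is not the case when all individuals are processed at once.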

If you put something like:

if (is.na(p[5])) {
    print(p[5])
}

on line 37 of your code you will see that prob_covid_f() evaluates to NA for this person at some points in time, which then leads to missing values in the covid_event variable. This is because you use the max() function, which returns one value regardless of input size, when you really want to use pmax() which will return one maximum per person.
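The difference between the two functions can be sketched as follows. The exact expression from prob_covid_f() is not reproduced here; the variable names and numbers below are assumptions chosen only to show the behaviour:

```r
# Hypothetical per-person values; the numbers are illustrative only.
base_prob <- c(0.05, 0.05, 0.05, 0.05)
reduction <- c(0.02, 0.08, NA, 0.00)

# max() collapses all of its input into a single number, and a single NA
# anywhere in the input makes that number NA.
p_wrong <- max(0, base_prob - reduction)

# pmax() compares element by element and returns one value per person.
# na.rm = TRUE is optional: it lets the floor of 0 win at positions
# where the other argument is NA.
p_right <- pmax(0, base_prob - reduction, na.rm = TRUE)

print(p_wrong)
print(p_right)
```

Here p_wrong has length one, while p_right has one entry per person.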

Hope this helps!

zterner-mitre commented 8 months ago

Thanks, Robin! I always forget about the existence of pmax and pmin.

By line 37, do you mean before the return(p) statement on line 38? Or do you mean within the main fifelse that starts on line 19?

Also, is there a way to index like I tried to do in the commented out portion using seq.vax[sim_time]? That does not seem to be working but I imagine there should be or is a clean way to do it. Although the code currently works, I can imagine cases where I may want to index in the way of seq.vax[sim_time].

Many thanks!

RobinDenz1 commented 8 months ago

I meant before the return statement, so you could see the NAs being printed.

In your present example you wouldn't want to index by sim_time, but by time_since_last (+ 1). Can't think of completely clean code for that off the top of my head, though.
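One possible way to do this, as a rough sketch rather than a tested recipe for the attached file: the contents of seq.vax, the value of vacc_duration and the example times below are all assumptions.

```r
# Hypothetical lookup vector: vaccine effectiveness on day 0, 1, ..., 89
# after vaccination, decaying linearly. The values are assumptions.
vacc_duration <- 90
seq.vax <- seq(0.9, 0, length.out = vacc_duration)

# Hypothetical days since last vaccination (NA = never vaccinated).
time_since_last <- c(0, 45, NA, 120)

# R vectors are 1-based, so day 0 maps to position 1 (hence the + 1).
# Indexing with NA or past the end of the vector yields NA, which is
# then replaced by an effectiveness of 0.
eff <- seq.vax[time_since_last + 1]
eff[is.na(eff)] <- 0
print(eff)
```

This assumes time_since_last only takes non-negative whole numbers or NA; other values would need extra handling.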

I am curious though, what are you planning to use your simulation for?

zterner-mitre commented 8 months ago

Thanks, Robin! I'm trying to simulate some data so I can test out some causal inference capabilities. So using a DAG to generate data seemed like a good way to have a simulation where I can control the knobs and make sure I am getting the right effects.

In general, it may be helpful to simulate using an indexed vector like I tried in the comment. (For example, if I wanted to do a nonlinear vaccine effectiveness decay, it may be nice to specify that as a vector rather than as an argument within the fifelse function, though I suppose they are equivalent. Or I could write the function within the prob_covid_f function itself and then call the function I write within prob_covid_f.)
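A rough sketch of that last idea, with a hypothetical helper vacc_effect() and made-up parameters (initial_eff, half_life) that are not part of simDAG or of the attached file:

```r
# Hypothetical helper: exponential waning of vaccine effectiveness.
# `initial_eff` and `half_life` are illustrative parameters only.
vacc_effect <- function(time_since_last, vacc_duration = 90,
                        initial_eff = 0.9, half_life = 30) {
  eff <- initial_eff * 0.5^(time_since_last / half_life)
  # No protection if never vaccinated (NA) or once the vaccine has expired
  eff[is.na(time_since_last) | time_since_last >= vacc_duration] <- 0
  eff
}

# The same decay expressed as a lookup vector for whole-day time steps
seq.vax <- vacc_effect(0:89)

# Called on a vector, the helper returns one value per individual
time_since_last <- c(0, 30, NA, 120)
print(vacc_effect(time_since_last))
```

Both forms give the same numbers for whole-day steps; the function form also works for time steps that are not whole days.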

RobinDenz1 commented 8 months ago

That is exactly what this package is designed for. Please let me know if you ever publish something related to this, would be very interesting to me!