Open gravesti opened 3 years ago
It looks like the expanded dataset doesn't get filtered for eligibility, so many trials are included where the patient is ineligible.
I wonder if this line should be elgcount==1, because it seems like expand is always 1.
https://github.com/RoonakR/RandomisedTrialsEmulation/blob/5b603ed8d6c73edbe7769ee42ed9d9def86e2241/R/lr_utils.R#L245
If I make this change, I get the same regression results as SAS.
Or could this line only expand the eligible==1 periods? I'm not really sure how this works with data.table. It would probably be a bit faster to avoid doing all the expanding work if we can. https://github.com/RoonakR/RandomisedTrialsEmulation/blob/5b603ed8d6c73edbe7769ee42ed9d9def86e2241/R/lr_utils.R#L200
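To make the second suggestion concrete, here is a minimal base R sketch of expanding only the eligible periods. It is not the package's code: the toy data, the column name trial_period, and the merge-based approach are assumptions for illustration, and the real expand step may be organised quite differently.

```r
# Toy long-format data (hypothetical values): 2 patients, periods 0..3.
dat <- expand.grid(t = 0:3, id = 1:2)
# Assume patients are eligible to start a trial in periods 0 and 1 only.
dat$eligible <- ifelse(dat$t <= 1, 1, 0)

# One row per (patient, trial start): keep only periods where the patient is eligible.
starts <- dat[dat$eligible == 1, c("id", "t")]
names(starts)[names(starts) == "t"] <- "trial_period"

# Pair each eligible trial start with every observed period of the same patient,
# then keep only the follow-up rows at or after the trial start.
expanded <- merge(starts, dat[, c("id", "t")], by = "id")
expanded <- expanded[expanded$t >= expanded$trial_period, ]
expanded$followup_time <- expanded$t - expanded$trial_period

nrow(expanded)
```

Because the ineligible periods never become trial starts, no rows are created for them, which is where any speed-up would come from.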
Hi Isaac, I will have a look. I don't think the problem is any of these but I will have a look now and let you know. Thank you so much for letting me know.
I edited something and I think the problem should be fixed now.
I don't think that's the right fix. It seems like this only keeps the records which have eligibility==1 in sw_data, but then we have lost all the follow-up rows from periods which weren't eligible to start a trial.
I'm pretty sure the change has to be in expand().
It seems like https://github.com/RoonakR/RandomisedTrialsEmulation/blob/5b603ed8d6c73edbe7769ee42ed9d9def86e2241/R/lr_utils.R#L177 creates the right expand indicator, but that gets overwritten at https://github.com/RoonakR/RandomisedTrialsEmulation/blob/5b603ed8d6c73edbe7769ee42ed9d9def86e2241/R/lr_utils.R#L225
So the filtering at the end only applies to the second expand variable.
https://github.com/RoonakR/RandomisedTrialsEmulation/blob/5b603ed8d6c73edbe7769ee42ed9d9def86e2241/R/lr_utils.R#L245
I think we need to filter on the first expand variable before it is overwritten.
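The suspected pattern can be sketched in a few lines of base R. This is only an illustration of an indicator column being overwritten before it is used as a filter. It does not reproduce the code in lr_utils.R, and everything except the column name expand is hypothetical.

```r
# Hypothetical single-patient data: eligible only in period 0.
dat <- data.frame(id = 1, t = 0:3, eligible = c(1, 0, 0, 0))

# Step 1: an indicator derived from eligibility (the "first" expand variable).
dat$expand <- as.integer(dat$eligible == 1)

# Filtering here still reflects eligibility.
trial_starts <- dat[dat$expand == 1, ]

# Step 2: a later assignment reuses the same column name, so the
# eligibility-based indicator is lost.
dat$expand <- 1L

# Filtering after the overwrite keeps every row.
late_filter <- dat[dat$expand == 1, ]

nrow(trial_starts)
nrow(late_filter)
```

Filtering before the second assignment, or giving the second indicator a different name, would both avoid the problem in this toy version.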
I've made some test data.
# 10 patients observed for periods 0 - 10. Only eligible in the first period.
# I expect that the expanded dataset only contains the trial starting in period==0
# i.e. 10 patients * 1 eligible trial * 11 periods = 110 rows
dummy_data <- expand.grid(t = 0:10, id = 1:10)
dummy_data$treatment <- ifelse(dummy_data$id < 5, 1, 0)
dummy_data$eligible <- ifelse(dummy_data$t == 0,1,0)
dummy_data$outcome <- ifelse(1 < dummy_data$id & dummy_data$id <= 6 & dummy_data$t==10, 1, 0)
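# Assumed step (not in the original comment): write the dummy data to the CSV that my_csv points to.
my_csv <- "dummy_data.csv"
write.csv(dummy_data, my_csv, row.names = FALSE)
initiators(data_path = my_csv,
           id = "id",
           period = "t",
           treatment = "treatment",
           outcome = "outcome",
           eligible = "eligible",
           model_var = "assigned_treatment",
           data_dir = "./",
           numCores = 1)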
I get 660 rows before your change and 10 rows after.
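The three row counts in play (660, 110, and 10) can be reproduced by hand from the dummy data. The sketch below is plain arithmetic in base R and does not call the package. The interpretation of each count is an assumption based on the discussion above: every period treated as a trial start, only the eligible trial with full follow-up, and only the eligible rows with follow-up dropped.

```r
# Rebuild the relevant columns of the dummy data from the comment above.
dummy_data <- expand.grid(t = 0:10, id = 1:10)
dummy_data$eligible <- ifelse(dummy_data$t == 0, 1, 0)

n_patients <- length(unique(dummy_data$id))
periods <- sort(unique(dummy_data$t))

# A trial starting in period k has follow-up rows for periods k..10,
# i.e. 11 - k rows per patient.
rows_per_trial <- max(periods) - periods + 1

# (a) Every period treated as a trial start, regardless of eligibility.
rows_all_trials <- n_patients * sum(rows_per_trial)

# (b) Only the period-0 trial, keeping its full follow-up (the expected result).
rows_eligible_trials <- n_patients * rows_per_trial[periods == 0]

# (c) Only rows where eligible == 1, so all follow-up rows are dropped.
rows_eligible_only <- sum(dummy_data$eligible == 1)

c(all_trials = rows_all_trials,
  eligible_trials = rows_eligible_trials,
  eligible_rows_only = rows_eligible_only)
```

Case (a) matches the 660 rows seen before the change and case (c) matches the 10 rows seen after it, which is consistent with the first fix filtering rows rather than trials.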
I updated the code and tested it with the dummy data you sent and it seems like now it works. However, I got an error from parglm because there are fewer observations than covariates. So, I am looking at it.
Excellent. Thanks @RoonakR! This has fixed the data issue and I get the same data as SAS.
It was a small dataset, so it's possible that it can't be solved.
Using the same parameters for the ITT analysis in SAS and in R, I noticed I got different results. I have identified that the dataset that R generated has more records than SAS. It seems like the datasets share many rows and that R has some in addition. I'll look further into the differences and try to find the cause in the code.