Differences in dataset for ITT analysis

gravesti commented 3 years ago

Using the same parameters for the ITT analysis in SAS and in R, I noticed I got different results. I have identified that the dataset that R generated has more records than SAS. It seems like the datasets share many rows and that R has some in addition. I'll look further into the differences and try to find the cause in the code.
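One way to pin down which rows R has in addition is to join the two expanded datasets on their identifying columns and keep the rows with no SAS match. This is a minimal sketch on made-up data; the column names `for_period` and `followup_time` and both data frames are hypothetical stand-ins, not the actual output of either program.

```r
# Hypothetical stand-ins for the expanded datasets exported from SAS and from R.
sas_rows <- data.frame(id = c(1, 1, 2),
                       for_period = c(0, 0, 0),
                       followup_time = c(0, 1, 0))
r_rows <- data.frame(id = c(1, 1, 1, 2),
                     for_period = c(0, 0, 1, 0),
                     followup_time = c(0, 1, 0, 0))

# Columns assumed to identify a row uniquely in both datasets.
keys <- c("id", "for_period", "followup_time")

# Flag the SAS rows, left-join them onto the R rows, and keep the R rows without a match.
sas_rows$in_sas <- TRUE
merged <- merge(r_rows, sas_rows, by = keys, all.x = TRUE)
extra_in_r <- merged[is.na(merged$in_sas), keys]

print(extra_in_r)
```

Looking at which trial periods the extra rows belong to can hint at whether they come from trials that should never have been started.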

gravesti commented 3 years ago

It looks like the expanded dataset doesn't get filtered for eligibility, so many trials are included where the patient is ineligible.

I wonder if this line should be elgcount==1? Because it seems like expand is always 1. https://github.com/RoonakR/RandomisedTrialsEmulation/blob/5b603ed8d6c73edbe7769ee42ed9d9def86e2241/R/lr_utils.R#L245

If I make this change, I get the same regression results as SAS.

Or could this line only expand the eligible==1 periods? I'm not really sure how this works with data.table. It would probably be a bit faster to avoid doing all the expanding work if we can. https://github.com/RoonakR/RandomisedTrialsEmulation/blob/5b603ed8d6c73edbe7769ee42ed9d9def86e2241/R/lr_utils.R#L200
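The idea of expanding only the eligible periods could look roughly like the sketch below. It uses base R rather than data.table, and `expand_eligible`, `trial_period` and `followup_time` are hypothetical names for illustration; this is not the package's expand() function.

```r
# Sketch: start a trial only in eligible periods, then attach each patient's follow-up rows.
expand_eligible <- function(data) {
  # One row per (patient, trial start), taken from eligible periods only.
  starts <- data[data$eligible == 1, c("id", "t")]
  names(starts) <- c("id", "trial_period")

  # Pair every trial start with all of that patient's rows ...
  expanded <- merge(starts, data, by = "id")
  # ... and keep only the rows at or after the trial start.
  expanded <- expanded[expanded$t >= expanded$trial_period, ]
  expanded$followup_time <- expanded$t - expanded$trial_period
  expanded[order(expanded$id, expanded$trial_period, expanded$t), ]
}

# Toy data: 2 patients, periods 0 to 3. Both are eligible at t == 0,
# and patient 1 is eligible again at t == 2.
toy <- expand.grid(t = 0:3, id = 1:2)
toy$eligible <- as.integer(toy$t == 0 | (toy$id == 1 & toy$t == 2))

toy_expanded <- expand_eligible(toy)
print(nrow(toy_expanded))
```

Because ineligible periods never become trial starts, no rows are generated for them and nothing needs to be filtered out afterwards. Follow-up rows from ineligible periods are still kept within the trials that did start.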

RoonakR commented 3 years ago

Hi Isaac, thank you so much for letting me know. I don't think the problem is in either of these lines, but I will have a look now and let you know.

RoonakR commented 3 years ago

I edited something and I think the problem should be fixed now.

gravesti commented 3 years ago

I don't think that's the right fix. It seems like this only keeps the records which have eligibility==1 in sw_data, but then we have lost all the follow-up rows from periods which weren't eligible to start a trial.

I'm pretty sure the change has to be in expand(). It seems like https://github.com/RoonakR/RandomisedTrialsEmulation/blob/5b603ed8d6c73edbe7769ee42ed9d9def86e2241/R/lr_utils.R#L177

creates the right expand indicator, but that gets overwritten at https://github.com/RoonakR/RandomisedTrialsEmulation/blob/5b603ed8d6c73edbe7769ee42ed9d9def86e2241/R/lr_utils.R#L225

So the filtering at the end only applies to the second expand variable. https://github.com/RoonakR/RandomisedTrialsEmulation/blob/5b603ed8d6c73edbe7769ee42ed9d9def86e2241/R/lr_utils.R#L245

I think we need to filter on the first expand variable before it is overwritten.
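The effect of an indicator being overwritten before it is used as a filter can be shown on a toy data frame. This is only an illustration of the pattern described above; the assignment of 1 to every row is a hypothetical stand-in for whatever the later step in lr_utils.R actually does.

```r
rows <- data.frame(period = 0:3, eligible = c(1, 0, 0, 0))

# First indicator: only eligible periods should start a trial.
rows$expand <- as.integer(rows$eligible == 1)

# Filtering here, before the column is reused, keeps only the eligible period.
kept_early <- rows[rows$expand == 1, ]

# Hypothetical later step that reuses the same column name for another purpose.
rows$expand <- 1L

# Filtering now no longer removes anything.
kept_late <- rows[rows$expand == 1, ]

print(c(early = nrow(kept_early), late = nrow(kept_late)))
```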

gravesti commented 3 years ago

I've made some test data.

# 10 patients observed for periods 0 - 10. Only eligible in the first period.
# I expect that the expanded dataset only contains the trial starting in period == 0,
# i.e. 10 patients * 1 eligible trial * 11 periods = 110 rows

dummy_data <- expand.grid(t = 0:10, id = 1:10)
dummy_data$treatment <- ifelse(dummy_data$id < 5, 1, 0)
dummy_data$eligible <- ifelse(dummy_data$t == 0, 1, 0)
# Outcome event in the last period for patients 2 to 6
dummy_data$outcome <- ifelse(1 < dummy_data$id & dummy_data$id <= 6 & dummy_data$t == 10, 1, 0)

# my_csv was not defined in the original snippet; it is assumed here to be
# the path to a CSV copy of dummy_data
my_csv <- "dummy_data.csv"
write.csv(dummy_data, my_csv, row.names = FALSE)

initiators(data_path = my_csv,
           id = "id",
           period = "t",
           treatment = "treatment",
           outcome = "outcome",
           eligible = "eligible",
           model_var = "assigned_treatment",
           data_dir = "./",
           numCores = 1)

I get 660 rows before your change and 10 rows after, instead of the 110 I expected.
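The three row counts can be reproduced with a little arithmetic. The interpretation in the comments is an inference from the numbers, not something confirmed from the package code: 660 is what you get if every period starts a trial, and 10 is what you get if only the eligible baseline rows are kept.

```r
n_patients <- 10
periods <- 0:10

# Expected: one trial per patient starting at period 0, followed for all 11 periods.
rows_expected <- n_patients * length(periods)

# If every period starts a trial, a trial starting at period p has 11 - p rows,
# so each patient contributes 11 + 10 + ... + 1 = 66 rows.
followup_len <- max(periods) - periods + 1
rows_all_trials <- n_patients * sum(followup_len)

# If only rows with eligible == 1 are kept, each patient has a single row.
rows_baseline_only <- n_patients * 1

print(c(expected = rows_expected,
        all_trials = rows_all_trials,
        baseline_only = rows_baseline_only))
```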

RoonakR commented 3 years ago

I updated the code and tested it with the dummy data you sent and it seems like now it works. However, I got an error from parglm because there are fewer observations than covariates. So, I am looking at it.

gravesti commented 3 years ago

Excellent. Thanks @RoonakR! This has fixed the data issue and I get the same data as SAS.

As for the parglm error: the test dataset was small, so it's possible that the model can't be fitted to it.

RoonakR / RandomisedTrialsEmulation

Differences in dataset for ITT analysis #2