gravesti opened this issue 1 year ago
@gravesti I have simulated some data to mimic the scenario in which patients' data are compatible with multiple dynamic treatment regimes. For example, in the simulated data the columns `compatible` and `compatible1` indicate whether a patient's data are consistent with each of two regimes based on two time-varying covariates, X1 (binary) and X2 (continuous).
Note that we focus on a single trial here.
However, we need to calculate weights for the observed treatments up to the last visit at which the regime is adhered to (i.e. the data are compatible). This differs from what the package currently does, because it only calculates weights for the always-treated vs always-control regimes. So we will need to alter the weight calculation when applying clone-censor-weight, because of the difference between how the treatment data are coded and how a particular regime is defined. I will work out an example using some generic code in R.
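As a rough sketch of how a column like `compatible` could be derived from visit-level data: a patient stays compatible with a regime until the first visit at which the observed treatment departs from what the regime prescribes, and from then on the clone is artificially censored. The two regime rules below (treat when X1 is 1; treat when X2 exceeds 0.5) and the field names `x1`, `x2`, `treated` are hypothetical illustrations, not the definitions used in the simulation.

```python
from typing import Callable, Dict, List

Visit = Dict[str, float]


def compatibility(visits: List[Visit], regime: Callable[[Visit], int]) -> List[int]:
    """Return 1 at each visit while the observed treatment has matched the
    regime at every visit so far, and 0 from the first deviation onwards."""
    flags = []
    still_compatible = True
    for visit in visits:
        # The regime maps the current covariates to the prescribed treatment (0/1).
        if visit["treated"] != regime(visit):
            still_compatible = False
        flags.append(1 if still_compatible else 0)
    return flags


# Hypothetical regime 1: treat exactly when the binary covariate x1 is 1.
def regime_x1(visit: Visit) -> int:
    return int(visit["x1"] == 1)


# Hypothetical regime 2: treat exactly when the continuous covariate x2 exceeds 0.5.
def regime_x2(visit: Visit) -> int:
    return int(visit["x2"] > 0.5)


patient = [
    {"x1": 0, "x2": 0.2, "treated": 0},
    {"x1": 1, "x2": 0.4, "treated": 1},
    {"x1": 1, "x2": 0.7, "treated": 0},
    {"x1": 0, "x2": 0.9, "treated": 1},
]

print(compatibility(patient, regime_x1))  # [1, 1, 0, 0]
print(compatibility(patient, regime_x2))  # [1, 0, 0, 0]
```

The same patient can be compatible with both regimes for a while and then with neither, which is why one patient contributes to several regime-specific censoring indicators.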
@lisu-stats I guess with this data setup we would just need to stack the data, once with `compatible` and once with `compatible1`, and modify the IDs for the second copy. Then I think it could work directly with our existing functions.
Maybe we would need to change the IDs back for the sandwich estimator, so that the clones have the same ID?
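A minimal sketch of the stacking idea, assuming long-format rows stored as plain dicts with `id`, `time` and one 0/1 flag column per regime (`compatible`, `compatible1`). Rather than overwriting `id`, this version adds a hypothetical `clone_id` and a `regime` index, so the patient-level `id` stays available for clustering in a sandwich estimator. The function name and the extra columns are illustrative only.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]


def stack_clones(rows: Sequence[Row], flag_columns: Sequence[str]) -> List[Row]:
    """Stack one copy of the data per regime flag column.

    A clone keeps only the rows where its flag is 1, which truncates it at the
    first incompatible visit provided flags never return to 1 after a deviation.
    All original columns (outcome, confounders, ...) are carried over; `regime`
    and `clone_id` are added, and the patient-level `id` is left unchanged so
    that a sandwich variance estimator can still cluster on patients.
    """
    stacked: List[Row] = []
    for regime_index, flag in enumerate(flag_columns):
        for row in rows:
            if row[flag] != 1:
                continue  # artificially censored for this regime
            stacked.append(
                dict(row, regime=regime_index, clone_id=f"{row['id']}_{regime_index}")
            )
    return stacked


rows = [
    {"id": 1, "time": 0, "compatible": 1, "compatible1": 1},
    {"id": 1, "time": 1, "compatible": 1, "compatible1": 0},
    {"id": 2, "time": 0, "compatible": 0, "compatible1": 1},
]
stacked = stack_clones(rows, ["compatible", "compatible1"])
print(len(stacked), sorted({r["clone_id"] for r in stacked}))  # 4 ['1_0', '1_1', '2_1']
```

Keeping both identifiers avoids having to change the IDs back later: models can be fitted per clone, while the variance estimator clusters on `id`.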
@gravesti In this simulation setting, the true weights should be the IPTW weights. Simply fitting an IPCW model to `compatible` and `compatible1` risks model misspecification. Note that in these simulated data there was more variation in the treatment process. See the comment from Gaber et al.'s review paper.
I will discuss this issue with Shaun.
@gravesti Shaun and I agreed that we probably want to implement both IPTW and IPCW for the clone-censor-weight approach, because each can be useful in different settings.
IPCW is straightforward: as you mentioned, we just calculate the weights separately based on the created censoring indicators, such as `compatible` and `compatible1`. The weights are then applied to the cloned data in an MSM, and we use the fitted MSM to predict survival curves. You are quite right: clones should have the same IDs, as they come from the same patients.
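A minimal numpy sketch of the IPCW step for one regime's clones, assuming a pooled logistic model for remaining compatible at each visit given the current covariates. The model is fitted here with a hand-rolled Newton-Raphson only to keep the example dependency-free; in practice a standard GLM routine (e.g. R's `glm` with a binomial family) would be used, and the model would normally include time and more covariates. The function names, the toy data and the single binary covariate `x` are illustrative assumptions.

```python
import numpy as np


def fit_logistic(X: np.ndarray, y: np.ndarray, n_iter: int = 25) -> np.ndarray:
    """Unpenalised logistic regression fitted by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        score = X.T @ (y - p)
        information = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(information, score)
    return beta


def ipcw_weights(ids: np.ndarray, X: np.ndarray, uncensored: np.ndarray) -> np.ndarray:
    """IPCW weights for one regime's clones.

    Rows are clone-visits sorted by clone ID and then visit, restricted to clones
    still uncensored at the previous visit; `uncensored` is 1 if the clone stays
    compatible at this visit. The weight is 1 / (cumulative product of the fitted
    probabilities of staying uncensored), restarted for each clone.
    """
    beta = fit_logistic(X, uncensored)
    p_stay = 1.0 / (1.0 + np.exp(-X @ beta))
    weights = np.empty_like(p_stay)
    cumulative, previous_id = 1.0, None
    for i, clone_id in enumerate(np.asarray(ids).tolist()):
        if clone_id != previous_id:
            cumulative, previous_id = 1.0, clone_id
        cumulative *= p_stay[i]
        weights[i] = 1.0 / cumulative
    return weights


# Toy data: 4 clones, one binary time-varying covariate x.
ids = np.array([1, 1, 1, 2, 2, 3, 3, 4])
x = np.array([0, 1, 0, 0, 1, 1, 1, 0])
stay = np.array([1, 1, 1, 1, 0, 1, 0, 0], dtype=float)
X = np.column_stack([np.ones(len(x)), x])

w = ipcw_weights(ids, X, stay)
print(np.round(w, 3))
```

With an intercept and one binary covariate the model is saturated, so the fitted probabilities of staying uncensored equal the empirical proportions (0.75 when x = 0, 0.5 when x = 1), and clone 1's weights are 4/3, 8/3 and 32/9. Rows where `stay` is 0 are censoring events; they inform the censoring model but their weights are not used in the MSM. The same function would be run once per regime indicator.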
For IPTW, we need to fit parametric treatment models to the original treatment data (before cloning and artificial censoring) to estimate the treatment probabilities. For each uncensored observation in the cloned data, we then use these fitted treatment models to estimate the probability of remaining uncensored up to visit $t$, based on the observed treatment sequence up to $t$, which is compatible with the treatment regimen.
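A minimal sketch of the IPTW step for a single clone, assuming the per-visit fitted treatment probabilities have already been obtained from a parametric (e.g. pooled logistic) treatment model fitted on the uncloned data; the function name is hypothetical. The weight at visit $t$ is the inverse of the product, over visits up to $t$, of the probability of the treatment actually received, which on an uncensored clone row is the treatment the regime prescribes.

```python
from typing import List, Sequence


def iptw_weights_for_clone(p_treated: Sequence[float], treated: Sequence[int]) -> List[float]:
    """Cumulative IPTW weights along one clone's uncensored visits.

    p_treated[k]: fitted P(treated at visit k | history) from a treatment model
    fitted once on the original, uncloned data.
    treated[k]: observed treatment at visit k; on uncensored rows of a clone this
    agrees with what the clone's regime prescribes.
    """
    weights = []
    cumulative = 1.0
    for p, a in zip(p_treated, treated):
        # Probability of the treatment actually received at this visit.
        cumulative *= p if a == 1 else 1.0 - p
        weights.append(1.0 / cumulative)
    return weights


print(iptw_weights_for_clone([0.2, 0.6, 0.5], [0, 1, 1]))
```

Because the treatment model is fitted on the original data, two clones of the same patient receive identical weights at every visit where both are still compatible; they differ only in where each clone is censored.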
Because of the inference problems mentioned above for weights estimated by data-adaptive methods, let's focus on parametric models for weight estimation. Adapting undersmoothed HAL (highly adaptive lasso) to longitudinal settings would be a big step; we will see whether new work on this comes out.
@lisu-stats Thanks for the explanation. I think I will need to talk through the IPTW next week.
I have also seen this preprint, https://arxiv.org/pdf/2404.15073, which suggests limiting the patients included in the IPCW calculation to avoid an "impossible intervention". I'd be pleased to hear what you think about this.
@gravesti I agree it is tricky when treatment timing is part of a regimen. For such regimens, I feel that an alternative approach such as longitudinal modified treatment policies might be better: https://muse.jhu.edu/article/883479
@gravesti In Appendix 4 (page 18) of https://doi.org/10.2202/1557-4679.1212, the authors explain that there may be multiple versions of a regime such as "initiate treatment within $m$ months after the recorded CD4 cell count first drops below $x$". They used IPTW and specified a particular numerator for the weight to define the regime of interest precisely. Clearly, IPCW will have the same problem of multiple versions.
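To make the "multiple versions" point concrete, here is a sketch of a compatibility check for a grace-period regime of that kind, with time measured in visits rather than months. The extra rule that treatment, once started, must continue, the function name, and the CD4 values are assumptions for illustration only.

```python
from typing import List, Optional, Sequence


def grace_period_compatible(
    marker: Sequence[float], treated: Sequence[int], threshold: float, m: int
) -> List[int]:
    """Compatibility flags for the hypothetical regime 'start treatment at some
    visit from the first visit with marker < threshold up to m visits later,
    and stay on treatment afterwards'. Flags stay 0 after the first deviation."""
    trigger: Optional[int] = None  # first visit with marker below threshold
    started = False
    compatible = True
    flags = []
    for t, (value, a) in enumerate(zip(marker, treated)):
        if trigger is None and value < threshold:
            trigger = t
        if compatible:
            if a == 1 and trigger is None:
                compatible = False  # treated before the trigger
            elif a == 1:
                started = True
            elif started:
                compatible = False  # stopped after starting
            elif trigger is not None and t >= trigger + m:
                compatible = False  # grace period ended without starting
        flags.append(int(compatible))
    return flags


cd4 = [400, 320, 280, 260, 250]  # first drops below 350 at visit 1
print(grace_period_compatible(cd4, [0, 1, 1, 1, 1], 350, 2))  # [1, 1, 1, 1, 1]
print(grace_period_compatible(cd4, [0, 0, 0, 1, 1], 350, 2))  # [1, 1, 1, 1, 1]
print(grace_period_compatible(cd4, [0, 0, 0, 0, 1], 350, 2))  # [1, 1, 1, 0, 0]
```

Starting at visit 1 and starting at visit 3 are both fully compatible with the same regime, so an IPCW model fitted to this indicator alone does not say which initiation pattern within the grace period is being emulated.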
- [ ] Add reference
- [ ] See https://github.com/lpetito/SEERMedicareCEAnalysis/blob/master/10_perprotocol.R
- [ ] Design document
- [ ] Input data format: `id`, `outcome`, `time`, `compliant_1`, `compliant_2`, `censored`, confounders
- [ ] Weighting model specification
- [ ] Fit outcome model
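For the input-format item, a sketch of a validator for long-format data with one row per patient-visit and the columns `id`, `outcome`, `time`, `compliant_1`, `compliant_2`, `censored`, with any remaining columns treated as confounders. The monotonicity check encodes the assumption, consistent with artificial censoring, that a compliance flag never returns to 1 after a deviation; the function name and messages are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

REQUIRED_COLUMNS = ("id", "outcome", "time", "censored")


def check_input(rows: Sequence[Dict[str, float]], regime_columns: Sequence[str]) -> List[str]:
    """Return a list of problems found in long-format input (empty if none).

    Checks that required and regime columns exist, that `time` increases within
    each `id`, and that no compliance flag returns to 1 after being 0.
    Any other columns are treated as confounders and not checked.
    """
    problems: List[str] = []
    previous: Dict[float, Tuple[float, List[float]]] = {}
    for i, row in enumerate(rows):
        missing = [c for c in (*REQUIRED_COLUMNS, *regime_columns) if c not in row]
        if missing:
            problems.append(f"row {i}: missing {missing}")
            continue
        flags = [row[c] for c in regime_columns]
        if row["id"] in previous:
            prev_time, prev_flags = previous[row["id"]]
            if row["time"] <= prev_time:
                problems.append(f"row {i}: time does not increase within id {row['id']}")
            for column, before, now in zip(regime_columns, prev_flags, flags):
                if before == 0 and now == 1:
                    problems.append(f"row {i}: {column} returns to 1 after a deviation")
        previous[row["id"]] = (row["time"], flags)
    return problems


rows = [
    {"id": 1, "time": 0, "outcome": 0, "censored": 0, "compliant_1": 1, "compliant_2": 1, "x1": 0},
    {"id": 1, "time": 1, "outcome": 0, "censored": 0, "compliant_1": 0, "compliant_2": 1, "x1": 1},
    {"id": 1, "time": 2, "outcome": 1, "censored": 0, "compliant_1": 1, "compliant_2": 1, "x1": 1},
]
print(check_input(rows, ["compliant_1", "compliant_2"]))
# ['row 2: compliant_1 returns to 1 after a deviation']
```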