juandavidgutier opened this issue 1 week ago
Hi @juandavidgutier, thanks for the feedback. In theory, confidence intervals should be straightforward to calculate via randomization inference, just as we do for p-values. However, every permutation of the treatment vector requires estimating a new model, which involves a matrix inversion, so the procedure is slow. I think it would therefore have to be disabled by default and enabled by setting the inference argument to true, as is the case with p-values. I would also be interested in any faster ways of computing p-values and confidence intervals in a nonparametric, generic way that would work for all the models. For example, to get p-values (and confidence intervals) we would do this:
g_computer = GComputation(x, t, y)
estimate_causal_effect!(g_computer)
summarize(g_computer, inference=true)
I think this would definitely be feasible for the next release, though.
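The permutation procedure described above can be sketched generically. This is an illustrative sketch in Python rather than the package's actual Julia implementation; the function names and the simple OLS stand-in estimator are assumptions, not CausalELM internals:

```python
import numpy as np

def permutation_p_value(x, t, y, estimate_effect, n_permutations=1000, seed=0):
    """Randomization-inference p-value: re-estimate the effect under random
    permutations of the treatment vector and count how often the permuted
    effect is at least as extreme as the observed one."""
    rng = np.random.default_rng(seed)
    observed = estimate_effect(x, t, y)
    more_extreme = 0
    for _ in range(n_permutations):
        t_perm = rng.permutation(t)  # break the treatment-outcome link
        if abs(estimate_effect(x, t_perm, y)) >= abs(observed):
            more_extreme += 1
    return more_extreme / n_permutations

def ols_effect(x, t, y):
    # Coefficient on t from a regression of y on [1, t, x]; a stand-in for
    # whatever model the estimator actually fits. This is where the costly
    # matrix inversion mentioned above happens on every permutation.
    X = np.column_stack([np.ones(len(t)), t, x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]
```

Because `estimate_effect` is a black box here, the same loop would apply to any of the estimators; the cost is simply n_permutations refits.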
As for marginal effects, I know this is very straightforward for something like a logistic regression, but I'm not exactly sure how you would do it with multiple models, e.g. when using double machine learning or a metalearner. Do you have any references for this? I'm definitely open to it and I think it would be good to calculate marginal effects in the summarize method.
Hi @dscolby,
Unfortunately, I am not an expert in Julia programming, but one option for estimating confidence intervals could be to follow the documentation of the Python package EconML. As I understand it, the procedure could be as follows:
For Confidence Intervals (details at: https://econml.azurewebsites.net/_modules/econml/inference/_bootstrap.html#BootstrapEstimator)
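Conceptually, the bootstrap estimator linked above resamples the data with replacement, re-fits the estimator on each resample, and takes percentiles of the resulting effect estimates. A minimal percentile-bootstrap sketch in Python (the function name and signature are illustrative, not EconML's API):

```python
import numpy as np

def bootstrap_confidence_interval(x, t, y, estimate_effect,
                                  n_bootstraps=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: re-fit the estimator on rows resampled with
    replacement and take the alpha/2 and 1 - alpha/2 quantiles."""
    rng = np.random.default_rng(seed)
    n = len(y)
    estimates = np.empty(n_bootstraps)
    for b in range(n_bootstraps):
        idx = rng.integers(0, n, size=n)  # sample row indices with replacement
        estimates[b] = estimate_effect(x[idx], t[idx], y[idx])
    lower = np.quantile(estimates, alpha / 2)
    upper = np.quantile(estimates, 1 - alpha / 2)
    return lower, upper
```

Note that, like randomization inference, this requires one full refit per resample, which is the performance concern raised below.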
For the marginal effect, you could see the following EconML documentation:
Marginal effect: https://econml.azurewebsites.net/_autosummary/econml._cate_estimator.html?highlight=marginal%20effect#econml._cate_estimator.BaseCateEstimator.marginal_effect
Constant marginal effect: https://econml.azurewebsites.net/_autosummary/econml.dml.DML.html?highlight=const_marginal_effect#econml.dml.DML.const_marginal_effect
@juandavidgutier Bootstrapping runs into the same performance issues as randomization/permutation inference. I considered bootstrap inference but ultimately went with randomization inference because it answers a slightly different question. Bootstrapping tells us the probability of seeing an effect at least as extreme as the estimated effect under a theoretical (normal) sampling distribution, whereas randomization inference tells us the proportion of times we would see an effect at least as extreme under different treatment assignment mechanisms. Either way, I think I'll need to work on getting it parallelized, which is what EconML does. Getting the p-value is definitely feasible, though, so I'll work on that as I have time.
For the marginal effect, I was originally thinking about taking derivatives, which would be tough with estimators like R-learners that have multiple models, especially since each estimator is different. But it seems like other packages that use simpler estimators just use the finite difference approximation, which should be pretty straightforward to implement. So I'll also get to work on this for the next release, though it will probably be slow going because I have a lot on my plate right now.
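The finite-difference idea is model-agnostic: perturb the treatment, re-predict, and divide by the step size, which sidesteps the need for analytic derivatives through multiple models. A sketch in Python against a generic black-box predictor (the `predict` interface here is an assumption for illustration):

```python
import numpy as np

def marginal_effect(predict, x, t, eps=1e-6):
    """Central finite-difference marginal effect of treatment t on the
    predicted outcome, per observation:
        (f(x, t + eps) - f(x, t - eps)) / (2 * eps).
    Because predict(x, t) is treated as a black box, the same code works
    whether it wraps one model or a composite like an R-learner."""
    return (predict(x, t + eps) - predict(x, t - eps)) / (2 * eps)
```

For binary treatments, the analogous quantity would simply be predict(x, 1) - predict(x, 0) rather than a derivative.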
But again, thanks for the suggestions and references.
Hi @dscolby, you are right: randomization inference and bootstrapping have the same performance issues. Thanks for listening to the suggestion.
Hi @dscolby, I work on causal learning in eco-epidemiology, and I recently discovered CausalELM. I see important features in the package, such as G-computation and the E-value. However, it would be amazing if you could add the marginal effect or the constant marginal effect, along with its confidence interval.