DoubleML / doubleml-for-r

DoubleML - Double Machine Learning in R
https://docs.doubleml.org

different learners for different treatments in Simultaneous Inference #85

hadigilan opened this issue 3 years ago (status: open)

hadigilan commented 3 years ago

Hi, I have an idea for extending the package's support for simultaneous inference.

When the treatments are of different types (continuous or binary), it is not possible to run DoubleMLPLR, for example, because the argument ml_m, which estimates the corresponding nuisance function, accepts only a single learner. To elaborate, consider two treatments d1 and d2 that are continuous and binary, respectively. To estimate the nuisance function for d1, we must apply a machine learning method for the Gaussian family, whereas for d2 we must apply a method for logistic regression (classification). Thus, users must either construct a continuous version of d2 or convert d1 to a binary treatment so that both treatments are of the same type.

However, in some cases the learner detects the type of the treatment automatically (for example, the regr.gbm learner based on the gbm package).
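For illustration, this kind of automatic detection can be reduced to a rule on the number of distinct values. The gbm documentation describes a similar default when `distribution` is not specified: a response with only two unique values is treated as Bernoulli, and a plain numeric response otherwise falls back to Gaussian (with further cases for factors and survival objects). The Python helper below, `guess_distribution`, is a hypothetical and simplified version of that rule, not code from gbm or DoubleML.

```python
import numpy as np

def guess_distribution(d):
    """Hypothetical helper, a simplified version of gbm's default guess:
    exactly two unique values -> 'bernoulli', anything else -> 'gaussian'."""
    n_unique = np.unique(np.asarray(d)).size
    return "bernoulli" if n_unique == 2 else "gaussian"

rng = np.random.default_rng(0)
d1 = rng.normal(size=200)            # continuous treatment
d2 = rng.binomial(1, 0.4, size=200)  # binary treatment
for name, d in [("d1", d1), ("d2", d2)]:
    print(name, guess_distribution(d))
```

A rule like this only picks a loss per column. It does not by itself let a single ml_m learner serve both treatments unless the learner applies it internally.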

If the argument ml_m could be a list with the same length as d_cols (one learner per treatment), we could run DoubleMLPLR for treatments of different types.
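A minimal sketch of what the proposal would compute, written in Python/NumPy rather than R and not using the DoubleML API. It is a cross-fitted partialling-out PLR estimator that takes a list `ml_m_list` with one nuisance learner per treatment column, so the continuous d1 gets least squares and the binary d2 gets logistic regression. The function names (`fit_ols`, `fit_logit`, `plr_multi`) and the simulated data are hypothetical. Entering the other treatments into the nuisance models as covariates is an assumption meant to mirror the usual multiple-treatment setup.

```python
import numpy as np

# Hypothetical sketch (not the DoubleML API): PLR with one ml_m learner per treatment.

def fit_ols(Z, y):
    """Least squares with intercept; returns a prediction function."""
    A = np.column_stack([np.ones(len(Z)), Z])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Zn: np.column_stack([np.ones(len(Zn)), Zn]) @ beta

def fit_logit(Z, y, n_iter=25):
    """Logistic regression via Newton-Raphson; returns a probability function."""
    A = np.column_stack([np.ones(len(Z)), Z])
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-A @ beta))
        beta += np.linalg.solve(A.T @ (A * (p * (1 - p))[:, None]), A.T @ (y - p))
    return lambda Zn: 1.0 / (1.0 + np.exp(-np.column_stack([np.ones(len(Zn)), Zn]) @ beta))

def plr_multi(y, D, X, ml_l, ml_m_list, n_folds=5, seed=0):
    """Cross-fitted partialling-out PLR; ml_m_list[j] is the learner for D[:, j]."""
    n, k = D.shape
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    theta, psi = np.empty(k), np.empty((n, k))
    for j in range(k):
        Z = np.column_stack([X, np.delete(D, j, axis=1)])  # other treatments as covariates
        u, v = np.empty(n), np.empty(n)
        for f in range(n_folds):
            tr, te = folds != f, folds == f
            u[te] = y[te] - ml_l(Z[tr], y[tr])(Z[te])                # outcome residual
            v[te] = D[te, j] - ml_m_list[j](Z[tr], D[tr, j])(Z[te])  # treatment residual
        theta[j] = np.sum(u * v) / np.sum(v * v)
        psi[:, j] = (u - theta[j] * v) * v  # partialling-out score
    return theta, psi

# Simulated example: d1 continuous, d2 binary.
rng = np.random.default_rng(42)
n = 2000
X = rng.normal(size=(n, 3))
d1 = X[:, 0] + rng.normal(size=n)
d2 = rng.binomial(1, 1 / (1 + np.exp(-X[:, 1])))
y = 0.5 * d1 + 1.0 * d2 + X @ np.array([1.0, -1.0, 0.5]) + rng.normal(size=n)
D = np.column_stack([d1, d2])
theta, psi = plr_multi(y, D, X, fit_ols, [fit_ols, fit_logit])
print(theta)
```

The only change from a single-learner version is the indexing `ml_m_list[j]`. This is why a list-valued ml_m seems like a natural interface for the feature.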

Thanks for your hot pkg!

PhilippBach commented 3 years ago

Hi @hadigilan,

thanks a lot for your interest in the package and for your suggestions. We discussed your idea and agree that it would be great to have more flexibility with regard to using different learners for different target variables. We have added this point to the list of features that we will implement in the future. Because the feature is related to how learners are handled, we have to take care of potential side effects and other dependencies, so it will probably take a while until we support it.

However, there are two potential workarounds that might already help you:

1) You can use regression learners for the binary treatment, which might already help a bit, e.g., regr.gbm with the option distribution = "gaussian".
2) You can set up separate DoubleMLPLR objects for the continuous (d1) and the binary (d2) treatment variables and estimate the causal effects separately. Then you can manually merge the scores as if they had been obtained in the multiple-treatment case and manually run the bootstrap code.
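Workaround 2) can be sketched as follows in Python/NumPy. This is not the DoubleML API, and `fit_ols`, `fit_logit`, `plr_single` and `joint_confint` are hypothetical names. The sketch fits two separate single-treatment PLR models, each with the other treatment as an extra covariate (an assumption meant to mimic the multiple-treatment case). It then stacks their scores column-wise and runs a Gaussian multiplier bootstrap on the max-|t| statistic to obtain a joint critical value and simultaneous confidence intervals. The standard-error and bootstrap formulas follow the usual score-based DML construction and are assumptions here, not a copy of the package's bootstrap code.

```python
import numpy as np

# Hypothetical sketch (not the DoubleML API): separate PLR fits, merged scores,
# and a multiplier bootstrap for simultaneous confidence intervals.

def fit_ols(Z, y):
    """Least squares with intercept; returns a prediction function."""
    A = np.column_stack([np.ones(len(Z)), Z])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Zn: np.column_stack([np.ones(len(Zn)), Zn]) @ beta

def fit_logit(Z, y, n_iter=25):
    """Logistic regression via Newton-Raphson; returns a probability function."""
    A = np.column_stack([np.ones(len(Z)), Z])
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-A @ beta))
        beta += np.linalg.solve(A.T @ (A * (p * (1 - p))[:, None]), A.T @ (y - p))
    return lambda Zn: 1.0 / (1.0 + np.exp(-np.column_stack([np.ones(len(Zn)), Zn]) @ beta))

def plr_single(y, d, Z, ml_l, ml_m, n_folds=5, seed=0):
    """Cross-fitted partialling-out PLR for one treatment; returns theta, psi, J."""
    n = len(y)
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    u, v = np.empty(n), np.empty(n)
    for f in range(n_folds):
        tr, te = folds != f, folds == f
        u[te] = y[te] - ml_l(Z[tr], y[tr])(Z[te])
        v[te] = d[te] - ml_m(Z[tr], d[tr])(Z[te])
    theta = np.sum(u * v) / np.sum(v * v)
    return theta, (u - theta * v) * v, -np.mean(v * v)  # J = mean of psi_a = -v^2

def joint_confint(theta, psi, J, n_boot=1000, level=0.95, seed=0):
    """Gaussian multiplier bootstrap on max |t| over the stacked score columns."""
    n = psi.shape[0]
    se = np.sqrt(np.mean(psi**2, axis=0) / J**2 / n)
    e = np.random.default_rng(seed).normal(size=(n_boot, n))  # multiplier weights
    t_boot = (e @ psi) / (n * J * se)                         # (n_boot, k) t-statistics
    crit = np.quantile(np.max(np.abs(t_boot), axis=1), level)
    return np.column_stack([theta - crit * se, theta + crit * se]), crit

# Simulated example: d1 continuous, d2 binary.
rng = np.random.default_rng(42)
n = 2000
X = rng.normal(size=(n, 3))
d1 = X[:, 0] + rng.normal(size=n)
d2 = rng.binomial(1, 1 / (1 + np.exp(-X[:, 1])))
y = 0.5 * d1 + 1.0 * d2 + X @ np.array([1.0, -1.0, 0.5]) + rng.normal(size=n)

# Two separate fits with the same fold split; each uses the other treatment as a covariate.
t1, psi1, J1 = plr_single(y, d1, np.column_stack([X, d2]), fit_ols, fit_ols)
t2, psi2, J2 = plr_single(y, d2, np.column_stack([X, d1]), fit_ols, fit_logit)

# Merge the scores as in the multiple-treatment case, then bootstrap jointly.
theta = np.array([t1, t2])
psi = np.column_stack([psi1, psi2])
J = np.array([J1, J2])
ci, crit = joint_confint(theta, psi, J)
print(theta, crit)
print(ci)
```

All score columns are evaluated on the same observations, so stacking them preserves the cross-treatment correlation that the max-|t| bootstrap relies on. Reusing one fold split for both fits is meant to mimic estimating both treatments within a single object.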

Option 1) is probably much easier and quicker but neglects the binary nature of the treatment variable. Option 2) will probably be closer to what you want to do but involves more implementation effort. In case you go for 2) and do a proper implementation rather than a quick workaround, you can open a PR and we can integrate it into the package.

Once more, thank you very much.

Best,

Philipp