DoubleML / doubleml-for-r

DoubleML - Double Machine Learning in R
https://docs.doubleml.org

Minor inconsistency between user guide notation and the code? #100

Closed: brgordon1 closed this issue 3 years ago

brgordon1 commented 3 years ago

I have a question about a potential inconsistency between the notation provided in the user guide and the code. If not an inconsistency, then it must represent my own misunderstanding of the notation in the user guide (and if so, my apologies in advance).

Looking at the documentation to estimate the variance of the estimator, I would describe the expression as J_{0}^{-2} multiplied by the mean of \psi^2, where this latter term is represented by the double sum over folds and observations within each fold. The N^{-1} here serves to calculate the mean over this double sum.

However, in the code here, the quantity above is premultiplied by an additional N^{-1} term.
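
Written out, the two quantities being compared look roughly as follows. This is only a sketch: the symbols K (number of folds), I_k (observation indices in fold k), and \hat{\eta}_{0,k} (nuisance estimate for fold k) are assumed to follow the user guide's notation and are not quoted from it.

```latex
% Expression as described from the user guide: J_0^{-2} times the mean of psi^2
\hat{\sigma}^2 = \hat{J}_0^{-2} \, \frac{1}{N} \sum_{k=1}^{K} \sum_{i \in I_k}
    \psi^2(W_i; \tilde{\theta}_0, \hat{\eta}_{0,k})

% Quantity computed in the code: the same expression with one additional N^{-1}
\frac{1}{N} \, \hat{\sigma}^2
```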

I suspect the code is correct, and so that's why this seems more like an issue about the notation in the documentation. I looked at Theorem 3.2 in the published paper but I had trouble identifying where the extra N^{-1} term would come from.

Is this a notation problem or am I missing something?

Thanks, Brett

MalteKurz commented 3 years ago

Hi @brgordon1,

thanks for your remark. Let me try to clarify this: In the field obj_dml$se you find the asymptotic standard error, i.e., \hat{\sigma}/\sqrt{N}. It is the standard deviation of the asymptotic normal distribution, divided by the square root of the number of observations. We know that \sqrt{N}(\tilde{\theta}_0 - \theta_0) converges in distribution to N(0, \sigma^2), with \sigma^2 := J_{0}^{-2} E[\psi^2(W; \theta_0, \eta_0)].

If you now, for example, want to construct a confidence interval for your parameter estimate, you get the following expression: [\tilde{\theta}_0 \pm \Phi^{-1}(1 - \alpha/2) \hat{\sigma}/\sqrt{N}].

So the asymptotic variance of your parameter estimate is \hat{\sigma}^2/N, and the standard error that we return in obj_dml$se is the square root of it. In the referenced code line, we compute this variance (i.e., before taking the square root but already with the additional scaling by N^{-1}): https://github.com/DoubleML/doubleml-for-r/blob/d2f108e857745b4c95c92d63cb2d800f5f2731ea/R/double_ml.R#L1231
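
To make the role of the extra N^{-1} concrete, here is a minimal numerical sketch in plain Python. It does not use the DoubleML API or its code; all variable names are hypothetical. It assumes the simplest possible linear score, \psi = \psi_a \theta + \psi_b with \psi_a = -1 and \psi_b = y (estimating a mean), and ignores cross-fitting folds and nuisance functions entirely. Under these assumptions, \hat{\sigma}/\sqrt{N} should reduce to the familiar standard error of a sample mean.

```python
import math
import random
import statistics

random.seed(0)
n = 1000
y = [random.gauss(2.0, 3.0) for _ in range(n)]

# Assumed toy score: psi(W; theta) = psi_a * theta + psi_b, with psi_a = -1, psi_b = y.
psi_a = [-1.0] * n
psi_b = y

j_hat = sum(psi_a) / n                    # empirical analogue of J_0
theta_hat = -sum(psi_b) / sum(psi_a)      # solves mean(psi) = 0
psi = [a * theta_hat + b for a, b in zip(psi_a, psi_b)]

# User-guide expression: J_0^{-2} times the mean of psi^2.
sigma2_hat = (sum(p * p for p in psi) / n) / j_hat ** 2

# The additional N^{-1}: variance of theta_hat itself, then its square root.
var_theta = sigma2_hat / n
se = math.sqrt(var_theta)

# 95% confidence interval: theta_hat +/- quantile * sigma_hat / sqrt(N).
q = statistics.NormalDist().inv_cdf(0.975)
ci = (theta_hat - q * se, theta_hat + q * se)

print(f"theta_hat={theta_hat:.4f}, sigma2_hat={sigma2_hat:.4f}, se={se:.4f}")
print(f"95% CI: ({ci[0]:.4f}, {ci[1]:.4f})")
```

In this toy case `sigma2_hat` stays on the scale of the variance of a single observation no matter how large `n` is, while `se` shrinks with `n`. That is why the confidence interval needs the `sigma_hat / sqrt(N)` form.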

I hope this helps. We also mention this, for example, in the user guide: https://docs.doubleml.org/stable/guide/se_confint.html

However, I fully agree that this could be made clearer at some places in the documentation, e.g., the API doc here: https://docs.doubleml.org/r/stable/reference/DoubleML.html#public-fields.

Thanks, Malte

brgordon1 commented 3 years ago

Thank you, @MalteKurz. Very helpful and I appreciate the quick reply. I should really have caught that myself by looking at the expression for the confidence interval. I'm glad it now makes sense.

Regards, Brett

MalteKurz commented 3 years ago

No worries, thanks for the feedback :+1:. I think your point was valid and we could make the documentation clearer in this regard.